Solr Powered ISFDB – Part #9: Autocomplete
This is part #9 in a (never ending?) series of articles on Indexing and Searching the ISFDB.org data using Solr.
When we left off last time, I had upgraded the version of Solr I was using from 1.4.1 to the newly released 3.1. Today I wanted to make some improvements to the functionality of my Velocity UI by adding autocomplete support for things the user types into the search box.
(If you are interested in following along at home, you can check out the code from github. I’m starting at the blog_8 tag, and as the article progresses I’ll link to specific commits where I changed things, leading up to the blog_9 tag containing the end result of this article.)
Getting Started: Borrowing Code
One of the nice additions to the example Velocity templates in Solr 3.1 is the usage of the jQuery Autocomplete Plugin. So the first step I’m going to take in adding this functionality to my own templates (which, as you may recall, were copied from 3.1 in the first place) is to look at how the functionality is hooked in there, and reuse the same ideas.
As little as I understand about Velocity templates or javascript, I do know how to use “grep”, and the crux of the functionality seems to come from two main pieces…
- head.vm includes the jQuery autocomplete files, and then registers an “autocomplete” callback function with jQuery that seems to be hitting the “/terms” URL using the “suggest” template
- suggest.vm is a simple template that looks like it just outputs a plain text list of the terms
Since my Velocity is rustier than my javascript, that last item is the most confusing to me, but skimming the jQuery autocomplete() docs, that does in fact seem to be the format expected, so I’ll roll with it. All in all this seems like it will be fairly straightforward.
In fact, I apparently never removed the autocomplete hooks in head.vm and suggest.vm back when I first “borrowed” the 3.1 templates, so really the question isn’t how to make it work, but why isn’t it already working? The answer seems to be the “/terms” path. Even though I reused most of the Velocity templates, I created a much simpler solrconfig.xml file for myself, so I need to add that request handler using the example configs as my template, and tweak the terms.fl in my head.vm to better match my schema.
So Why Isn’t It Working?
After making these changes, I can now see “successful” requests being made to the “/terms” component in my Solr logs when I start typing in my search box…
INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302294601828&terms.fl=catchall&q=a&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=A} status=0 QTime=0
Apr 8, 2011 1:30:02 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302294602831&terms.fl=catchall&q=as&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=As} status=0 QTime=1
Apr 8, 2011 1:30:03 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302294603672&terms.fl=catchall&q=asi&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=Asi} status=0 QTime=1
Apr 8, 2011 1:30:04 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302294604604&terms.fl=catchall&q=asim&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=Asim} status=0 QTime=1
Apr 8, 2011 1:30:05 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302294605490&terms.fl=catchall&q=asimo&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=Asimo} status=0 QTime=0
Apr 8, 2011 1:30:06 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302294606678&terms.fl=catchall&q=asimov&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=Asimov} status=0 QTime=0
…and yet in spite of this, I’m not getting any autocomplete suggestions. So what’s going wrong? More importantly, how can I tell what’s going wrong?
I’m going to start with the assumption that every piece of the system is doing its job properly according to how it is configured, and that I screwed something up in the setup/configuration. (I find that in life in general, when something goes wrong, it’s a good idea to assume it’s my fault until I can prove otherwise.) So to start with, let’s see what some of these “/terms” requests are producing. When I load http://localhost:8983/solr/terms?limit=10&timestamp=1302294432177&terms.fl=catchall&q=asi&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=Asi
in my browser, the first and most obvious thing that jumps out at me is that it’s totally blank. This reaffirms the belief that jQuery isn’t broken (how can it give suggestions if there’s no data?), which means we’ve already narrowed the problem space down considerably.
The next steps are to eliminate some more pieces of the puzzle and/or gather more data. By eliminating “suggest.vm” from the equation, I should be able to do both: http://localhost:8983/solr/terms?limit=10&timestamp=1302294432177&terms.fl=catchall&q=asi&terms.sort=count&terms.prefix=Asi
gives me back the following response…
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="terms">
    <lst name="catchall"/>
  </lst>
</response>
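The debugging step used here (dropping the template from the request to see the raw response) can be sketched in a few lines of javascript. This is only an illustration: the helper name withoutTemplate is made up for this example, and the URL is a shortened version of the one above.

```javascript
// Hypothetical helper: remove the response-writer parameters from a Solr
// request URL so the handler's raw (default XML) output can be inspected.
function withoutTemplate(requestUrl) {
  const url = new URL(requestUrl);
  url.searchParams.delete("wt");         // fall back to the default response writer
  url.searchParams.delete("v.template"); // only meaningful to the Velocity writer
  return url.toString();
}

const debugUrl = withoutTemplate(
  "http://localhost:8983/solr/terms?terms.fl=catchall&q=asi&wt=velocity" +
    "&terms.sort=count&v.template=suggest&terms.prefix=Asi"
);
console.log(debugUrl);
```

Everything else in the request stays the same, so any difference in the output can be blamed on the template rather than on the handler.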
So far so good. I’ve now (mostly) ruled out Velocity (and my suggest.vm template) as the cause of the problem, but I’ve also noticed something about my request that I didn’t notice before: ...&q=asi&...&terms.prefix=Asi. jQuery is sending a lowercase version of my input in the “q” param (that appears to be its default behavior), but it’s sending the original case as the “terms.prefix” param (thinking back to my head.vm changes, that’s something explicitly being requested as part of the “extraParams”). In my schema.xml, “catchall” uses the LowerCaseFilterFactory, which means there are no indexed terms in that field that contain uppercase characters.
There may be a way to ask jQuery to pass the same lowercase value it uses in the “q” param by default to the “terms.prefix” param, but since it wasn’t immediately obvious to me, I went with something I was a little more confident of and just did it myself using javascript.
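A minimal sketch of the idea, not the actual change in head.vm: the function name termsParams is hypothetical, and the parameter names are simply the ones visible in the logs above. The only important part is the toLowerCase() call on whatever the user typed.

```javascript
// Hypothetical helper: build the extra parameters for the "/terms" request.
// Assumption: the target field is lowercased at index time, so the prefix
// has to be lowercased too or it can never match an indexed term.
function termsParams(typed, field) {
  return {
    "terms.fl": field,
    "terms.sort": "count",
    "terms.prefix": typed.toLowerCase(),
    "wt": "velocity",
    "v.template": "suggest",
  };
}

console.log(termsParams("Asi", "catchall"));
```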
So Why Is It Still Not Working?
Now when I enter “Asi” in the search box, I see the lowercase values showing up in my logs…
INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302296185396&terms.fl=catchall&q=a&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=a} status=0 QTime=115
Apr 8, 2011 1:56:27 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302296187320&terms.fl=catchall&q=as&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=as} status=0 QTime=3
Apr 8, 2011 1:56:27 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/terms params={limit=10&timestamp=1302296187960&terms.fl=catchall&q=asi&wt=velocity&terms.sort=count&v.template=suggest&terms.prefix=asi} status=0 QTime=2
…but there are still no suggestions. Retracing the same steps I used before, I see that with the “suggest.vm” template I’m still getting a blank response, but when I just look at the raw XML output I see…
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="terms">
    <lst name="catchall">
      <int name="asimov">2820</int>
      <int name="asimov's">2256</int>
      <int name="asire">32</int>
      <int name="aside">16</int>
      <int name="asia">15</int>
      <int name="asimovs">9</int>
      <int name="asian">6</int>
      <int name="asiatic">5</int>
      <int name="asim">3</int>
      <int name="asis">2</int>
    </lst>
  </lst>
</response>
So let’s take another look at that suggest.vm Velocity template. It didn’t really occur to me before, but it’s referring to $response.response.terms.name. I thought that “name” was generic, but now I’m guessing it was actually in reference to the fact that the example templates were using the “name” field for autocomplete. So I need to change that to “catchall”, and now my jQuery autocomplete is working as expected.
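To make the failure mode concrete, here is a rough javascript analogue of what the template is doing. It is not the Velocity code itself: the terms object is a hand-written stand-in mirroring part of the XML above, and suggestionLines is a made-up name. Looking up a field that isn’t in the response doesn’t raise an error, it just produces a blank page.

```javascript
// Hand-written stand-in for the "terms" section of the response above.
const terms = {
  catchall: { "asimov": 2820, "asimov's": 2256, "asire": 32 },
};

// Hypothetical helper: emit one term per line for the given field, the
// plain text format the autocomplete plugin appears to expect.
function suggestionLines(termsSection, field) {
  const counts = termsSection[field] || {}; // unknown field: no terms at all
  return Object.keys(counts).join("\n");
}

console.log(JSON.stringify(suggestionLines(terms, "name")));     // wrong field name
console.log(JSON.stringify(suggestionLines(terms, "catchall"))); // field actually queried
```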
Great! Now What?
I’ve now got functional autocomplete working in my index, just like the example Solr Velocity templates, but there are some things I don’t like about this setup that I want to fix…
- I need to use a better field for picking suggestions. “catchall” was a choice I made on a whim because I knew it would contain both words from titles as well as words from author names, and I wanted both to work in autocomplete. But the “catchall” field also contains a lot of other crap that probably won’t be useful (e.g. right now if you type “h” it suggests “http” because URLs are copied into “catchall”).
- The configs for autocomplete are spread around in too many files: head.vm, suggest.vm, and solrconfig.xml all needed to be changed to make this work, and will likely all need to be changed to switch the field as well. It would be nice to consolidate and simplify this.
- TermsComponent is a nice simple way to get autocomplete suggestions based on prefix matching, but new in Solr 3.1 is the Suggester plugin, which is the New Hotness for how to do autocomplete (and spelling suggestions).
So I’m going to set out to make some improvements to what I’ve got, starting with the switch to using “Suggester”, but with an eye to the other two problems as I go along.
Suggester
The Suggester class is really just a new type of “dictionary” implementation for the SpellCheckComponent that has some nice properties (so I’m told) for generating autocomplete suggestions. Based on the wiki, I added a new “/suggest” request handler to my configs, that uses the SpellCheckComponent with a Suggester based on my catchall field. The one change I made to the example was to specify a threshold of “0.0”, meaning that (for now) I want all terms in my field to be used in the dictionary.
Unlike the TermsComponent, which scans the terms in the main index directly, the SpellChecker uses its own data structures that must be explicitly built up from the source data in the index. The configuration I used includes an instruction to “buildOnCommit”, so anytime I update the index the dictionary will also be updated. However, according to the docs, the “Lookup” implementations used by Suggester don’t persist any data to disk, so when you first start up the server there won’t be any suggestions by default. So I also added a “firstSearcher” listener to ensure the Suggester dictionary would be built in this case.
So with that, my “/suggest” handler is always up and running and ready to go, but it’s not currently giving back very useful results — most likely because of my field choice.
Better Input, Better Output
The main things I want to autocomplete on are author names and titles, so the first step is to create a new field for that purpose using copyField. Doing that gets me some nice looking results for input like “isaac asim”…
<lst name="spellcheck">
  <lst name="suggestions">
    <lst name="isaac">
      <int name="numFound">5</int>
      <int name="startOffset">0</int>
      <int name="endOffset">5</int>
      <arr name="suggestion">
        <str>isaac</str>
        <str>isaacs</str>
        <str>isaac's</str>
        <str>isaacson</str>
        <str>isaacman</str>
      </arr>
    </lst>
    <lst name="asim">
      <int name="numFound">4</int>
      <int name="startOffset">6</int>
      <int name="endOffset">10</int>
      <arr name="suggestion">
        <str>asimov's</str>
        <str>asimov</str>
        <str>asimovs</str>
        <str>asim</str>
      </arr>
    </lst>
    <str name="collation">isaac asimov's</str>
  </lst>
</lst>
This isn’t in the same format as the TermsComponent, but since we’re going to use a velocity template to reformat it, that won’t hurt anything. The real juicy looking bit is the “collation” value, where the SpellChecker suggests combinations of individual suggestions. By default it only gives you one, but we can increase that to get a nice list of multi-word suggestions in the “collation” section.
Use Our New Suggestions And Clean Up The Configs
To use our new suggestions, we need to get them in the format jQuery expects. At first I wasn’t sure how to make the suggest.vm Velocity template return all the values for the “collation” key in the “suggestions” NamedList, but it only took a little experimentation (and knowledge of the NamedList API) to get it working. A nice side effect of the change to the Suggester based approach is that it’s no longer necessary to know the field being used for suggestions in the Velocity template, so my goal of cleaner configs is already making progress.
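The wrinkle with “collation” is that a NamedList can hold the same key more than once, so a plain lookup by name only finds one of them. The sketch below imitates that in javascript with an array of [name, value] pairs. It is an analogy, not the template code: the getAll function here is written for this example (loosely modeled on the idea of fetching every value for a name), and the second collation is invented for illustration rather than taken from real output.

```javascript
// Stand-in for the "suggestions" NamedList: ordered pairs, names may repeat.
const suggestions = [
  ["isaac", { numFound: 5 }],
  ["asim", { numFound: 4 }],
  ["collation", "isaac asimov's"],
  ["collation", "isaac asimov"], // invented second collation, for illustration only
];

// Collect every value stored under the given name, in order.
function getAll(pairs, name) {
  return pairs.filter(([key]) => key === name).map(([, value]) => value);
}

// One suggestion per line, the plain text format the plugin appears to expect.
const collationLines = getAll(suggestions, "collation").join("\n");
console.log(collationLines);
```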
Another nice perk of switching to the Suggester is that it uses the “q” param for its input, so the “terms.*” params can be removed from our jQuery autocomplete call. We can also move the “wt” and “v.template” params out of our jQuery call and into our “/suggest” defaults, giving us a nice clean separation between how we configure our suggestions and where we use them.
Last, but not least: the default behavior of the jQuery autocomplete is to submit the first suggestion from the list if the user hits “return”, even if the user didn’t select it. I consider that asinine; if the user wants to search for a word that isn’t the first one in the suggestion list, they should be allowed to. Fortunately, I noticed in the docs an easy way to change that.
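With those params moved into the handler defaults, the request the browser has to make shrinks to little more than the handler path and “q”. A rough sketch, assuming the same localhost base URL seen earlier (the suggestUrl helper is hypothetical):

```javascript
// Hypothetical helper: the only thing the client still has to supply is "q";
// the response writer and template are assumed to be "/suggest" defaults.
function suggestUrl(solrBase, typed) {
  const params = new URLSearchParams({ q: typed });
  return solrBase + "/suggest?" + params.toString();
}

console.log(suggestUrl("http://localhost:8983/solr", "isaac asim"));
```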
Conclusion (For Now)
And that wraps up this latest installment with the blog_9 tag. The Search UI for our ISFDB Solr Index now has a nicely functioning javascript autocomplete feature that didn’t really require learning anything about javascript. In my next post, I plan to continue talking about how to improve the user experience, but I’ll switch tacks a bit to talk more about tuning the ranking of results than about the UI itself.