The Solr suggester search component was previously discussed on this blog in the post Solr Suggester by Solr committer Erick Erickson. This post shows how to add a Solr suggester component to a Fusion query pipeline in order to provide the kind of auto-complete functionality expected from a modern search app.
By auto-complete we mean the familiar set of drop-downs under a search box which suggest likely words or phrases as you type. This is easy to do using Solr’s FST-based suggesters. FST stands for “Finite-State Transducer”. The underlying mechanics of an FST allow for near-matches on the input, which means that auto-suggest will work even when the inputs contain typos or misspellings. Solr’s suggesters return the entire field for a match, making it possible to suggest whole titles or phrases based on just the first few letters.
The data in this example is derived from data collected by the Movie Tweetings project between 2013 and 2016. A subset of that data has been processed into a CSV file consisting of a row per film, with columns for a unique id, the title, release year, number of tweets found, and average rating across tweets:
id,title,year,ct,rating
...
0076759,Star Wars: Episode IV - A New Hope,1977,252,8.61111111111111
0080684,Star Wars: Episode V - The Empire Strikes Back,1980,197,8.82233502538071
0086190,Star Wars: Episode VI - Return of the Jedi,1983,178,8.404494382022472
1185834,Star Wars: The Clone Wars,2008,11,6.090909090909091
2488496,Star Wars: The Force Awakens,2015,1281,8.555815768930524
...
After loading this data into Fusion, I have a collection named “movies”. The following screenshot shows the result of a search on the term “Star Wars”.
The search results panel shows the results for the search query “Star Wars”, sorted by relevancy (i.e. best-match). Although all of the movie titles contain the words “Star Wars”, they don’t all begin with it. If you’re trying to add auto-complete to a search box, the results should complete the initial query. In the above example, the second best-match isn’t a match at all in an auto-complete scenario. Instead of using the default Solr “select” handler to do the search, we can plug in an FST suggester, which will give us not just auto-complete, but fuzzy autocomplete, through the magic of FSTs.
Fusion collections are Solr collections which are managed by Fusion. To add a Lucene/Solr suggester to the “movies” collection requires editing the Solr config files according to the procedure outlined in the “Solr Suggester” blogpost:
- define a field with the correct analyzer in file schema.xml
- define a request handler for auto-complete in file solrconfig.xml
Fusion sends search requests to Solr via the Fusion query pipeline Solr query stage, therefore it’s also necessary to configure a Solr query stage to access the newly configured suggest request handler.
The Fusion UI provides tools for editing Solr configuration files. These are available from the “Configuration” section on the collection “Home” panel, seen on the left-hand side column in the above screenshot. Clicking on the “Solr Config” option shows the set of available configuration files for collection “movies”:
Clicking on file schema.xml opens an edit window. I need to define a field type and specify how the contents of this field will be analyzed when creating the FSTs used by the suggester component. To do this, I copy in the field definition from the very end of the “Solr Suggester” blogpost:
<!-- text field for suggestions, taken from: https://lucidworks.com/blog/2015/03/04/solr-suggester/ -->
<fieldType name="suggestTypeLc" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="[^a-zA-Z0-9]" replacement=" " />
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
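To get a feel for what this analyzer chain does to a movie title, here is a rough sketch in plain Python. It is only an approximation of the three configured steps (character filter, whitespace tokenizer, lowercase filter), not Solr's own code, and the function name `suggest_type_lc_tokens` is made up for this illustration.

```python
import re


def suggest_type_lc_tokens(text):
    """Approximate the suggestTypeLc analysis chain (illustrative only)."""
    # PatternReplaceCharFilterFactory: every non-alphanumeric character becomes a space
    cleaned = re.sub(r"[^a-zA-Z0-9]", " ", text)
    # WhitespaceTokenizerFactory: split on runs of whitespace
    tokens = cleaned.split()
    # LowerCaseFilterFactory: lowercase each token
    return [token.lower() for token in tokens]


tokens = suggest_type_lc_tokens("Star Wars: Episode IV - A New Hope")
print(tokens)
# ['star', 'wars', 'episode', 'iv', 'a', 'new', 'hope']
```

Punctuation such as the colon and the hyphen disappears, and matching becomes case-insensitive, so a user typing “star wars episode” lines up with the stored title.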
After clicking the “Save” button, the Fusion UI displays the notification message: “File contents saved and collection reloaded.”
Next I edit the solrconfig.xml file to add definitions for the suggester search component and the corresponding request handler.
This configuration is based on Solr’s “techproducts” example and on the Suggester configuration docs in the Solr Reference Guide. The suggest search component is configured with parameters for the name and implementation type of the suggester, the field to be analyzed, and the analyzer to use. We also specify the optional parameter weightField which, if present, names an additional document field whose value is used as the suggestion weight for sorting.
For this example, the field parameter is movie_title_txt, and suggestAnalyzerFieldType specifies that the movie title text will be analyzed using the analyzer defined for field type suggestTypeLc (added to the schema.xml file for the “movies” collection in the previous step). Each movie has two kinds of ratings information: average rating and count (total number of ratings from tweets). Here, the average rating value is specified:
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="storeDir">suggester_fuzzy_dir</str>
    <str name="field">movie_title_txt</str>
    <str name="weightField">rating_tf</str>
    <str name="suggestAnalyzerFieldType">suggestTypeLc</str>
  </lst>
</searchComponent>
For details, see Solr wiki Suggester searchComponent section.
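As a mental model for what a fuzzy lookup combined with a weight field gives us, here is a toy sketch in plain Python. It is emphatically not how Lucene does it (there is no FST here, and the real suggester has more tuning options). It assumes a simple rule: a title matches if its leading characters are within one edit of the typed text, and matches are sorted by weight, highest first. The functions `edit_distance` and `fuzzy_suggest` are hypothetical helpers; the titles and ratings come from the CSV sample above.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_suggest(query, entries, max_edits=1):
    """Return titles whose prefix is within max_edits of query, heaviest first."""
    q = query.lower()
    hits = []
    for title, weight in entries:
        # Simplification: compare against a prefix of the same length as the query
        prefix = title.lower()[:len(q)]
        if edit_distance(q, prefix) <= max_edits:
            hits.append((title, weight))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return [title for title, _ in hits]


# (title, average rating) pairs taken from the CSV sample
entries = [
    ("Star Wars: Episode IV - A New Hope", 8.61111111111111),
    ("Star Wars: Episode V - The Empire Strikes Back", 8.82233502538071),
    ("Star Wars: Episode VI - Return of the Jedi", 8.404494382022472),
    ("Star Wars: The Clone Wars", 6.090909090909091),
    ("Star Wars: The Force Awakens", 8.555815768930524),
]

suggestions = fuzzy_suggest("Ster Wa", entries)  # note the typo in "Ster"
for title in suggestions:
    print(title)
```

Even with the typo, all five titles come back, with “The Empire Strikes Back” first because it has the highest average rating in the sample.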
The request handler configuration specifies the request path and the search component:
<requestHandler name="/suggest" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">mySuggester</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
For details, see Solr wiki Suggester requestHandler section.
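The handler defaults can also be spelled out explicitly on a request. The sketch below builds a query string for the /suggest handler using Solr's standard suggester parameters (suggest.q, suggest.dictionary, suggest.count). The base URL is an assumption for illustration only: it points straight at a local Solr on the default port rather than at a Fusion query pipeline, and the helper `build_suggest_url` is a made-up name.

```python
from urllib.parse import urlencode


def build_suggest_url(base_url, query, count=10, dictionary="mySuggester"):
    """Build a request URL for the /suggest handler (illustrative helper)."""
    params = {
        "suggest": "true",
        "suggest.q": query,                # the text typed so far
        "suggest.dictionary": dictionary,  # suggester name from solrconfig.xml
        "suggest.count": count,            # maximum number of suggestions
        "wt": "json",
    }
    return base_url.rstrip("/") + "/suggest?" + urlencode(params)


# Assumed base URL: a local Solr serving the "movies" collection
url = build_suggest_url("http://localhost:8983/solr/movies", "Star W")
print(url)
```

Because the same values are already set as defaults in the request handler, only suggest.q is strictly needed on each request; the others are shown to make the mapping to the config visible.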
After each file edit, the collection configs are saved and the collection is reloaded so that changes take effect immediately.
Finally, I configure a pipeline with a Solr query stage which permits access to the suggest request handler:
Lacking a UI with the proper JS magic to show autocomplete in action, we’ll just send a request to the endpoint, to see how the suggest request handler differs from the default select request handler. Since I’m already logged into the Fusion UI, from the browser location bar, I request the URL:
The power of the FST suggester lies in its robustness. Misspelled and/or incomplete queries still produce good results. This search also returns the same results as the above search:
Under the hood, Lucidworks Fusion is Solr-powered, and under the Solr hood, Solr is Lucene-powered. That’s a lot of power. The autocompletion for “Solr-fu” is “Solr-Fusion”!