Focusing on Search Quality at Lucene/Solr Revolution 2015
I just got back from Lucene/Solr Revolution 2015 in Austin on a big high. There were a lot of exciting talks at the conference this year, but what was particularly exciting to me was the focus I saw on search quality (accuracy and relevance): on the problem of inferring user intent from queries, on tracking user behavior and using it to improve relevance, and so on. There were also plenty of great talks this week on the technology issues that attack the other “Q” problem – quantity: we keep pushing the envelope of what is possible with SolrCloud at scale and under load, we are indexing data faster and faster with streaming technologies such as Spark, and we are deploying Solr to more and more interesting domains. Big data integrations with SolrCloud continue to be a hot topic – as they should, since search is probably the most (only?) effective answer to the explosion of digital information. But without quality results, all the technology improvements in speed, scalability, reliability and the like will be of little real value. Quantity and quality are two sides of the same coin: quantity is largely a technology or engineering problem (authors like myself who tend to “eschew brevity” being a possible exception), while quality is a language and user-experience problem. Both are critical to success, where “success” is defined by happy users. What was really cool to me was the variety of approaches people are taking to the same basic questions: what does the user want to find? And how do we measure how well we are doing?
Our Lucidworks CTO Grant Ingersoll started the ball rolling in his opening keynote by reminding us of the way we typically test search applications: with a small set of what he called “pet peeve queries” that attack the quality problem in piecemeal fashion but don’t come near to solving it. We pat ourselves on the back when we go to production and feel pretty smug about it until real users start to interact with our system and the tweets and/or tech support calls start pouring in – and not with the sentiments we were expecting. We need better ways of developing and measuring search quality. Yes, the business unit is footing the bill and has certain standards (which tend to be their pet peeve queries, as Grant pointed out), so we give them knobs and dials they can twist to calm their nerves and get them off our backs; but when the business rules become so pervasive that they start to take over from what the search engine is designed to do, we have another problem. To be clear, there are some situations where we know the search engine is not going to get it right, so we have to do a manual override. We can either go straight to a destination (using a technique we call “Landing Pages”) or force what we know to be the best answer to the top – so-called “Best Bets”, which is implemented in Solr using the QueryElevationComponent. However, this is clearly a case where moderation is needed! We should use these tools to tweak our results – that is, to fix the intractable edge cases – not to patch over the core problems.
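To make the “Best Bets” idea concrete, here is a minimal SolrJ sketch (5.x-era API; newer versions use HttpSolrClient.Builder) of asking the QueryElevationComponent to pin a couple of documents at query time. The core name, handler path and document IDs are hypothetical, and the component still has to be wired up in solrconfig.xml:

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BestBetsExample {
    public static void main(String[] args) throws Exception {
        // Assumes a core named "products" with the QueryElevationComponent
        // wired into an /elevate request handler in solrconfig.xml.
        HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/products");

        SolrQuery query = new SolrQuery("ipad");
        query.setRequestHandler("/elevate");
        // Force known-best documents to the top for this one query only.
        // "DOC-123" and "DOC-456" are hypothetical document IDs.
        query.set("elevateIds", "DOC-123,DOC-456");
        query.set("enableElevation", "true");

        QueryResponse response = solr.query(query);
        response.getResults().forEach(doc -> System.out.println(doc.getFieldValue("id")));
        solr.close();
    }
}
```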
This ad-hoc, subjective way of measuring search quality that Grant was talking about is pervasive. The reason is that quality – unlike quantity – is hard to measure. What do you mean by “best”? We know from our own experience, and from our armchair data-science-esque cogitations on this, that what is best for one user may not be best for another, and that it can change over time even for a given user. So quality – relevance – is “fuzzy”. But what can we do? We’re engineers, not psychics, dammit! Paul Nelson, the Chief Scientist at Search Technologies, then proceeded to show us what we can do to measure search quality (precision and recall) in an objective (i.e. scientific!) way. Paul gave a fascinating talk, full of the kinds of graphs you typically see in a nuts-and-bolts talk, tracking the gradual improvement in accuracy over the course of search application development. The magic behind all of this is query logs and predictive analytics. Given that you have this data (even if it comes from your previous search engine) and want to know whether you are making “improvements” or not, Paul and his team at Search Technologies have developed a way to use it to essentially regression-test search quality – pretty cool, huh? Check out Paul’s talk if you didn’t get a chance to see it.
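For the curious, here is what the two numbers you would track actually look like in code. This is my own bare-bones sketch, not Paul’s framework, using made-up document IDs and relevance judgments of the sort you would mine from query logs:

```java
import java.util.List;
import java.util.Set;

public class QualityMetrics {

    // Precision: fraction of returned documents that are relevant.
    static double precision(List<String> returned, Set<String> relevant) {
        if (returned.isEmpty()) return 0.0;
        long hits = returned.stream().filter(relevant::contains).count();
        return (double) hits / returned.size();
    }

    // Recall: fraction of all relevant documents that were returned.
    static double recall(List<String> returned, Set<String> relevant) {
        if (relevant.isEmpty()) return 0.0;
        long hits = returned.stream().filter(relevant::contains).count();
        return (double) hits / relevant.size();
    }

    public static void main(String[] args) {
        // Hypothetical judgments mined from query logs: the docs users actually
        // clicked (and stayed on) for a given query.
        Set<String> relevant = Set.of("doc1", "doc2", "doc5");
        List<String> returned = List.of("doc1", "doc3", "doc2", "doc4");

        System.out.printf("precision=%.2f recall=%.2f%n",
                precision(returned, relevant), recall(returned, relevant));
        // precision=0.50 recall=0.67 -- track these per build to regression-test quality.
    }
}
```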
But let’s face it, getting computers to understand language is a hard problem. Rather than throwing up our hands, though, we are – in my humble opinion – really starting to dig into solving this one! The rubber is hitting the road, folks. One of the gnarlier problems in this domain is name recognition. Chris Mack of Basis Technologies gave a very good presentation on how Basis is using their suite of language technologies to help solve it. Name matching is hard because there are many ambiguities and alternate ways of representing names, many people share the same name, and so on. Chris’s family name is an example of this problem – is it a truck, a cheeseburger (spelled Mac) or a last name? For those of you out there who are migrating from Fast ESP to Solr (a shoutout here to that company in Redmond, Washington for sunsetting enterprise support for Fast ESP – especially on Linux – thanks for all of the sales leads guys! Much appreciated!) – you should know that Basis Technologies (and Search Technologies as well, I believe) have a lemmatization solution that you can plug into Solr (a more comprehensive way to do stemming). I was actually over at the Basis Tech booth to see about getting a dev copy of their lemmatizer for myself, so that we could demonstrate this to potential Fast ESP customers, when I met Chris. Besides name recognition, Basis Tech has a lot of other cool things. Their flagship product is Rosette – among other things, a world-class ontology / rules-based classification engine. Check it out.
Next up on my list was Trey Grainger of CareerBuilder. Trey is leading a team there that is doing some truly outstanding work on user intent recognition and using it to craft more precise queries. When I first saw the title of Trey’s talk, “Leveraging Lucene/Solr as a Knowledge Graph and Intent Engine”, I thought that he and his team had scooped me, since my own title is very similar – great minds think alike, I guess (certainly true in Trey’s case; a little self-aggrandizement on my part here, but hey, it’s my blog post so cut me some slack!). What they are basically doing is using classification approaches such as machine learning to build a knowledge graph in Solr, then using it at query time to determine what the user is asking for and to craft a query that brings back those things and other closely related things. The “related to” part is very important, especially in the buzzword salad that characterizes most of our resumes these days. The query rewrite that you can do if you get this right can slice through noise hits like a hot knife through butter.
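To give a flavor of what a knowledge-graph-driven query rewrite can look like, here is a toy sketch of my own (not CareerBuilder’s code): a recognized concept is expanded into a boosted Solr query string so that closely related skills come along for the ride.

```java
import java.util.Map;

public class IntentQueryRewrite {

    // A toy, in-memory stand-in for a knowledge graph: each recognized concept
    // maps to related concepts with a boost weight. (Hypothetical data.)
    static final Map<String, Map<String, Double>> RELATED = Map.of(
            "java developer", Map.of("j2ee", 0.8, "spring", 0.7, "hibernate", 0.6));

    // Rewrite the raw user query into a boosted Solr query string: the literal
    // phrase stays dominant, related concepts are OR'ed in with lower boosts.
    static String rewrite(String userQuery) {
        StringBuilder q = new StringBuilder("\"" + userQuery + "\"^10");
        for (Map.Entry<String, Double> e :
                RELATED.getOrDefault(userQuery.toLowerCase(), Map.of()).entrySet()) {
            q.append(" OR \"").append(e.getKey()).append("\"^").append(e.getValue());
        }
        return q.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("Java Developer"));
        // e.g. "Java Developer"^10 OR "spring"^0.7 OR "j2ee"^0.8 OR "hibernate"^0.6
        // (related-term order may vary since Map.of does not guarantee ordering)
    }
}
```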
Trey is also the co-author of Solr in Action with our own Tim Potter – I am already on record about this wonderful book – but it was cool what Trey did: he offered a free signed copy to the person who had the best tweet about his talk. Nifty idea – wish I had thought of it, but oh yeah, I’d have to write a book first. Whoever won: don’t just put this book on your shelf when you get home – read it!
Not to be outdone, Simon Hughes of Dice.com, Trey’s competitor in the job search sector, gave a very interesting talk about how they are using machine learning techniques such as Latent Semantic Analysis (LSA) and Google’s Word2Vec to do similar things. They are using Lucene payloads in very interesting ways and building Lucene Similarity implementations to re-rank queries – heavy-duty stuff that the nuts-and-bolts guys would appreciate too (the code that Simon talked about is open-sourced). The title of the talk was “Implementing Conceptual Search in Solr using LSA and Word2Vec”. The key word here is “implementing” – as I said earlier in this post, we are implementing this stuff now, not just talking about it as we have been doing for too long, in my opinion. Simon also stressed the importance of phrase recognition, and I was excited to realize that the techniques Dice is using can feed into some of my own work – specifically, to build autophrasing dictionaries that can then be ingested by the AutoPhraseTokenFilter. In the audience with me were Chris Morley of Wayfair.com and Koorosh Vakhshoori of Synopsys.com, who have made some improvements to my autophrasing code that we hope to submit to Solr and GitHub soon.
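As a rough illustration of the re-ranking idea (again, my simplification rather than Dice’s open-sourced code), here is what scoring the top keyword hits by cosine similarity between concept vectors, the kind of vectors Word2Vec or LSA would produce, might look like:

```java
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;

public class ConceptualRerank {

    // Cosine similarity between two dense vectors (e.g. Word2Vec embeddings).
    static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return dot / (Math.sqrt(na) * Math.sqrt(nb) + 1e-12);
    }

    public static void main(String[] args) {
        // Hypothetical 3-dimensional "concept" vectors; real Word2Vec vectors
        // have hundreds of dimensions and would be stored alongside each document.
        double[] queryVec = {0.9, 0.1, 0.0};                  // e.g. "java developer"
        Map<String, double[]> topDocs = new LinkedHashMap<>();
        topDocs.put("resume-1", new double[]{0.8, 0.2, 0.1}); // close in concept space
        topDocs.put("resume-2", new double[]{0.1, 0.9, 0.3}); // keyword hit, different concept

        // Re-rank the keyword result set by conceptual closeness to the query.
        topDocs.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, double[]> e) -> -cosine(queryVec, e.getValue())))
                .forEach(e -> System.out.printf("%s %.3f%n",
                        e.getKey(), cosine(queryVec, e.getValue())));
    }
}
```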
Tomoko Uchida of Rondhuit Co. introduced us to a tool she is working on called NLP4L – a natural language processing tool for Lucene. In the talk, she emphasized important things like precision and recall and how to use NLP techniques in the context of a Lucene search. It was a very good talk, but I was standing too near the door – because getting a seat was hard – and some noisy people in the hallway made it difficult to hear well. That’s a good problem to have, as this talk, like the others, was very well attended. I’ll follow up with Tomoko because what she is doing is very important and I want to understand it better. Domo arigato!
Another fascinating talk was by Rama Yannam and Viju Kothuvatiparambil (“Viju”) of Bank of America. I had met Viju earlier in the week when he attended our Solr and Big Data course, ably taught by my friend and colleague Scott Shearer. I had been tapped to be a teaching assistant for Scott. Cool, a TA – hadn’t done that since grad school; made me feel younger … Anyway, Rama and Viju gave a really great talk on how they are using open-source natural language processing tools such as UIMA, OpenNLP, Jena/SPARQL and others to solve the Q&A problem for users coming to the BofA website. They are also building and using an ontology (that’s where Jena and SPARQL come in) – a subject near and dear to my heart, as you may know – as well as NLP techniques like part-of-speech (POS) tagging.
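For those who haven’t played with these tools, here is roughly what the OpenNLP piece looks like: a minimal POS-tagging sketch using the standard pre-trained English model. The model file path is an assumption; the model itself is downloaded separately from the OpenNLP project.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import opennlp.tools.postag.POSModel;
import opennlp.tools.postag.POSTaggerME;
import opennlp.tools.tokenize.SimpleTokenizer;

public class PosTagExample {
    public static void main(String[] args) throws Exception {
        // "en-pos-maxent.bin" is the standard pre-trained English POS model;
        // the local path is an assumption for this sketch.
        try (InputStream modelIn = new FileInputStream("en-pos-maxent.bin")) {
            POSTaggerME tagger = new POSTaggerME(new POSModel(modelIn));

            String[] tokens = SimpleTokenizer.INSTANCE.tokenize("How do I order checks for my account?");
            String[] tags = tagger.tag(tokens);

            // Full sentences like this one tag well; short fragments ("order checks")
            // tend to do worse because the models were trained on complete sentences.
            for (int i = 0; i < tokens.length; i++) {
                System.out.println(tokens[i] + "/" + tags[i]);
            }
        }
    }
}
```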
They have done some interesting customizations on Solr, but unfortunately these are proprietary. They were also not allowed to have their slides shared online or the talk recorded. People were taking pictures of the slides with their cell phones (not me, I promise) but were asked not to upload them to Facebook, LinkedIn, Instagram or the like. There was also a disclaimer bullet on one of their slides like you see on DVDs – the opinions expressed are the authors’ own and not necessarily shared by BofA – ta da ta dum – lawyerese for “we are not liable for ANYTHING these guys say, but they’ll be sorry if they don’t stick to the approved script!” So you will have to take my word for it: it was a great talk. But I have to be careful here – I may be on thin ice already with BofA legal, and at the end of the day, Bank of America already has all of my money! That said, I was grateful for this work because it will benefit me personally as a BofA customer, even if I can’t see the source code. Their smart search knows the difference between when I need to “check my balance” and when I need to “order checks”. As they would say in Boston – “Wicked Awesome”! One interesting side note: Rama and Viju mentioned that the POS tagger they are using works really well on full sentences (on which the models were trained) but less well on sentence fragments (noun phrases) – still not too bad, though – about 80%. More on this in a bit. But hey, banks – gotta love ’em – don’t get me started on ATM fees.
Last but not least (hopefully?) – as my boss Grant Ingersoll is fond of saying – was my own talk, where I tried to stay competitive with all of this cool stuff. I had to be careful not to call it a Ted talk because that is a registered trademark and I didn’t want to get caught by the “Ted Police”. Notice that I didn’t use all caps to spell my own name here – they registered that, so it probably would have been flagged by the Ted autobots. But enough about me. First I introduced my own pet peeve: why we should think about precision and recall before we worry about relevance tuning, because technically speaking, that is exactly what the Lucene engine does – if we don’t get precision and recall right, we have created a garbage-in, garbage-out problem for the ranking engine. I then talked about autophrasing a bit, bringing out my New York / Big Apple demo yet again. I admitted that this is a toy problem, but it does show that you can absolutely nail the phrase recognition and synonym problem, which brings precision and recall to 100%. Although this is not a real-world problem, I have gotten feedback that autophrasing is currently solving production problems – which is why Chris and Koorosh (mentioned above) needed to improve the code over my initial hack for their respective dot-coms.
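For readers who haven’t seen the New York / Big Apple demo, the autophrasing idea boils down to something like the sketch below. This is a deliberately simplified toy, not the AutoPhraseTokenFilter itself, which does its work inside a real Lucene analysis chain:

```java
import java.util.Map;

public class AutophraseSketch {

    // Toy phrase dictionary: known multi-word units and their canonical forms.
    // A real autophrasing dictionary would be much larger and applied at both
    // index and query time so the two sides stay in sync.
    static final Map<String, String> PHRASES = Map.of(
            "new york", "new_york",
            "big apple", "new_york",   // synonym resolved to the same canonical token
            "new york city", "new_york_city");

    static String autophrase(String query) {
        String result = query.toLowerCase();
        // Longest phrases first so "new york city" wins over "new york".
        for (String phrase : PHRASES.keySet().stream()
                .sorted((a, b) -> b.length() - a.length()).toList()) {
            result = result.replace(phrase, PHRASES.get(phrase));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(autophrase("restaurants in the Big Apple"));
        // restaurants in the new_york  -- "new" and "york" can no longer match independently
    }
}
```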
The focus of my talk then shifted to the work I have been doing on query autofiltering, where you get the noun phrases from the Lucene index itself, courtesy of the Field Cache (and yes Hoss, uh Chump, it works great, is less filling than some other NLP techniques – and there is a JIRA: SOLR-7539, take a look). This is most useful in a structured-data situation where you have string fields with noun phrases in them. Autophrasing is appropriate for Solr text fields (i.e. tokenized / analyzed fields), so the two techniques are entirely complementary. I’m not going to bore you with the details here since I have already written three blog posts on this, but I will tell you that the improvements I have made recently will impel me to write a fourth installment (hey, maybe I can get a movie deal like the guy who wrote The Martian, which started out as a blog … naaaah, his was techy but mine is way too techy, and it doesn’t have any NASA tie-ins …).
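Here is the gist of query autofiltering in an over-simplified sketch of my own; the real component in SOLR-7539 harvests its value lists from the index rather than from a hard-coded map:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class QueryAutofilterSketch {

    // Field value -> field name, as it might be harvested from string fields in
    // the index. These particular values and fields are hypothetical.
    static final Map<String, String> VALUE_TO_FIELD = Map.of(
            "red", "color",
            "wing", "brand",
            "red wing", "brand");

    // Turn recognized value phrases into fq clauses, longest match first,
    // and leave whatever is left over as the free-text query.
    static void autofilter(String query) {
        String remaining = query.toLowerCase();
        List<String> filters = new ArrayList<>();
        for (String value : VALUE_TO_FIELD.keySet().stream()
                .sorted((a, b) -> b.length() - a.length()).toList()) {
            if (remaining.contains(value)) {
                filters.add(VALUE_TO_FIELD.get(value) + ":\"" + value + "\"");
                remaining = remaining.replace(value, " ").trim();
            }
        }
        System.out.println("q=" + (remaining.isBlank() ? "*:*" : remaining) + " fq=" + filters);
    }

    public static void main(String[] args) {
        autofilter("red wing boots");
        // q=boots fq=[brand:"red wing"]  -- rather than matching "red" against the color field
    }
}
```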
Anyway, what I am doing now is adding verb/adjective resolution to the mix. The query autofiltering stuff is starting to resemble real NLP, so I am calling it NLP-Lite. “Pseudo NLP”, “Quasi-NLP” and “query-time NLP” are also contenders. I tried to do a demo of this (which was partially successful) using a music ontology I am developing, where I could get the questions “Who’s in The Who” and “Beatles songs covered by Joe Cocker” right; but Murphy was heavily on my case, the “time’s up” enforcers were looming, and I had a plane to catch, so I had to move on. I should say that the techniques I was talking about do not replace classical NLP – rather, we (collectively speaking) are using classic NLP to build knowledge bases that we can then use on the query side with techniques such as query autofiltering. That’s very important, and I have said this repeatedly: the more tools we have, the better chance we have of finding the right one for a given situation. POS tagging works well on full sentences and less well on sentence fragments, which is where the query autofilter excels. So it’s “front-end NLP” – you use classic NLP techniques to mine the data at index time and build your knowledge base, and you use this type of technique to harvest the gold at query time. Again, the “knowledge base” – as both Trey’s talk and my own stressed – can be the Solr/Lucene index itself!
Finally, I talked about some soon-to-be-published work I am doing on autosuggest. I was looking for a way to generate more precise typeahead queries that span multiple fields, which the query autofilter could then process. I discovered a way to use Solr facets – pivot facets to generate multi-field phrases and regular facets to pull context – so that I could build a dedicated suggester collection derived from a content collection (whew!!). The pivot facets allow me to turn a pattern like “genre,musician_type” into “Jazz Drummers”, “Hard Rock Guitarists”, “Classical Pianists”, “Country Singers” and so on. The regular facets then let me grab information related to the subject: if I use a pivot pattern like “name,composition_type” to generate suggestions like “Bob Dylan Songs”, I can also pull back other things related to Bob Dylan, such as “The Band” and “Folk Rock”, and use them to build user context for the suggester. Now, if you are searching for Bob Dylan songs, the suggester can start to boost them so that song titles that would normally be far down the list come to the top.
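Here is a hedged SolrJ sketch (5.x-era API, with hypothetical collection and field names) of the pivot-facet harvesting step: each genre/musician_type pair becomes a candidate typeahead phrase that would then be indexed into the dedicated suggester collection.

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.PivotField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PivotSuggestionBuilder {
    public static void main(String[] args) throws Exception {
        // Assumes a "music" collection with string fields "genre" and "musician_type";
        // the collection and field names are assumptions for this sketch.
        HttpSolrClient solr = new HttpSolrClient("http://localhost:8983/solr/music");

        SolrQuery query = new SolrQuery("*:*");
        query.setRows(0);
        query.setFacet(true);
        query.addFacetPivotField("genre,musician_type");

        QueryResponse response = solr.query(query);
        // Each pivot bucket pair becomes a candidate typeahead phrase,
        // e.g. "Jazz Drummers" (naively pluralized here for display).
        for (PivotField genre : response.getFacetPivot().get("genre,musician_type")) {
            if (genre.getPivot() == null) continue;
            for (PivotField type : genre.getPivot()) {
                System.out.println(genre.getValue() + " " + type.getValue() + "s");
            }
        }
        solr.close();
    }
}
```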
This matches a spooky thing Google was doing while I was building the music ontology – after a while, it would start to suggest long song titles with just two letters entered, if my “agenda” for that moment was consistent. So if I am searching for Beatles songs, for example, after a few searches typing “ba” brings back (in the typeahead) “Baby’s In Black” and “Baby I’m a Rich Man” above the myriad songs that start with Baby, as well as everything else in their typeahead dictionary starting with “ba”. WOW – that’s cool – and we should be able to do that too! (i.e., be more “Google-esque”, as one of my clients put it in their Business Requirements Document.) I call it “On-The-Fly Predictive Analytics” – as we say in the search quality biz, it’s ALL about context!
I say “last but not least” above because, for me, that was the last session I attended due to my impending flight. There were a few talks that I missed for various other reasons (a scheduling conflict, my company made me do some pre-sales work, I was wool-gathering or schmoozing/networking, etc.) where the authors seem to be on the same quest for search quality: talks like “Nice Docs Finish First” by Fiona Condon of Etsy, “Where Search Meets Machine Learning” by the folks at Verizon, “When You Have To Be Relevant” by Tom Burgmans of Wolters Kluwer, and “Learning to Rank” by those awesome Solr guys at Bloomberg – who have got both ‘Q’s working big time!
Since I wasn’t able to attend these talks and don’t want to write about them from a position of ignorance, I invite the authors (or anyone who feels inspired to talk about them) to add comments to this post so we can get a post-meeting discussion going here. Also, any author that I did mention who feels I botched my reporting of their work should feel free to correct me. And finally, anybody who entered the “Tweet about Trey’s Talk and Win an Autographed Book” contest is encouraged to re-tweet – uh, post – your gems here.
So, thanks for all the great work on this very important search topic. Maybe next year we can get Watson to give a talk so we can see what the computers think about all of this. After all, Watson has read all of Bob Dylan’s song lyrics so he (she?) must be a pretty cool dude/gal by now. I wonder what it thinks about “Stuck Inside of Mobile with the Memphis Blues Again”? To paraphrase the song, yes Mama, this is really the end. So, until we meet again at next year’s Revolution, Happy searching!