Most companies know the value of a smooth user experience on their website. But what about their onsite search? Simply shoving Ye Olde Search Box in the upper right corner doesn’t cut it anymore. And bad search could mean bad news for your online presence:
- 79% of people who don’t like what they find will jump ship and search for another site (Google).
- Only 15% of brands dedicate resources to optimizing their site search experience (Econsultancy).
- 30% of visitors want to use a website’s search function – and when they do, they are twice as likely to convert (Moz).
This extends even further to search applications inside an organization, such as enterprise search, research portals, and knowledge management systems. Many teams pour resources into getting the user experience right: the user interactions and the color palette. But what about the quality of the search results themselves?
Automate Iterations With Machine Learning
Smart search teams iterate on their algorithms so that relevancy and ranking are continuously refined and improved. But what if you could automate this process with machine learning?
Developers turn to many methods and techniques as they pursue the best possible relevance and ranking. One popular approach is called Learning to Rank, or LTR.
LTR is a powerful technique that uses supervised machine learning to train a model to find “relative order.” “Supervised” in this case means that humans manually label or order the results for each query in the training data set, and that labeled sample is then used to teach the system to reorder a new set of results.
Popular search engines have started bringing this functionality into their feature sets so developers can put this powerful technique to work in their search and discovery applications.
Activate 2018 in Montreal put an increased focus on search, AI, and related machine learning technologies, and two of its sessions focused specifically on using LTR with Apache Solr deployments. To help you and your colleagues get the most out of those two sessions, we’ve put together a primer on LTR.
But first, some background.
How LTR Differs From Other ML Techniques
Many traditional ML solutions focus on predicting a single outcome for a specific instance or event: a binary yes/no flag that drives a decision, or a numeric score. Think of use cases like fraud detection, email spam filtering, or anomaly identification. An item is either flagged or it’s not.
LTR goes beyond judging one item at a time: it examines a whole set of items and ranks them for optimal relevance. LTR still scores each item in the result set, but the final ordering matters more than the individual numerical scores.
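One way to see why ordering matters more than raw scores is a standard ranking metric such as normalized discounted cumulative gain (NDCG). The minimal Python sketch below is not taken from any of the sessions; the grades and scores are made-up values for a single hypothetical query. Because NDCG only looks at the order the scores produce, dividing every score by 100 leaves the metric unchanged, whereas a yes/no classifier thresholding at 0.5 would stop flagging the top two items.

```python
import math


def dcg(grades):
    """Discounted cumulative gain of relevance grades listed in ranked order."""
    # Gain 2^g - 1 rewards highly relevant items; log2(rank + 2) discounts
    # items further down the list (rank is 0-based here).
    return sum((2 ** g - 1) / math.log2(rank + 2) for rank, g in enumerate(grades))


def ndcg(scores, grades, k=None):
    """NDCG@k of the ranking induced by `scores`, judged against `grades`."""
    # Only the order the scores induce matters, not their magnitudes.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [grades[i] for i in order][:k]
    ideal = sorted(grades, reverse=True)[:k]
    best = dcg(ideal)
    return dcg(ranked) / best if best > 0 else 0.0


# Hypothetical judged results for one query: 3 = perfect, 0 = irrelevant.
grades = [3, 0, 2, 1]
scores = [0.9, 0.8, 0.3, 0.1]
squashed = [s / 100 for s in scores]  # same order, very different magnitudes

print(round(ndcg(scores, grades), 3))
print(ndcg(scores, grades) == ndcg(squashed, grades))  # True
```

Swapping the first two scores (so the irrelevant document lands in the top slot) would lower the value, which is exactly the kind of signal an LTR model is trained to improve.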
How LTR Knows How to Rank Things
The LTR approach requires examples of how items should ideally be ranked. Often this is a set of results that subject matter experts have manually curated (again, supervised learning), so it relies on well-labeled training data and, of course, human experts.
This ideal set of ranked data is called the “ground truth,” and it becomes the data set that the system “trains” on to learn how to rank automatically. This method works especially well for precise academic or scientific data.
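Expert judgments are often captured as a graded “judgment list” per query and then written out in the plain-text format that tools such as SVMrank and RankLib read: a grade, a query ID, and numbered feature values. The post doesn’t prescribe any particular format, so treat this Python sketch as one possible layout; the queries, document IDs, grades, and feature values are all hypothetical placeholders.

```python
# Hypothetical expert judgments: query -> [(doc_id, grade, feature_vector)].
# Grades run from 0 (irrelevant) to 3 (perfect). The feature values are
# placeholders; in practice they come from your engine's feature extraction.
judgments = {
    "inflation outlook": [
        ("doc-17", 3, [0.92, 1.0]),
        ("doc-04", 1, [0.41, 0.0]),
    ],
    "bond yields": [
        ("doc-08", 2, [0.75, 1.0]),
        ("doc-33", 0, [0.12, 0.0]),
    ],
}


def to_letor_lines(judgments):
    """Render judgments in the plain-text format read by SVMrank and RankLib.

    Each line is: <grade> qid:<n> 1:<f1> 2:<f2> ... # <doc_id> <query>
    """
    lines = []
    # Sort queries so qid assignment is stable across runs.
    for qid, (query, docs) in enumerate(sorted(judgments.items()), start=1):
        for doc_id, grade, features in docs:
            feats = " ".join(f"{i}:{v}" for i, v in enumerate(features, start=1))
            lines.append(f"{grade} qid:{qid} {feats} # {doc_id} {query}")
    return lines


for line in to_letor_lines(judgments):
    print(line)
```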
A second way to create an ideal set of training data is to aggregate user behavior such as likes, clicks, views, and other signals. Because it doesn’t depend on experts’ time, this approach is far more scalable and efficient.
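Here is a minimal sketch of turning raw signals into relevance grades, assuming a hypothetical log of (query, document, event) tuples and an invented mapping from click-through rate to a 0 to 3 grade. The thresholds and the minimum-view cutoff are illustrative assumptions, not recommendations. Note that raw click-through rate is biased by position (top results get clicked more regardless of relevance), which this sketch does not correct for.

```python
from collections import defaultdict

# Hypothetical signal log: (query, doc_id, event), where event is "view" or "click".
signals = [
    ("bond yields", "doc-08", "view"), ("bond yields", "doc-08", "click"),
    ("bond yields", "doc-08", "view"), ("bond yields", "doc-08", "click"),
    ("bond yields", "doc-33", "view"), ("bond yields", "doc-33", "view"),
    ("bond yields", "doc-33", "view"), ("bond yields", "doc-33", "click"),
    ("bond yields", "doc-51", "view"),
]


def signals_to_grades(signals, min_views=2, cutoffs=(0.1, 0.3, 0.6)):
    """Turn raw view/click events into 0-3 relevance grades per (query, doc).

    The grade is the number of click-through-rate cutoffs the pair meets.
    Pairs with fewer than `min_views` views are skipped as too sparse to trust.
    """
    views = defaultdict(int)
    clicks = defaultdict(int)
    for query, doc_id, event in signals:
        key = (query, doc_id)
        if event == "view":
            views[key] += 1
        elif event == "click":
            clicks[key] += 1
    grades = {}
    for key, n_views in views.items():
        if n_views < min_views:
            continue
        ctr = clicks[key] / n_views
        grades[key] = sum(ctr >= c for c in cutoffs)
    return grades


print(signals_to_grades(signals))
```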
LTR With Apache Solr
With version 6.4, Apache Solr introduced LTR as part of its libraries and API-level building blocks. However, the reference documentation may only make sense to a seasoned search engineer.
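To give a feel for those building blocks, the Python sketch below builds the two JSON documents Solr’s LTR module works with (a feature store and a model store entry) and a query that reranks the top hits with the `{!ltr}` query parser. The class names and parameter shapes follow the Solr Reference Guide’s LTR page as best I recall them, so check them against your Solr version. The base URL, the `techproducts` collection, the feature choices, the model name `myModel`, and the weights are all assumptions for illustration, and the upload calls are left commented out because they need a running Solr with the LTR contrib enabled.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a local Solr running the "techproducts" example with LTR enabled.
SOLR = "http://localhost:8983/solr/techproducts"

# Feature definitions for the feature store.
features = [
    {"name": "originalScore",
     "class": "org.apache.solr.ltr.feature.OriginalScoreFeature",
     "params": {}},
    {"name": "documentRecency",
     "class": "org.apache.solr.ltr.feature.SolrFeature",
     "params": {"q": "{!func}recip( ms(NOW,last_modified), 3.16e-11, 1, 1)"}},
]

# A linear model over those features; the weights here are made up.
model = {
    "class": "org.apache.solr.ltr.model.LinearModel",
    "name": "myModel",
    "features": [{"name": f["name"]} for f in features],
    "params": {"weights": {"originalScore": 0.5, "documentRecency": 1.0}},
}


def put_json(path, payload):
    """PUT a JSON payload to a Solr managed resource (not called below)."""
    req = urllib.request.Request(
        SOLR + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def rerank_url(user_query, model_name="myModel", rerank_docs=100):
    """Build a /select URL that reranks the top `rerank_docs` hits with the model."""
    params = {
        "q": user_query,
        "rq": "{!ltr model=%s reRankDocs=%d}" % (model_name, rerank_docs),
        "fl": "id,score,[features]",  # also return each doc's feature values
    }
    return SOLR + "/select?" + urllib.parse.urlencode(params)


# Against a live Solr, upload features first, then the model that uses them:
# put_json("/schema/feature-store", features)
# put_json("/schema/model-store", model)
print(rerank_url("test"))
```

A `LinearModel` scores each reranked document as the weighted sum of its feature values, which is why the model lists its features by name and Solr must already know those names from the feature store.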
Solr’s LTR component does not actually train any models; building a model training pipeline, from scratch if need be, is left to your team. Plus, figuring out how all these pieces fit together into an end-to-end LTR solution isn’t straightforward if you haven’t done it before.
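Since training happens outside Solr, a pipeline needs some offline learner whose output Solr can load. As a minimal sketch, the Python below fits a linear scorer with a pairwise logistic loss (the loss RankNet uses, applied here to a plain linear model rather than a neural network) and wraps the weights in the JSON shape of Solr’s `LinearModel`. The training data, feature names, learning rate, and epoch count are hypothetical. Real pipelines usually reach for a dedicated library instead, and Solr also offers a `MultipleAdditiveTreesModel` for tree ensembles such as LambdaMART.

```python
import json
import math

# Feature order must match the feature names defined in Solr's feature store.
FEATURES = ["originalScore", "documentRecency"]

# Hypothetical training data, one entry per query: feature vectors (in FEATURES
# order) and graded relevance labels for the documents returned for that query.
queries = [
    {"x": [[3.0, 0.9], [1.0, 0.1], [2.0, 0.5]], "y": [2, 0, 1]},
    {"x": [[2.5, 0.8], [1.5, 0.2]], "y": [1, 0]},
]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def pairs(query):
    """Yield (better, worse) feature-vector pairs wherever the labels differ."""
    x, y = query["x"], query["y"]
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                yield x[i], x[j]


def train_pairwise(queries, n_features, epochs=200, lr=0.1):
    """Fit linear weights w with a pairwise logistic loss via SGD.

    For each pair where document a should outrank document b, the loss is
    log(1 + exp(-w.(a - b))), which pushes score(a) above score(b).
    """
    w = [0.0] * n_features
    for _ in range(epochs):
        for q in queries:
            for a, b in pairs(q):
                diff = [ai - bi for ai, bi in zip(a, b)]
                margin = sum(wk * dk for wk, dk in zip(w, diff))
                # d(loss)/dw = -sigmoid(-margin) * diff, so step against it.
                step = lr * sigmoid(-margin)
                w = [wk + step * dk for wk, dk in zip(w, diff)]
    return w


def to_solr_linear_model(weights, name="myModel"):
    """Wrap learned weights in the JSON shape of Solr's LinearModel."""
    return {
        "class": "org.apache.solr.ltr.model.LinearModel",
        "name": name,
        "features": [{"name": f} for f in FEATURES],
        "params": {"weights": dict(zip(FEATURES, weights))},
    }


weights = train_pairwise(queries, len(FEATURES))
print(json.dumps(to_solr_linear_model(weights), indent=2))
```

The pairwise framing mirrors the “relative order” idea: the learner never predicts a grade directly, it only learns to put the better document of each pair on top.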
So let’s turn to the experts.
Live Case Study: Bloomberg
Financial information services giant Bloomberg runs one of the largest Solr deployments on the planet and is always looking for ways to improve relevancy while maintaining split-second query response times for millions of financial professionals and investors.
In their quest to keep improving result ranking and the user experience, Bloomberg turned to LTR and developed, built, tested, and committed the LTR component that sits inside the Solr codebase.
Engineers from Bloomberg took the stage at the Activate conference in Montreal in October 2018 to talk about LTR. They discussed their architecture, their challenges in scaling, and how they developed the plugin that made Apache Solr the first open source search engine that can perform LTR operations out of the box.
The team told the full war story of how Bloomberg’s real-time, low-latency news search engine was trained with LTR, how your team can do it too, and the many ways not to do it. Here’s the video:
Live Demo: Practical End-to-End Learning to Rank Using Fusion
Also at Activate 2018, Lucidworks Senior Data Engineer Andy Liu presented a three-part demonstration on how to set up, configure, and train a simple LTR model using both Fusion and Solr.
Liu demonstrated how to add more complex features and measure improvements in model accuracy through the iterative workflow typical of data science. He placed particular emphasis on best practices for using time-sensitive, user-generated signals.
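One common way to handle time-sensitive signals is to let older events count for less. The session’s actual approach isn’t described here, so this Python sketch simply assumes exponential decay with a hypothetical seven-day half-life and timestamps expressed in days; both choices are illustrative.

```python
from collections import defaultdict


def decayed_counts(events, now, half_life_days=7.0):
    """Sum signal events per (query, doc_id), down-weighting older events.

    Each event counts 0.5 ** (age_days / half_life_days), so an event one
    half-life old counts half as much as one happening at `now`.
    """
    totals = defaultdict(float)
    for query, doc_id, ts_days in events:
        # Treat events stamped in the future (clock skew) as happening now.
        age = max(0.0, now - ts_days)
        totals[(query, doc_id)] += 0.5 ** (age / half_life_days)
    return dict(totals)


# Hypothetical click events: (query, doc_id, timestamp in days).
events = [
    ("earnings", "doc-old", 0.0),
    ("earnings", "doc-old", 0.0),
    ("earnings", "doc-new", 28.0),
]
print(decayed_counts(events, now=28.0))
```

Here the single fresh click on `doc-new` outweighs two month-old clicks on `doc-old`, which is the behavior you usually want for news-like content.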
The session explored some of the tradeoffs between engineering and data science, as well as Solr querying/indexing strategies (sidecar indexes, payloads) to effectively deploy a model that is both production-grade and accurate. Here’s the video:
So that’s a brief overview of LTR in the abstract, followed by where to see it in action: a real-world case study and a practical demo of implementing it yourself. Here’s even more reading to help you get the most out of these sessions:
More LTR Resources
- Bloomberg’s behind-the-scenes look at how they developed the LTR plugin and brought it into the Apache Solr codebase
- An intuitive explanation of Learning to Rank by Google engineer Nikhil Dandekar that details several popular LTR approaches, including RankNet, LambdaRank, and LambdaMART
- Pointwise vs. Pairwise vs. Listwise Learning to Rank, also by Dandekar
- A real-world example of Learning to Rank for flight itineraries by Skyscanner app engineer Neil Lathia
- Learning to Rank 101 by Pere Urbon-Bayes, another introduction to LTR, including how to implement the approach in Elasticsearch