Weird title, I know, but they are my pet names (there are probably better terms for them in use elsewhere) for two techniques that I find often help people solve search problems in the real world, but that don’t necessarily seem like good ideas at first glance.
If you’ve ever attended one of my trainings, you know I’m fond of saying, “Just because a user types words into a search box doesn’t mean you have to execute a search.” At first, such a saying seems counterintuitive, but in practice it works at several levels:
- Many, many queries are repeats of previously executed queries against data that has not changed, so just return cached results. Solr is HTTP-cache friendly, so use that to your advantage. Additionally, properly tune your Solr caches and your JVM so that the OS can cache things as well. Also know when a cache is not effective, so you can avoid needlessly updating a cache that is never hit.
- Many times, and for many reasons, you may already know what some answers are, independent of any particular user. For instance, if someone types “benefits” into a search box on your company’s HR page, it likely makes sense to ensure that the first result is the main HR Benefits page, regardless of whether it actually scores best under the system’s ranking. Use something like the (poorly named, but effective) Query Elevation Component in Solr to set up a mapping between a query and a set of documents that should match it. This can be used for editorial curation, ad capabilities, etc.
- In certain cases, it may make sense to skip the results page entirely and go straight to the destination. Wikipedia often does this (try searching for Lucene). Technically, this may involve actually running a query and applying heuristics to determine the single best result, but in other cases it may just mean mapping queries to one result editorially.
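The Query Elevation Component mentioned above is driven by an `elevate.xml` configuration file that pins chosen documents to the top for a given query. A minimal sketch for the “benefits” example might look like the following (the document id `hr-benefits-main` is made up for illustration):

```xml
<!-- elevate.xml: map a query string to documents that should rank first -->
<elevate>
  <query text="benefits">
    <!-- hypothetical unique key of the main HR Benefits page -->
    <doc id="hr-benefits-main"/>
  </query>
</elevate>
```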
In many applications it is necessary to execute more than one query for a given user query. For instance, in applications that require very high precision (only good results, forgoing marginal ones), the app may have several fields: one for exact matches, one for case-insensitive matches, and yet another with stemming. Given a user query, the app tries it against the exact-match field first and, if there are results, returns only that set. If there are no results, the app proceeds to search the next field, and so on. Another example of invisible queries is pseudo-relevance feedback, whereby the top X results are assumed to be good and are automatically used to construct a new query whose results are then returned; Solr and Lucene’s More Like This is an example. Additionally, one could automatically submit spell-checking suggestions in cases where the original query returns no results.
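The short-circuiting cascade described above can be sketched in a few lines of Python. Here the per-field searches are toy in-memory functions standing in for real Solr queries against differently analyzed fields; all names and documents are illustrative:

```python
def cascade_search(query, tiers):
    """Try each search tier in order and return the first non-empty
    result set -- short-circuiting, not additive."""
    for tier in tiers:
        results = tier(query)
        if results:
            return results
    return []

# Toy "index"; in a real app each tier would query a different Solr field.
docs = [
    {"id": 1, "title": "Apache Lucene"},
    {"id": 2, "title": "apache solr"},
]

def exact_match(q):
    return [d for d in docs if d["title"] == q]

def case_insensitive_match(q):
    return [d for d in docs if d["title"].lower() == q.lower()]

tiers = [exact_match, case_insensitive_match]

# "Apache Solr" misses the exact tier but hits the case-insensitive one.
hits = cascade_search("Apache Solr", tiers)
```

A stemming tier, or a final spell-check fallback, slots in as just another entry at the end of `tiers`.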
Naturally, all of this raises the performance question: how can this possibly perform? The answer is, it may not, so you need to test it. However, I’ve seen plenty of applications where it does, especially when used in a short-circuiting manner (rather than an additive one). Additionally, keep an eye on your logs. In the right applications, a large share of your queries may be exact matches; in others, your users may well be willing to trade a few hundred extra milliseconds (often less) for better results.
Next time you’re in need of a speed boost, or perhaps you’re unclear on how to get exactly the results you need, I hope some fake and invisible queries will help you out!