Transitioning from Databases to Search Engines

Lucid recently did a customer survey about Solr and Lucene usage. One of the comments that caught my eye was somebody asking about the differences between traditional SQL database engines and search engines. If you’ve worked 20 years on databases and suddenly find yourself put onto a search project, some of the concepts will seem very familiar, whereas other implementation patterns may seem quite different.

The technical similarities and differences between database and search are so critical I wrote a full article a few years ago: Contrasting Relational Databases and Full-Text Search Engines, and there’s also a summary here.

One of my motives back then was to tell the world “the LIKE operator is NOT a substitute for an actual fulltext search engine!” (although you can mimic that behavior in Solr if you really need it, for compatibility with legacy systems)

One of the main points of the article, which I won’t repeat here, is to compare and contrast the vocabulary between the two spaces, like “record” vs. “document”; scroll down to the section “Vocabulary Comparison Summary”

One update that I would make if I were writing this article today is to update the info on JOINS. For many years you simply didn’t do “joins” at search time with search engines. Obviously you could index content from databases, and include data from multiple tables, but those table-joins were typically done at index time, not search time, so in the article it says “n/a”. You might also call this denormalizing or similar to a materialized view. But times have changed and Solr 4.x does now support limited joins, as do some other open source search engines. Look for the performance to improve over time.

Another change since I wrote that was about transactions. While there’s still not true database level transactions, Solr 4 does support optimistic locking and real-time retrieval of updated values.

And another trend we’ve seen that, once again that isn’t a direct replacement for SQL but do the trick, is the use of search engine Facets and statistics to do the work of SQL’s “GROUP BY” and SUM/MIN/MAX type of functions. If you’ve dismissed search engines in the past because of this requirement, it’s time to take another look!

Another industry shift is towards non-traditional non-SQL based data storage, things like NoSQL, Mongo, HBase, etc. If you’re a classic SQL developer, you might find yourself learning something else anyway; the good news is these new-age databases also have good connections to search engines.