Solr Dev Diary: Solr and Near Real-Time Search

Solr’s UpdateHandler has gotten a little crusty. Many of the implementation details are there due to old,  tired,  and removed requirements and functions. For those that do not know, documents that you add to Solr are actually put into the index by the UpdateHandler.

There are two details about the current UpdateHandler implementation that are particularly limiting.

First, Solr uses it’s own lock’s on top of Lucene, adding a courser, unnecessary layer of locking on top of the IndexWriter. These locks had a reason to exist once upon a time, but really, they no longer do. There is no reason to block additional document adds while performing a commit, but currently this is what Solr does. Removing these locks will reduce complexity and maintenance costs by allowing us to ‘mostly’ just use Lucene’s locking.  Solr will also more easily simply inherit improvements from Lucene in this area.

Second, because of historical requirements, Solr will close and open a new IndexWriter on every commit. This means that every commit waits for all background Index merging threads to finish merging. This can be a non insignificant amount of time – and during this time you cannot add any documents to the index. You also cannot see the documents that have just been added to the index until the merges and commit are complete. Really, the UpdateHandler should simply commit and open a new SolrIndexSearcher – with the background threads happily merging *in the background*.

There are a few other things that bug me as well.

Well I’m going to fix them all now. Time to remove the crust and introduce Lucene near-real-time support to Solr. You should be able to open a new view on recently added content with Solr in a fraction of the time possible right now. It’s not right that you have to juggle SolrCore’s to attempt near real time index updates – it’s time to make things easier. Time to makes things faster.

And when Lucene finishes it’s real-time support and stops IndexWriter flushes from blocking document additions, Solr will be even more ready to take advantage where it can. There will still be more to do – not everything Solr does is yet per segment, and replication is not currently very near-real-time friendly – but we will keeping moving things in the right direction.

I’m tackling these changes here: https://issues.apache.org/jira/browse/SOLR-2193

– Mark

You Might Also Like

How an electronics giant meets engineers where they are, with 44 million products in catalog

Meet Mohammad Mahboob: A search platform director navigating 44 million products across...

Read More

From Search to Solutions: How AI Agents Can Power Digital Commerce in 2025

Watch this on-demand webinar to discover the six smartest AI-driven DX strategies...

Read More

Build custom AI agents without writing a single line of code? Yep, we did that.

Finally, a low-code AI platform (really, no code) that lets the people...

Read More

Quick Links