Solr Dev Diary: Solr and Near Real-Time Search

Solr’s UpdateHandler has gotten a little crusty. Many of the implementation details are there due to old, tired, and removed requirements and functions. For those who do not know, documents that you add to Solr are actually put into the index by the UpdateHandler.

There are two details about the current UpdateHandler implementation that are particularly limiting.

First, Solr uses its own locks on top of Lucene, adding a coarser, unnecessary layer of locking on top of the IndexWriter. These locks had a reason to exist once upon a time, but really, they no longer do. There is no reason to block additional document adds while performing a commit, but currently this is what Solr does. Removing these locks will reduce complexity and maintenance costs by allowing us to ‘mostly’ just use Lucene’s locking. Solr will also more easily inherit improvements from Lucene in this area.
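
To make the cost of that extra layer concrete, here is a minimal toy sketch in plain Java. It is not Solr's code: the class and method names are hypothetical, and it assumes the coarse layer behaves like a read/write lock where adds share the lock and a commit takes it exclusively. Under that assumption, any add that arrives while a commit is running has to wait (the sketch uses tryLock so the refusal is observable instead of blocking).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy model of a coarse update lock. Hypothetical names; not Solr source code. */
final class CoarseLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> pending = new ArrayList<>();
    private int committed = 0;

    /** Adds share the read lock, so adds can run alongside each other. */
    boolean tryAdd(String doc) {
        if (!lock.readLock().tryLock()) {
            return false; // a commit holds the write lock; a real add would block here
        }
        try {
            synchronized (pending) {
                pending.add(doc);
            }
            return true;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Commit holds the exclusive write lock for its whole duration. */
    void commit(Runnable whileCommitting) {
        lock.writeLock().lock();
        try {
            whileCommitting.run(); // stands in for the slow part of a commit
            synchronized (pending) {
                committed += pending.size();
                pending.clear();
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    int committedCount() {
        return committed;
    }

    /** Returns true if an add attempted from another thread during a commit was refused. */
    static boolean addRefusedDuringCommit() {
        CoarseLockSketch index = new CoarseLockSketch();
        boolean[] accepted = new boolean[1];
        index.commit(() -> {
            Thread adder = new Thread(() -> accepted[0] = index.tryAdd("doc-1"));
            adder.start();
            try {
                adder.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        return !accepted[0];
    }

    public static void main(String[] args) {
        System.out.println("add refused during commit: " + addRefusedDuringCommit());
    }
}
```

Dropping the outer lock and relying on the writer's own synchronization is what would let adds proceed during a commit; the sketch only illustrates the blocking side.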

Second, because of historical requirements, Solr closes the IndexWriter and opens a new one on every commit. This means that every commit waits for all background index merging threads to finish merging. This can be a not insignificant amount of time – and during this time you cannot add any documents to the index. You also cannot see the documents that have just been added to the index until the merges and commit are complete. Really, the UpdateHandler should simply commit and open a new SolrIndexSearcher – with the background threads happily merging *in the background*.
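
The difference between the two commit styles can be sketched with a toy model in plain Java. Everything here is hypothetical (the class, the method names, and the latch that stands in for a long-running merge); it is not Lucene's IndexWriter or Solr's UpdateHandler, just an illustration of "wait for merges, then publish" versus "publish now, let merges continue".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Toy model of two commit styles. Hypothetical names; not Lucene or Solr code. */
final class CommitSketch {
    private final ExecutorService mergeThread = Executors.newSingleThreadExecutor();
    private final List<String> buffered = new ArrayList<>();
    private List<String> searcherView = Collections.emptyList();
    private Future<?> runningMerge;

    synchronized void add(String doc) {
        buffered.add(doc);
    }

    /** Starts a background "merge" that runs until the latch is released. */
    synchronized void startMerge(CountDownLatch mergeMayFinish) {
        runningMerge = mergeThread.submit(() -> {
            mergeMayFinish.await();
            return null;
        });
    }

    /** Commit without closing the writer: publish a new view, let the merge continue. */
    synchronized List<String> commitAndOpenSearcher() {
        List<String> next = new ArrayList<>(searcherView);
        next.addAll(buffered);
        buffered.clear();
        searcherView = Collections.unmodifiableList(next);
        return searcherView;
    }

    /** Close-style commit: holds the monitor (blocking adds) until the merge completes. */
    synchronized List<String> commitByClosingWriter() {
        try {
            if (runningMerge != null) {
                runningMerge.get(); // waits for the background merge to finish
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return commitAndOpenSearcher();
    }

    synchronized boolean mergeStillRunning() {
        return runningMerge != null && !runningMerge.isDone();
    }

    void shutdown() {
        mergeThread.shutdown();
    }

    /** Returns true if a document became visible while the merge was still running. */
    static boolean demo() {
        CommitSketch index = new CommitSketch();
        CountDownLatch mergeMayFinish = new CountDownLatch(1);
        index.add("doc-1");
        index.startMerge(mergeMayFinish);
        List<String> view = index.commitAndOpenSearcher(); // does not wait for the merge
        boolean visibleWhileMerging = view.contains("doc-1") && index.mergeStillRunning();
        mergeMayFinish.countDown();
        index.commitByClosingWriter(); // returns only once the merge is done
        boolean mergeDone = !index.mergeStillRunning();
        index.shutdown();
        return visibleWhileMerging && mergeDone;
    }

    public static void main(String[] args) {
        System.out.println("visible while merging: " + demo());
    }
}
```

In the toy model, commitByClosingWriter keeps adds locked out for as long as the merge runs, while commitAndOpenSearcher hands back a fresh view immediately and leaves the merge alone.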

There are a few other things that bug me as well.

Well, I’m going to fix them all now. Time to remove the crust and introduce Lucene near-real-time support to Solr. You should be able to open a new view on recently added content with Solr in a fraction of the time possible right now. It’s not right that you have to juggle SolrCores to attempt near-real-time index updates – it’s time to make things easier. Time to make things faster.

And when Lucene finishes its real-time support and stops IndexWriter flushes from blocking document additions, Solr will be even more ready to take advantage where it can. There will still be more to do – not everything Solr does is yet per segment, and replication is not currently very near-real-time friendly – but we will keep moving things in the right direction.

I’m tackling these changes here: https://issues.apache.org/jira/browse/SOLR-2193

– Mark
