Solr Dev Diary: Solr and Near Real-Time Search
Solr’s UpdateHandler has gotten a little crusty. Many of its implementation details exist only because of old, tired requirements and functions that have since been removed. For those who do not know, documents that you add to Solr are actually put into the index by the UpdateHandler.
There are two details about the current UpdateHandler implementation that are particularly limiting.
First, Solr uses its own locks on top of Lucene, adding a coarser, unnecessary layer of locking on top of the IndexWriter. These locks had a reason to exist once upon a time, but really, they no longer do. There is no reason to block additional document adds while performing a commit, but currently this is what Solr does. Removing these locks will reduce complexity and maintenance costs by allowing us to ‘mostly’ just use Lucene’s locking. Solr will also more easily inherit improvements from Lucene in this area.
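To make the cost of that extra layer concrete, here is a toy model in plain Java. It is not Solr's code: the class name, the `handlerLock` field, and the `canAddDuringCommit` method are all made up for illustration, and the only assumption is that a commit takes an exclusive handler-level lock while adds take a shared one.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CoarseLockSketch {
    // Hypothetical stand-in for a handler-level lock layered over the index writer.
    private final ReentrantReadWriteLock handlerLock = new ReentrantReadWriteLock();

    /**
     * Simulates a commit in progress and reports whether an add issued from
     * another thread could proceed at that moment.
     */
    public boolean canAddDuringCommit(boolean useHandlerLock) {
        final boolean[] added = {false};
        // The "commit" takes the exclusive lock, if the extra layer is in use.
        if (useHandlerLock) {
            handlerLock.writeLock().lock();
        }
        try {
            Thread adder = new Thread(() -> {
                if (!useHandlerLock) {
                    added[0] = true; // nothing above the writer stands in the way
                    return;
                }
                // The "add" needs the shared lock; tryLock fails while the commit holds it.
                if (handlerLock.readLock().tryLock()) {
                    try {
                        added[0] = true;
                    } finally {
                        handlerLock.readLock().unlock();
                    }
                }
            });
            adder.start();
            adder.join(); // join makes the adder's write to added[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            if (useHandlerLock) {
                handlerLock.writeLock().unlock();
            }
        }
        return added[0];
    }

    public static void main(String[] args) {
        CoarseLockSketch sketch = new CoarseLockSketch();
        System.out.println("add during commit, with handler lock: " + sketch.canAddDuringCommit(true));
        System.out.println("add during commit, without handler lock: " + sketch.canAddDuringCommit(false));
    }
}
```

In the first case the add is turned away for as long as the commit holds the lock. In the second it goes straight through, which is the behavior we get by leaning on the writer's own finer-grained locking.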
Second, because of historical requirements, Solr will close and open a new IndexWriter on every commit. This means that every commit waits for all background index merging threads to finish merging. This can be a significant amount of time – and during this time you cannot add any documents to the index. You also cannot see the documents that have just been added to the index until the merges and commit are complete. Really, the UpdateHandler should simply commit and open a new SolrIndexSearcher – with the background threads happily merging *in the background*.
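The shape of the commit I have in mind can be sketched with another toy model in plain Java. Again, this is not Solr or Lucene code: `NrtCommitSketch`, `writerBuffer`, and the list-backed "searcher" are hypothetical stand-ins, and merging is left out entirely. The point is only that the writer stays open across commits and a commit just publishes a new point-in-time view.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

public class NrtCommitSketch {
    // Hypothetical stand-in for a long-lived writer: it is never closed on commit.
    private final List<String> writerBuffer = new ArrayList<>();

    // Hypothetical stand-in for the current searcher: an immutable point-in-time view.
    private final AtomicReference<List<String>> searcher =
            new AtomicReference<>(Collections.<String>emptyList());

    /** Adds a document to the writer; it is not visible to searches yet. */
    public synchronized void add(String doc) {
        writerBuffer.add(doc);
    }

    /** Commit: snapshot what the writer holds and swap in a new searcher view. */
    public synchronized void commit() {
        List<String> snapshot = Collections.unmodifiableList(new ArrayList<>(writerBuffer));
        searcher.set(snapshot);
    }

    /** Searches always read the most recently published view and never block on adds. */
    public List<String> search() {
        return searcher.get();
    }

    public static void main(String[] args) {
        NrtCommitSketch index = new NrtCommitSketch();
        index.add("doc1");
        System.out.println("before commit: " + index.search().size());
        index.commit();
        index.add("doc2"); // the same writer keeps accepting adds after the commit
        System.out.println("after commit: " + index.search().size());
    }
}
```

A search that started before the commit keeps its old view, a search after it sees the new documents, and the writer is never torn down in between.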
There are a few other things that bug me as well.
Well, I’m going to fix them all now. Time to remove the crust and introduce Lucene near-real-time support to Solr. You should be able to open a new view on recently added content with Solr in a fraction of the time possible right now. It’s not right that you have to juggle SolrCores to attempt near real-time index updates – it’s time to make things easier. Time to make things faster.
And when Lucene finishes its real-time support and stops IndexWriter flushes from blocking document additions, Solr will be even more ready to take advantage where it can. There will still be more to do – not everything Solr does is yet per segment, and replication is not currently very near-real-time friendly – but we will keep moving things in the right direction.
I’m tackling these changes here: https://issues.apache.org/jira/browse/SOLR-2193
– Mark