
Solr Dev Diary: Solr and Near Real-Time Search

by Mark Miller
April 9, 2011

Solr’s UpdateHandler has gotten a little crusty. Many of its implementation details exist only because of old, tired, and since-removed requirements and functions. For those who do not know, documents that you add to Solr are actually put into the index by the UpdateHandler.

There are two details about the current UpdateHandler implementation that are particularly limiting.

First, Solr uses its own locks on top of Lucene, adding a coarser, unnecessary layer of locking on top of the IndexWriter. These locks had a reason to exist once upon a time, but they no longer do. There is no reason to block additional document adds while a commit is in progress, yet this is currently what Solr does. Removing these locks will reduce complexity and maintenance costs by allowing us to ‘mostly’ just use Lucene’s locking. Solr will also more easily inherit improvements from Lucene in this area.
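To make the cost of that extra layer concrete, here is a small, self-contained model in plain Java. It is a hypothetical sketch, not Solr’s or Lucene’s actual code: it assumes an update handler that wraps the writer in its own read/write lock, where adds share the read lock and a commit takes the exclusive write lock. The class and method names are invented for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of handler-level locking layered over an index writer.
// Not Solr's implementation; it only illustrates why a coarse commit lock blocks adds.
public class CoarseLockSketch {
    // Adds would take the shared read lock; a commit takes the exclusive write lock.
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();

    /** Returns true if an add on another thread could start while a commit is in progress. */
    boolean addProceedsDuringCommit(boolean coarseLocking) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        if (coarseLocking) {
            rwl.writeLock().lock(); // model a commit holding the handler-level lock
        }
        try {
            Future<Boolean> add = pool.submit(() -> {
                if (!coarseLocking) {
                    // No handler-level lock: the add relies only on the writer's own concurrency.
                    return true;
                }
                // A real add would call lock() and wait; tryLock() reports the contention instead.
                boolean acquired = rwl.readLock().tryLock();
                if (acquired) {
                    rwl.readLock().unlock();
                }
                return acquired;
            });
            return add.get();
        } finally {
            if (coarseLocking) {
                rwl.writeLock().unlock(); // "commit" finished
            }
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        CoarseLockSketch sketch = new CoarseLockSketch();
        System.out.println("coarse lock, add proceeds during commit: "
                + sketch.addProceedsDuringCommit(true));
        System.out.println("writer-only locking, add proceeds during commit: "
                + sketch.addProceedsDuringCommit(false));
    }
}
```

Running `main` should report `false` for the coarse-lock case and `true` when the handler-level lock is dropped, which is the behavior change described above: with the extra lock gone, concurrency control is left to the IndexWriter.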

Second, because of historical requirements, Solr closes and opens a new IndexWriter on every commit. This means that every commit waits for all background index merging threads to finish merging. That can take a significant amount of time, and during that time you cannot add any documents to the index. You also cannot see the documents that have just been added until the merges and the commit are complete. Really, the UpdateHandler should simply commit and open a new SolrIndexSearcher, with the background threads happily merging *in the background*.

There are a few other things that bug me as well.

Well, I’m going to fix them all now. Time to remove the crust and introduce Lucene near-real-time support to Solr. You should be able to open a new view on recently added content with Solr in a fraction of the time it takes right now. It’s not right that you have to juggle SolrCores to attempt near-real-time index updates; it’s time to make things easier. Time to make things faster.

And when Lucene finishes its real-time support and stops IndexWriter flushes from blocking document additions, Solr will be even better positioned to take advantage where it can. There will still be more to do (not everything Solr does is per segment yet, and replication is not currently very near-real-time friendly), but we will keep moving things in the right direction.

I’m tackling these changes here: https://issues.apache.org/jira/browse/SOLR-2193

– Mark
