Apache Lucene, Apache Solr, Open Source, SearchHub

Real-time Search With Lucene

by Mark Miller
April 10, 2009

Real-time search is a fuzzy concept, but it basically means shrinking the time between a modification to an index and the moment users can see it to a near-negligible amount, or at least to a delay small enough to be acceptable for a given real-time application. Not all applications need real-time search, but one type that does is very popular these days: social networking sites. The average social networking site would like user changes to be searchable almost immediately. With Lucene, this kind of rapid-update application has required you to jump through quite a few hoops and accept more than a few compromises. The future looks a bit rosier, though.

The longer-term hope for real-time search in Lucene has been to create an IndexReader that can read the unflushed state that IndexWriter holds in RAM. That is easier said than done. What is actually materializing at this time is a slightly different approach: as soon as Lucene 2.9, you will be able to ask a live IndexWriter for an IndexReader. One of the people working on this (Lucene guru Mike McCandless) calls it ‘near real-time’ search. Briefly, it works like this (note: I am not working on this issue and do not know it in depth; I am just following along):

When you ask the IndexWriter for an IndexReader, the IndexWriter is flushed (docs accumulated in RAM are written to disk) but not committed (a commit would fsync the files, write a new segments file, and so on). The returned IndexReader searches over the previously committed segments as well as the new, flushed but not yet committed segment. Because flushing will likely be processor-bound rather than IO-bound, it can be attacked with more processor power if it proves too slow. Deletes are also carried in RAM rather than flushed to disk, which may help in eking out a bit more speed.

The result is that you can add and remove documents from a Lucene index in ‘near’ real time by continuously asking the IndexWriter for a new reader every second or every couple of seconds. I haven’t seen a non-synthetic test yet, but it looks like it has been tested at around 50 document updates per second without heavy slowdown (that is, the results are visible every second). The patch takes advantage of LUCENE-1483, which keys FieldCaches and Filters at the individual segment level rather than at the index level. This lets you reload caches per segment rather than per index, which is essential for real-time search when filters and caches are in use.
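To make the flush-versus-commit distinction and the per-segment cache idea a little more concrete, here is a minimal toy model in plain Java. It is not Lucene code and does not use Lucene's API: `ToyWriter`, `ToyReader`, `Segment`, and `SegmentCache` are hypothetical names invented for this sketch, everything lives in memory, and deletes are ignored. It only illustrates the shape of the idea: asking the writer for a reader flushes buffered docs into a new segment without committing, and a cache keyed by segment only has to load the segments it has not seen before.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class NearRealTimeSketch {

    /** Immutable batch of documents; a stand-in for one index segment. */
    static final class Segment {
        final int id;
        final List<String> docs;

        Segment(int id, List<String> docs) {
            this.id = id;
            // Copy so later changes to the writer's buffer cannot affect the segment.
            this.docs = Collections.unmodifiableList(new ArrayList<>(docs));
        }
    }

    /** Point-in-time view over a fixed list of segments. */
    static final class ToyReader {
        final List<Segment> segments;

        ToyReader(List<Segment> segments) {
            this.segments = Collections.unmodifiableList(new ArrayList<>(segments));
        }

        int numDocs() {
            int n = 0;
            for (Segment s : segments) n += s.docs.size();
            return n;
        }
    }

    /** Hypothetical writer: buffers docs in RAM and flushes them into a new segment on demand. */
    static final class ToyWriter {
        private final List<Segment> flushed = new ArrayList<>();
        private final List<String> ramBuffer = new ArrayList<>();
        private int committedSegments = 0;

        void addDocument(String doc) { ramBuffer.add(doc); }

        /** Turn buffered docs into a new segment; no durability step happens here. */
        private void flush() {
            if (ramBuffer.isEmpty()) return;
            flushed.add(new Segment(flushed.size(), ramBuffer));
            ramBuffer.clear();
        }

        /** Stand-in for the expensive durable step (fsync, new segments file). */
        void commit() { flush(); committedSegments = flushed.size(); }

        int committedSegmentCount() { return committedSegments; }

        /** Flush, then return a view over committed and uncommitted segments alike. */
        ToyReader getReader() { flush(); return new ToyReader(flushed); }
    }

    /** Cache keyed by segment id, so reopening only loads segments not seen before. */
    static final class SegmentCache {
        private final Map<Integer, Integer> docCountBySegment = new HashMap<>();
        int loads = 0; // number of segments actually loaded into the cache

        void warm(ToyReader reader) {
            for (Segment s : reader.segments) {
                if (!docCountBySegment.containsKey(s.id)) {
                    docCountBySegment.put(s.id, s.docs.size());
                    loads++;
                }
            }
        }
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        SegmentCache cache = new SegmentCache();

        writer.addDocument("first post");
        writer.commit();                       // segment 0 is committed
        ToyReader before = writer.getReader();
        cache.warm(before);                    // loads segment 0

        writer.addDocument("second post");     // sits in the RAM buffer
        ToyReader after = writer.getReader();  // flushes segment 1, no commit
        cache.warm(after);                     // loads only segment 1

        System.out.println("docs visible before: " + before.numDocs());                 // 1
        System.out.println("docs visible after: " + after.numDocs());                   // 2
        System.out.println("committed segments: " + writer.committedSegmentCount());    // 1
        System.out.println("cache loads: " + cache.loads);                              // 2
    }
}
```

The second reader sees the new document even though only one segment has been committed, and the cache performs two loads in total rather than three, because the segment it already knew about is reused. A cache keyed at the index level would have had to reload everything on the second reopen.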

I can’t wait to see this work start creeping into Solr.
