Contrived FieldCache Load Test: Lucene 2.4 VS Lucene 2.9

*edit* Sorry – jumped the gun with my original test code here – need to close the IndexWriter after the optimize! The gains are only with multi-segment indexes. Corrected entry follows:

Let's do a little test. We will load up a FieldCache with 5,000,000 unique strings and see how long it takes in Lucene 2.4 compared to Lucene 2.9.

Let's use my quad-core laptop and the following test code:

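import java.io.File;

import junit.framework.TestCase;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class ContrivedFCTest extends TestCase {
  public void testLoadTime() throws Exception {
    Directory dir = FSDirectory.getDirectory(System.getProperty("java.io.tmpdir") + File.separator + "test");
    IndexWriter writer = new IndexWriter(dir, new SimpleAnalyzer(), true, IndexWriter.MaxFieldLength.LIMITED);
    // The index is never optimized, so it keeps multiple segments.
    writer.setMergeFactor(37);
    writer.setUseCompoundFile(false);
    for (int i = 0; i < 5000000; i++) {
      Document doc = new Document();
      // One unique, un-analyzed term per document.
      doc.add(new Field("field", "String" + i, Field.Store.NO, Field.Index.NOT_ANALYZED));
      writer.addDocument(doc);
    }
    writer.close();

    IndexReader reader = IndexReader.open(dir);
    // Time only the FieldCache load, not the indexing above.
    long start = System.currentTimeMillis();
    FieldCache.DEFAULT.getStrings(reader, "field");
    long end = System.currentTimeMillis();
    System.out.println("load time:" + (end - start)/1000.0f + "s");
    reader.close();
  }
}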

The results?

Lucene 2.4: 150.726s
Lucene 2.9: 9.695s
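
To put those two numbers side by side, here is a tiny sketch that turns them into a speedup factor. The class and method names are made up for illustration; only the two timings come from the run above.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: turns two load times into a speedup factor.
class LoadTimeSpeedup {
    /** Returns how many times faster the second run is than the first. */
    static double speedup(double beforeSeconds, double afterSeconds) {
        return beforeSeconds / afterSeconds;
    }

    public static void main(String[] args) {
        double lucene24 = 150.726; // seconds, from the results above
        double lucene29 = 9.695;   // seconds, from the results above
        // Locale.ROOT keeps the decimal point stable regardless of system locale.
        System.out.println(String.format(Locale.ROOT, "%.1fx faster", speedup(lucene24, lucene29)));
        // prints "15.5x faster"
    }
}
```

This is a single run on a single machine, so treat the factor as a rough indication rather than a benchmark.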

Early this year we discovered that Lucene has been terribly inefficient when loading FieldCaches over multiple segments. Lucene 2.9 addresses this at the MultiReader level (thank you Yonik!). Also, internal FieldCache usage is now per segment, which sidesteps loading FieldCaches over multiple segments altogether: each segment has its own FieldCache.
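
To make the "each segment has its own FieldCache" idea concrete, here is a toy model in plain Java. It is not Lucene code: Segment, PerSegmentCache, and lookup are hypothetical names, and cloning an array stands in for the expensive load. It only illustrates the shape of the idea, assuming an index-wide document number is resolved by walking the segments in order.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these classes exist in Lucene.
class PerSegmentCacheSketch {
    /** A chunk of the index holding one string value per document. */
    static final class Segment {
        final String[] values;
        Segment(String[] values) { this.values = values; }
    }

    /** Keeps one cached array per segment instead of one array for the whole index. */
    static final class PerSegmentCache {
        private final Map<Segment, String[]> cache = new IdentityHashMap<>();
        int loads = 0; // how many segments were actually loaded

        String[] get(Segment segment) {
            String[] cached = cache.get(segment);
            if (cached == null) {
                cached = segment.values.clone(); // stand-in for the expensive load
                cache.put(segment, cached);
                loads++;
            }
            return cached;
        }

        /** Resolves an index-wide document number by walking the segments in order. */
        String lookup(List<Segment> segments, int doc) {
            int docBase = 0; // first index-wide document number of the current segment
            for (Segment segment : segments) {
                String[] values = get(segment);
                if (doc < docBase + values.length) {
                    return values[doc - docBase];
                }
                docBase += values.length;
            }
            throw new IndexOutOfBoundsException("doc " + doc);
        }
    }

    public static void main(String[] args) {
        List<Segment> segments = new ArrayList<>();
        segments.add(new Segment(new String[] {"String0", "String1"}));
        segments.add(new Segment(new String[] {"String2"}));

        PerSegmentCache cache = new PerSegmentCache();
        System.out.println(cache.lookup(segments, 2));          // prints "String2"
        System.out.println("segments loaded: " + cache.loads);  // prints "segments loaded: 2"

        // A new segment shows up: the two existing entries are reused.
        segments.add(new Segment(new String[] {"String3"}));
        System.out.println(cache.lookup(segments, 3));          // prints "String3"
        System.out.println("segments loaded: " + cache.loads);  // prints "segments loaded: 3"
    }
}
```

In this model no load ever spans more than one segment, and segments that are already cached are not loaded again when another one is added.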
