Simon Willnauer, Column Stride Fields or DocValues, and improving on FieldCache

by Lucidworks
May 31, 2011

I’m a web application developer. I am not going to pretend that I understood everything that Simon Willnauer was saying at his talk on Column Stride Fields, or DocValues, on day 1 of Lucene Revolution. That’s because, as a rule, I don’t have a need to climb into the guts of Lucene in order to get good results. And that’s exactly what I like about it. If you tell me, “under these circumstances, define your field as ‘fieldable’ to get the best performance” that’s good enough for me.

But for many people — particularly the types of people who attend Lucene Revolution — Lucene’s guts are where it’s at. And this talk was definitely for them. Fortunately, Simon’s talk was recorded on video, so you can get the uninterpreted details from him as soon as that’s available, but even with my limited grasp, I could tell this was a Good Thing.

Basically, it comes to this, as I understand it. Lucene creates an inverted index, which maps each term to the documents that contain it. But once you have the matching documents, you have to score them for relevance, and to do that you need access to data that belongs to each document.
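To make the idea concrete, here is a toy sketch of an inverted index, with a ranking step that needs a per-document value. It is not Lucene code: the `build_inverted_index` function, the sample documents, and the `popularity` list are all made up for illustration, and real Lucene scoring is far more involved.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the sorted list of doc IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

# Hypothetical documents; the list position doubles as the doc ID.
docs = ["lucene builds an index", "solr uses lucene", "an index of terms"]
index = build_inverted_index(docs)

# Finding the matches only needs the inverted index...
matches = index["lucene"]

# ...but ranking them needs a value that belongs to each document.
# 'popularity' is an invented per-document field, indexed by doc ID.
popularity = [3, 10, 7]
ranked = sorted(matches, key=lambda doc_id: popularity[doc_id], reverse=True)
```

The term lookup is cheap, but the ranking step has to fetch one value per matching document, and that per-document lookup is the part the rest of the talk is about.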

There are two ways to get at Lucene’s per-document data. One way is through stored fields, and the other is through the FieldCache. Stored fields can be slow because you basically have to do two seeks: one to find out where to look for the field, and another to actually read it. FieldCache is faster because it lives in memory as an array of values, one per document, built by un-inverting the inverted index. But it still has to be loaded, and once it is, it can take up a lot of memory, possibly more than you have available, especially if you’re on a mobile device or limited to a 2GB heap.
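A rough way to picture why a FieldCache-style array has to be loaded before it is useful: the per-document values are rebuilt by walking the indexed terms and filling in one slot per document. The sketch below assumes a numeric field whose terms are its values; the `uninvert` function and the `price_postings` data are hypothetical and are not Lucene's actual implementation.

```python
def uninvert(postings, max_doc, default=0):
    """Build a doc-ID-indexed array from term -> doc IDs postings."""
    values = [default] * max_doc  # one slot per document, all held in memory
    for term, doc_ids in postings.items():
        for doc_id in doc_ids:
            values[doc_id] = int(term)  # assumes each term is a numeric value
    return values

# Invented postings for a 'price' field: term -> doc IDs that contain it.
price_postings = {"10": [0, 3], "25": [1], "40": [2]}

cache = uninvert(price_postings, max_doc=4)
price_of_doc_2 = cache[2]  # a plain array lookup once the cache is built
```

The load step touches every term and every posting, and the resulting array grows with the number of documents, which is where both the warm-up cost and the memory cost come from.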

Now, DocValues are basically an array, similar to FieldCache. (As Simon pointed out, this is not the existing DocValues class; the feature will likely be renamed before it’s released.) This array can be loaded into RAM, or it can be stored on disk for sequential access, which makes basically no demands on the heap. (He does recommend a memory-mapped buffer for best performance.) As each field needs its own file, however, you will want to watch for “too many open files” errors.
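The general idea of a column of values kept in its own file and read through a memory-mapped buffer can be sketched in a few lines. This is only an illustration of the technique, not the Lucene file format: the fixed-width 64-bit layout, the `price.col` file name, and the `write_column` and `read_value` helpers are all assumptions made for the example.

```python
import mmap
import os
import struct
import tempfile

# Assumed layout: one fixed-width little-endian 64-bit integer per document.
RECORD = struct.Struct("<q")

def write_column(path, values):
    """Write one value per doc ID, in doc ID order, to the field's own file."""
    with open(path, "wb") as f:
        for value in values:
            f.write(RECORD.pack(value))

def read_value(buf, doc_id):
    """Read the value for doc_id straight from its fixed offset in the buffer."""
    (value,) = RECORD.unpack_from(buf, doc_id * RECORD.size)
    return value

path = os.path.join(tempfile.mkdtemp(), "price.col")  # hypothetical file name
write_column(path, [10, 25, 40, 10])

with open(path, "rb") as f:
    # The OS pages data in on demand, so the column is not copied onto the heap.
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as buf:
        third = read_value(buf, 2)
        count = len(buf) // RECORD.size
```

Because every value sits at `doc_id * RECORD.size`, no un-inverting pass is needed before the first lookup. The sketch also shows the cost mentioned above: each field is one more open file.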

The performance benchmarks he showed were impressive; clearly this will be a huge step forward.

Next, Simon wants to make DocValues updateable, which will be great for scoring based on changing values, for distributed search, and of course for real time search.

Sounds good to me.

Cross-posted with Lucene Revolution Blog; Nicholas Chase is a guest blogger. This is one of a series of presentation summaries from the conference.
