Search-Time Parallelism at Etsy: An Experiment With Apache Lucene

As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Shikhar Bhushan’s experiments with search-time parallelism at Etsy.

Is it possible to gain the parallelism benefit of sharding your data into multiple indexes without actually sharding? Isn’t your Lucene index already composed of shards, i.e., segments? This talk will present an experiment in parallelizing Lucene’s guts: the collection protocol. An express goal was to do this in a lock-free manner using divide-and-conquer. Changes to the Collector API were necessary, such as orienting it around child “leaf” collectors so that segment-level state could be accumulated in parallel. I will present technical details learned along the way, such as how Lucene’s TopDocs collectors are implemented using priority queues and custom comparators. Then, on to the parallelizability of collectors: how some collectors, like hit counting, are embarrassingly parallelizable; how others, like DocSet collection, were a delightful challenge; and where the space-time tradeoffs need more consideration. Performance testing results, which currently range from worse to exciting, will be discussed.
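To make the idea of per-segment, lock-free collection more concrete, here is a minimal, hypothetical Python model. It is not Lucene’s Collector or LeafCollector API, and not the code from the talk. It assumes each “segment” is a list of per-document scores, with `None` for documents that don’t match. Each leaf accumulates its own hit count and a bounded min-heap of top hits with no shared mutable state, and the partial results are combined by pairwise divide-and-conquer merging instead of a locked shared queue. The names `collect_leaf`, `merge`, `reduce_tree`, and `search` are illustrative only. The `doc_base` offset plays a role similar to Lucene’s per-segment docBase.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class LeafResult:
    """State one leaf collector accumulates for one segment (never shared)."""
    hit_count: int = 0
    # At most k entries of (score, -global_doc_id); a larger tuple is a better hit,
    # so equal scores favor the lower doc id.
    top: list = field(default_factory=list)


def collect_leaf(segment, doc_base, k):
    """Collect a single segment: count matches and keep the best k in a min-heap."""
    result = LeafResult()
    for local_doc, score in enumerate(segment):
        if score is None:  # document does not match the query
            continue
        result.hit_count += 1
        entry = (score, -(doc_base + local_doc))
        if len(result.top) < k:
            heapq.heappush(result.top, entry)
        elif result.top and entry > result.top[0]:  # beats the worst retained hit
            heapq.heapreplace(result.top, entry)
    return result


def merge(a, b, k):
    """Combine two partial results; both parts are associative, so merge order is free."""
    return LeafResult(a.hit_count + b.hit_count, heapq.nlargest(k, a.top + b.top))


def reduce_tree(results, k):
    """Divide and conquer: merge halves recursively instead of locking a shared queue."""
    if not results:
        return LeafResult()
    if len(results) == 1:
        return results[0]
    mid = len(results) // 2
    return merge(reduce_tree(results[:mid], k), reduce_tree(results[mid:], k), k)


def search(segments, k, workers=4):
    """Return (total_hits, [(global_doc_id, score), ...]) best-first."""
    # Each segment's doc_base is the number of documents in the segments before it.
    bases, base = [], 0
    for seg in segments:
        bases.append(base)
        base += len(seg)
    # One task per segment. CPython's GIL limits real speedup here, so this shows
    # the structure of lock-free per-leaf collection rather than its performance.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        leaves = list(pool.map(collect_leaf, segments, bases, [k] * len(segments)))
    total = reduce_tree(leaves, k)
    ranked = sorted(total.top, reverse=True)
    return total.hit_count, [(-neg_doc, score) for score, neg_doc in ranked]


segments = [
    [1.0, None, 3.0],        # segment 0: global docs 0-2
    [None, 2.5],             # segment 1: global docs 3-4
    [3.0, 0.5, None, 4.0],   # segment 2: global docs 5-8
]
print(search(segments, k=3))  # (6, [(8, 4.0), (2, 3.0), (5, 3.0)])
```

In this model, hit counts merge by plain addition, which is why counting parallelizes so easily. Top-k collection needs every leaf to keep its own k-entry heap, so memory grows with the number of leaves. That is a small instance of the kind of space-time tradeoff the abstract alludes to. Here the merges run after all leaves finish. A fork-join implementation could run the merges in parallel as well.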

Shikhar works on Search Infrastructure at Etsy, the global handmade and vintage marketplace. He has contributed patches to Solr/Lucene and maintains several open-source projects, such as a Java SSH library and a discovery plugin for Elasticsearch. He previously worked at Bloomberg, where he delivered talks introducing developers to Python and internal Python tooling. He has a special interest in JVM technology and distributed systems.

http://www.slideshare.net/lucidworks/searchtime-parallelism-presented-by-shikhar-bhushan-etsy-41862845

Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr, on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…

You Might Also Like

The State of Generative AI in Global Business: 2025 Benchmark Report, Dawn of the Agentic AI Era

How an electronics giant meets engineers where they are, with 44 million products in catalog

From Search to Solutions: How AI Agents Can Power Digital Commerce in 2025