The Search for Search at Reddit

Today, Reddit announced its new search for ‘the front page of the internet,’ built with Lucidworks Fusion.

Founded back in the halcyon Web 2.0 days of 2005, Reddit has become the fourth most popular site in the US and the ninth in the world. More than 330 million users every month post links, comment, and vote across its 1.1 million communities (called ‘sub-reddits’). Sub-reddits can orbit broad mainstream topics such as /r/politics, /r/bitcoin, and /r/starwars, or ones as obscure as /r/bunnieswithhats, /r/grilledcheese, and /r/animalsbeingjerks. Search is a key way for users to find more about their favorite topics and hobbies across the entire universe of communities.

As the site has grown, its search has run on five different stacks over the years: Postgres, PyLucene, Apache Solr, IndexTank, and Amazon’s CloudSearch. Each time, performance improved, but it never kept pace with the site’s growth, and relevancy wasn’t where it should be.

(Diagram: Reddit search with Lucidworks Fusion)

“When you think about the Internet, you think about a handful of sites: Facebook, Google, YouTube, and Reddit. My personal opinion is that Reddit is the most important of all of these,” explained Lucidworks CEO Will Hayes. “It connects strangers from all over the world around an incredibly diverse group of topics. Content is created at a breakneck pace and at massive scale. Because of this, the search function becomes an incredibly important piece of the UX puzzle. Lucidworks Fusion allows Reddit to tackle the scale and complexity issues and provide the world-class search experience that their users expect.”

The team chose Lucidworks Fusion for its best-in-class search capabilities, including efficient scaling, monitoring, and improved search relevance.

“Reddit relies heavily on content discovery, as our primary value proposition is giving our people a home for discovering, sharing, and discussing the things they’re most passionate about,” said Nick Caldwell, Vice President of Engineering at Reddit. “As Reddit has grown, so have our communities’ expectations of the experience we provide, and improving our search platform will help us address a long-time user pain point in a meaningful way. We expect Fusion’s customization and machine learning functionality will significantly elevate our search capabilities and transform the way people discover content on the site.”

Here are just a few of the results from the new search, which is now available to 100% of users:

  • ETL indexing pipelines reduced to just 4 Hive queries, which led to a 33% increase in posts indexed
  • Full re-index of all Reddit content cut from 11 hours to 5, with constant live updates and errors down by two orders of magnitude
  • Number of machines reduced from 200 to 30
  • 99% of queries returned search results within 500 ms
  • Comparable relevancy to the old search (without any fine-tuning yet!)

That’s just a small part of the detailed post on the Reddit blog, “The Search for Better Search at Reddit.”

Don’t miss their keynote at the Lucene/Solr Revolution next week in Las Vegas.

Coverage in TechCrunch and KMWorld. More on the way!

Read the full press release.

Go try out the search on Reddit right now!
