The Search for Search at Reddit

Lucidworks Fusion boosts search for Reddit’s massive online community of 330 million monthly users.

Today, Reddit announced their new search for ‘the front page of the internet’ built with Lucidworks Fusion.

Started back in the halcyon Web 2.0 days of 2005, Reddit has become the fourth most popular site in the US and 9th in the world with more than 330 million users every month posting links, commenting and voting across their  1.1 million communities (called ‘sub-reddits’). Sub-reddits can orbit around such broad mainstream topics as /r/politics, /r/bitcoin, and /r/starwars or as obscure as /r/bunnieswithhats, /r/grilledcheese, and /r/animalsbeingjerks. Search is a key part of trying to find more information on their favorite topics and hobbies across the entire universe of communities.

As the site has grown, the search function has had five different search stacks implemented over the years including Postgres, PyLucene, Apache Solr, IndexTank, and Amazon’s CloudSearch. Each time performance got better but wasn’t keeping up with the pace of the site’s growth and relevancy wasn’t where it should be.

“When you think about the Internet, you think about a handful of sites — Facebook, Google, Youtube, and Reddit. My personal opinion is that Reddit is the most important of all of these,” explained Lucidworks CEO, Will Hayes. “It connects strangers from all over the world around an incredibly diverse group of topics. Content is created at a breakneck pace and at massive scale. Because of this, the search function becomes an incredibly important piece of the UX puzzle. Lucidworks Fusion allows Reddit to tackle the scale and complexity issues and provide the world-class search experience that their users expect. ”

The team chose Lucidworks Fusion for it’s best-in-class search capabilities including efficient scaling, monitoring, and improved search relevance.

“Reddit relies heavily on content discovery, as our primary value proposition is giving our people a home for discovering, sharing, and discussing the things they’re most passionate about,” said Nick Caldwell, Vice President of Engineering at Reddit. “As Reddit has grown, so have our communities’ expectations of the experience we provide, and improving our search platform will help us address a long-time user pain point in a meaningful way. We expect Fusion’s customization and machine learning functionality will significantly elevate our search capabilities and transform the way people discover content on the site.”

Here’s just a few of the results from the new search which is now at 100% availability to all users:

  • ETL indexing pipelines reduced to just 4 Hive queries, which led to a 33% increase in posts indexed
  • Full re-index of all of Reddit content slashed from 11 hours to 5 with constant live updates and errors down by two orders of magnitude
  • Amount of hardware/machines reduced from 200 to 30
  • 99% of queries served search results in 500ms
  • Comparable relevancy to the old search (without any fine-tuning yet!)

That’s just a little bit of the detailed blog post over on the Reddit blog. The Search for Better Search at Reddit.

Don’t miss their keynote at the Lucene/Solr Revolution next week in Las Vegas.

Coverage in TechCrunch and KMWorld. More on the way!

Read the full press release.

Go try out the search on Reddit right now!

You Might Also Like

B2B AI benchmark study 2025: What we’re seeing in the trenches

Download the 2025 B2B AI benchmark highlights from Lucidworks. See real data...

Read More

From search company to practical AI pioneer: Our vision for 2025 and beyond

CEO Mike Sinoway shares insights on AI's future, introducing Commerce Studio™ and...

Read More

When AI Goes Wrong: Real-World Fails and How to Prevent Them

Don’t let your AI chatbot sell a $50,000 Tahoe for $1! This...

Read More

Quick Links