Today, Reddit announced its new search for ‘the front page of the internet,’ built with Lucidworks Fusion.
Started back in the halcyon Web 2.0 days of 2005, Reddit has become the fourth most popular site in the US and the ninth in the world, with more than 300 million users every month posting links, commenting and voting across its 1.1 million communities (called ‘sub-reddits’). Sub-reddits can orbit around such broad mainstream topics as /r/politics, /r/bitcoin, and /r/starwars, or ones as obscure as /r/bunnieswithhats, /r/grilledcheese, and /r/animalsbeingjerks. Search is a key way for users to find more information on their favorite topics and hobbies across the entire universe of communities.
As the site has grown, Reddit has run its search on five different stacks: Postgres, PyLucene, Apache Solr, IndexTank, and Amazon’s CloudSearch. Each change improved performance, but search still wasn’t keeping up with the pace of the site’s growth, and relevancy wasn’t where it should be.
“When you think about the Internet, you think about a handful of sites: Facebook, Google, YouTube, and Reddit. My personal opinion is that Reddit is the most important of all of these,” explained Lucidworks CEO Will Hayes. “It connects strangers from all over the world around an incredibly diverse group of topics. Content is created at a breakneck pace and at massive scale. Because of this, the search function becomes an incredibly important piece of the UX puzzle. Lucidworks Fusion allows Reddit to tackle the scale and complexity issues and provide the world-class search experience that their users expect.”
The team chose Lucidworks Fusion for its best-in-class search capabilities, including efficient scaling, monitoring, and improved search relevance.
“Reddit relies heavily on content discovery, as our primary value proposition is giving our people a home for discovering, sharing, and discussing the things they’re most passionate about,” said Nick Caldwell, Vice President of Engineering at Reddit. “As Reddit has grown, so have our communities’ expectations of the experience we provide, and improving our search platform will help us address a long-time user pain point in a meaningful way. We expect Fusion’s customization and machine learning functionality will significantly elevate our search capabilities and transform the way people discover content on the site.”
Here are just a few of the results from the new search, which is now available to 100% of users:
- ETL indexing pipelines reduced to just 4 Hive queries, which led to a 33% increase in posts indexed
- Full re-index of all Reddit content cut from 11 hours to 5, with constant live updates and errors down by two orders of magnitude
- Number of machines reduced from 200 to 30
- 99% of queries served results within 500 ms
- Comparable relevancy to the old search (without any fine-tuning yet!)
That’s just a small part of the detailed post over on the Reddit blog: The Search for Better Search at Reddit.