As we count down to the annual Lucene/Solr Revolution conference in Boston this October, we’re highlighting talks and sessions from past conferences. Today’s featured talk is George Bailey and Cameron Baker’s “Rackspace Email’s solution for indexing 50k documents per second”.
Customers of Rackspace Email have always needed the ability to search and review login, routing, and delivery information for their emails. In 2008, a solution was created using Hadoop MapReduce to process all of the logs from hundreds of servers and create Solr 1.4 indexes that would provide the search functionality. Over the next several years, the number of servers generating log data grew from hundreds to several thousand, which required the cluster of Hadoop and Solr 1.4 servers to grow to ~100 servers. As a result of this growth, the MapReduce indexing jobs took anywhere from 20 minutes to several hours.
In 2015, Rackspace Email set out to solve this ever-growing need to index and search billions of events from thousands of servers and decided to leverage SolrCloud 5.1. This talk covers how Rackspace replaced ~100 physical servers with 10 and improved functionality so that documents are indexed and searchable within 5 seconds.
George Bailey is a Software Developer for Rackspace Email Infrastructure.
Cameron Baker is a Linux Systems Engineer for Rackspace Email Infrastructure.
Join us at Lucene/Solr Revolution 2016, the biggest open source conference dedicated to Apache Lucene/Solr, on October 11-14, 2016 in Boston, Massachusetts. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…