As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Rahul Jain’s session on indexing large-scale SEO/SEM data.
“Search engine optimization (SEO) is the process of affecting the visibility of a website or a web page in a search engine’s natural or unpaid (organic) search results, while search engine marketing (SEM) is a form of Internet marketing that promotes websites by increasing their visibility in search engine results pages (SERPs) through optimization and advertising. We are building an SEO/SEM application where an end user searches for a keyword or a domain and gets insights about it, including search engine ranking, CPC/CPM, search volume, number of ads, competitor details, etc., in a couple of seconds. To provide this intelligence, we collect huge amounts of web data from various sources; after intensive processing, it amounts to as much as 40 billion records per month in a MySQL database, with 4.6 TB of compressed index data in Apache Solr. Due to this large volume, we faced several challenges in improving indexing performance, search latency, and scaling the overall system. In this session, I will talk about our design approaches to importing data faster from MySQL, tricks and techniques to improve indexing performance, distributed search, DocValues (a life saver), Redis, and the overall system architecture.”
Rahul Jain is a freelance Big Data/Search consultant from Hyderabad, India, where he helps organizations scale their big data/search applications. He has 7 years of experience developing Java and J2EE based distributed systems, with 2 years of experience working with big data technologies (Apache Hadoop/Spark) and Search/IR systems (Lucene/Solr/Elasticsearch). In his previous assignments, he was associated with Aricent Technologies and Wipro Technologies Ltd in Bangalore, where he worked on the development of multiple products. He is a frequent speaker and has given several talks and presentations on topics in the Search/IR domain at various meetups and conferences.
Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr, on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…