With data volumes exploding and broad interest in new ways to manage datastore-side problems, this course offers a timely introduction to the key complementary technologies you need to know to bust open new frontiers with your search applications. Over two days, the course covers:
- An overview of Apache Hadoop
- Understanding MapReduce
- Principles of Hadoop development, operations & ecosystem
- How to use Hadoop with Solr
- How to index large volumes of data
- How to effectively search large indexes
- Understanding NoSQL
- How to shard/federate/replicate your data for large indexes
- Understanding resource costs & tradeoffs for Solr features
If you’re interested in getting better acquainted with this topic, I’d recommend you take a look at the very useful recent article by Ken Krugler on DZone’s Solr/Lucene Microzone:
The interesting thing about combining these two open source projects [i.e., Solr/Lucene and Hadoop] is that you can use Hadoop to crunch the data, and then serve it up in Solr. And we’re not talking about just free-text search; Solr can be used as a key-value store (i.e. a NoSQL database) via its support for range queries.
Even on a single server, Solr can easily handle many millions of records (“documents” in Lucene lingo). Even better, Solr now supports sharding and replication via the new, cutting-edge SolrCloud functionality.
The article explains where these technologies mash together usefully and, even better, provides some examples with working code. That should whet your appetite.
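To give a flavor of the key-value idea from the quote above, here is a minimal sketch (not taken from the article) of how a lookup might be phrased as a Solr range query over HTTP. The host, the core name `records`, the field name `key`, and both helper functions are assumptions made up for illustration; the sketch only builds the request URL and does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical values: the host, core name ("records"), and field name ("key")
# are assumptions for illustration, not taken from the article.
SOLR_SELECT_URL = "http://localhost:8983/solr/records/select"


def range_lookup_url(field, low, high, rows=10):
    """Build a Solr select URL for documents whose `field` value
    falls in the inclusive range [low, high]."""
    params = {
        "q": f"{field}:[{low} TO {high}]",  # Lucene/Solr inclusive range syntax
        "rows": rows,                        # maximum number of documents returned
        "wt": "json",                        # ask for a JSON response
    }
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


def exact_lookup_url(field, key):
    """A key-value style 'get': a range whose two ends are the same key."""
    return range_lookup_url(field, key, key, rows=1)


if __name__ == "__main__":
    print(range_lookup_url("key", 100, 200))
    print(exact_lookup_url("key", 42))
```

Fetching either URL against a running Solr instance would return the matching documents. Keys that contain spaces or query-syntax characters would need escaping before being placed in the query, which this sketch leaves out.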
Of course, there’s much more to Solr/Lucene than just Hadoop, so we recommend you sign up for both the course and the Lucene Revolution conference; purchased together, they can save you more than $300 (a big deal for big data).