The Lucene/Solr Revolution session, “Large Scale Crawling with Apache Nutch and Friends,” by Julien Nioche, Director at DigitalPebble, will give an overview of Apache Nutch. Julien will describe its main components and how it fits with other Apache projects, such as Hadoop, Solr, Tika, and HBase. The second part of the presentation will focus on the latest developments in Nutch, the differences between the 1.x and 2.x branches, and what we can expect to see in Nutch in the future.
This session will cover many practical aspects and should be a good starting point for crawling on a large scale with Apache Nutch and Solr.
This intermediate-level session will take place from 3:40 to 4:25 on Wednesday, November 6. Click here for more details.
About the Speaker:
Julien Nioche is the founder of DigitalPebble Ltd, a consultancy based in Bristol, UK. He specializes in Web Crawling, Natural Language Processing, Machine Learning, and Information Retrieval, with strong expertise in open source solutions. Julien is the VP for Apache Nutch, a committer on Tika and Gora, and a contributor to several other open source projects.
- For more information about Lucene/Solr Revolution EU, visit lucenerevolution.org.
- For more Road to Revolution posts, click here.
- To view the full session agenda, click here.
- To register for the conference, click here.
- To get the latest conference news and updates, follow @LuceneSolrRev on Twitter.
- Do you have a question about the conference? Do you want to be added to the conference mailing list? Are you interested in sponsoring Revolution? If so, please email us at: firstname.lastname@example.org.
Lucene/Solr Revolution is presented by Lucidworks, the commercial entity for Apache Lucene/Solr open source search — the future of search technology.