Text classification automates the task of filing documents into pre-defined categories, using a set of labelled example documents to learn from. The first step in automating classification is to transform the documents into feature vectors. Though this step is highly domain-specific, Apache Mahout provides plenty of easy-to-use tooling to help you get started, most of which relies heavily on Apache Lucene for analysis, tokenisation, and filtering.
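To make the idea of a feature vector concrete, here is a minimal sketch in plain Python. It does not use the Mahout or Lucene APIs; the stop-word list, function names, and sample documents are all assumptions made up for illustration. It only mirrors the general shape of the step: tokenise, filter, then count terms against a fixed vocabulary.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; a real pipeline
# would use a fuller, language-specific list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}


def tokenize(text):
    """Lowercase the text, split on non-alphanumeric characters, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocabulary(docs):
    """Assign a vector index to every distinct token seen in the documents."""
    terms = sorted({t for doc in docs for t in tokenize(doc)})
    return {term: i for i, term in enumerate(terms)}


def vectorize(text, vocabulary):
    """Turn one document into a term-frequency vector over the vocabulary.

    Tokens that are not in the vocabulary are ignored.
    """
    vector = [0] * len(vocabulary)
    for term, count in Counter(tokenize(text)).items():
        index = vocabulary.get(term)
        if index is not None:
            vector[index] = count
    return vector


# Made-up sample documents.
docs = [
    "The search engine indexes documents.",
    "Documents are filed into categories.",
]
vocabulary = build_vocabulary(docs)
vectors = [vectorize(d, vocabulary) for d in docs]
```

Each document ends up as a list of term counts with one position per vocabulary entry, which is the kind of input a classifier can train on. Real analysis chains typically add stemming, n-grams, and weighting such as TF-IDF on top of this.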

The session, “Text Classification Powered by Apache Mahout and Lucene,” by Isabel Drost-Fromm, Software Developer at Apache Software Foundation/Nokia Gate 5, shows how to use faceting to quickly get an understanding of the fields in your documents. Isabel will walk you through the steps necessary to convert your text documents into feature vectors that Mahout classifiers can use, and will share a few anecdotes on drafting domain-specific features.

This introductory-level session will take place from 11:55 to 12:40 on Thursday, November 7. Click here for more details.

About the Speaker:

Isabel Drost-Fromm (@MaineC) is a member of the Apache Software Foundation. She is the founder of the Berlin Buzzwords Conference and the Apache Hadoop Get Together in Berlin, and co-organised the first European NoSQL meetup. She co-founded Apache Mahout and is an Apache Mahout committer. Isabel is actively engaged with the communities of various Apache projects, e.g., Lucene and Hadoop. She is a regular speaker at conferences on topics related to free software development, scalability, big data, Hadoop, and Mahout. She currently works for Nokia Gate 5 GmbH as a Software Developer.

More Details:

  • For more information about Lucene/Solr Revolution EU, visit lucenerevolution.org.
  • For more Road to Revolution posts, click here.
  • To view the full session agenda, click here.
  • To register for the conference, click here.
  • To get the latest conference news and updates, follow @LuceneSolrRev on Twitter.
  • Do you have a question about the conference? Do you want to be added to the conference mailing list? Are you interested in sponsoring Revolution? If so, please email us at: info@lucenerevolution.org.

Lucene/Solr Revolution is presented by Lucidworks, the commercial entity for Apache Lucene/Solr open source search — the future of search technology.