In the Lucene/Solr Revolution session “Text Classification with Lucene/Solr, Apache Hadoop and LibSVM,” Majirus Fansi, SOA and Search Engine Developer at Valtech, will show you how to build a text classifier using Apache Lucene/Solr and the LibSVM library. The session's running example classifies a corpus of job offers into a number of predefined categories, so each indexed document (a job offer) belongs to zero, one, or more categories. Well-known machine learning techniques for text classification include naïve Bayes, logistic regression, neural networks, and support vector machines (SVMs).
Lucene/Solr is used to construct the feature vector for each document. The LibSVM library, a widely used SVM implementation, then classifies the document. One one-vs-all SVM classifier is built for each class in the setting, and the Hadoop MapReduce framework reconciles the results of these classifiers. The end result is a scalable multi-class classifier. Finally, the session outlines how the classifier is used to enrich basic Solr keyword search.
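The abstract does not say how the one-vs-all results are reconciled, so the following is only a minimal sketch under stated assumptions: each classifier emits a signed decision score per document, and a document receives every category whose score is above a threshold (which allows zero, one, or more categories). It uses plain Python as a stand-in for a MapReduce job and calls neither Hadoop nor LibSVM; the function names, category names, and scores are all hypothetical.

```python
from collections import defaultdict


def map_scores(classifier_outputs):
    """Map step: emit (doc_id, (category, score)) for every classifier result."""
    for category, scores in classifier_outputs.items():
        for doc_id, score in scores.items():
            yield doc_id, (category, score)


def reduce_labels(pairs, threshold=0.0):
    """Reduce step: group by document and keep categories scoring above threshold."""
    grouped = defaultdict(list)
    for doc_id, (category, score) in pairs:
        grouped[doc_id].append((category, score))

    labels = {}
    for doc_id, scored in grouped.items():
        accepted = [(c, s) for c, s in scored if s > threshold]
        # Order the accepted categories from most to least confident.
        accepted.sort(key=lambda cs: cs[1], reverse=True)
        labels[doc_id] = [c for c, _ in accepted]
    return labels


# Hypothetical decision scores from three one-vs-all classifiers.
classifier_outputs = {
    "java-developer": {"offer-1": 1.2, "offer-2": -0.4, "offer-3": -0.9},
    "search-engineer": {"offer-1": 0.3, "offer-2": 0.8, "offer-3": -0.2},
    "project-manager": {"offer-1": -1.1, "offer-2": -0.6, "offer-3": -0.5},
}

labels = reduce_labels(map_scores(classifier_outputs))
print(labels)
```

In this toy data, `offer-1` ends up in two categories, `offer-2` in one, and `offer-3` in none. Grouping by document ID is what makes the step a natural fit for a reducer, since each document's scores can be reconciled independently of all others.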
This intermediate-level session will take place from 1:55 to 2:40 on Thursday, November 7. Click here for more details.
About the Speaker:
Majirus Fansi (@majirus) is lead developer at Valtech Technology Paris. He integrates search features based on Apache Lucene/Solr into his clients’ Java Web applications. Majirus is also an SOA integrator, helping his clients integrate Mule ESB into their architecture. He speaks at academic conferences and developers’ meetings such as Devoxx France, ApacheCon, and Lucene/Solr Revolution. His main focus today is applying text mining to extend his clients’ keyword search applications with semantic features. Majirus holds a PhD in computer science from the University of Pau in France and a joint Executive MBA degree from Stockholm University School of Business and Ecole Supérieure de Commerce (ESC) de Pau.
- For more information about Lucene/Solr Revolution EU, visit lucenerevolution.org.
- For more Road to Revolution posts, click here.
- To view the full session agenda, click here.
- To register for the conference, click here.
- To get the latest conference news and updates, follow @LuceneSolrRev on Twitter.
- Do you have a question about the conference? Do you want to be added to the conference mailing list? Are you interested in sponsoring Revolution? If so, please email us at: firstname.lastname@example.org.
Lucene/Solr Revolution is presented by Lucidworks, the commercial entity for Apache Lucene/Solr open source search — the future of search technology.