The Apache Tika team just announced the release of Tika 0.5 (the announcement is copied below). This morning I upgraded Apache Solr's Tika integration (aka Solr Cell) to use the new libraries. To try it, check out Apache Solr's SVN trunk.

The Apache Lucene project is pleased to announce the release of Apache Tika 0.5. The release contents have been pushed out to the main Apache release site and the m2 ibiblio sync, so the releases should be available as soon as the mirrors get the syncs.

Apache Tika, a subproject of Apache Lucene, is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.

Apache Tika 0.5 contains a number of improvements and bug fixes. Details can be found in the changes file:

Apache Tika is available in source form from the following download page:

Apache Tika is also available in binary form, or for use with Maven 2, from the Central Maven Repositories:

In the initial 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the downloads using signatures found on the Apache site:

For more information on Apache Tika, visit the project home page:

About Grant Ingersoll
