The Apache Tika team just announced the release of Tika 0.5 (a copy of the announcement is below). This morning I upgraded Apache Solr’s Tika integration (aka Solr Cell) to use the new libraries. To try it, check out trunk from the Apache Solr SVN repository.
The Apache Lucene project is pleased to announce the release of Apache Tika
0.5. The release contents have been pushed out to the main Apache release
site and the m2 ibiblio sync, so the releases should be available as soon as
the mirrors get the syncs.
Apache Tika, a subproject of Apache Lucene, is a toolkit for detecting and
extracting metadata and structured text content from various documents using
existing parser libraries.
Apache Tika 0.5 contains a number of improvements and bug fixes. Details can
be found in the changes file:
Apache Tika is available in source form from the following download page:
Apache Tika is also available in binary form or for use using Maven 2 from
the Central Maven Repositories:
In the initial 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the downloads
using signatures found on the Apache site:
For more information on Apache Tika, visit the project home page: