The Lucene PMC has issued an important security announcement: some Solr users may be vulnerable to two recently fixed exploits (CVE-2014-3529 and CVE-2014-3574) in Apache POI, an open source library for parsing Microsoft file formats.

These exploits may affect any Solr users who enable the ExtractingRequestHandler (aka “Solr Cell”) to parse files from untrusted sources. A maliciously crafted OpenXML file could consume excessive computing resources, resulting in a denial of service (DoS), or expose sensitive data from any file readable by the user account the Solr server runs as.

“Hot Fix” instructions for upgrading the affected Apache POI JAR files in Solr 4.8.0, 4.8.1, and 4.9.0 have been posted on the Solr website. A new Solr release that bundles the fixed JARs is expected in the next few weeks.

Full details from the Lucene PMC announcement email:

Date: Tue, 19 Aug 2014 01:33:55 +0200
Subject: [ANNOUNCE] [SECURITY] Recommendation to update Apache POI in Apache Solr 4.8.0, 4.8.1, and 4.9.0 installations

Hello Apache Solr users,

The Apache Lucene PMC wants to make Solr users aware of the following issues:

Apache Solr versions 4.8.0, 4.8.1, and 4.9.0 bundle Apache POI 3.10-beta2 in their binary release tarballs. This version
of Apache POI (and all previous ones) is vulnerable to the following issues:

= CVE-2014-3529: XML External Entity (XXE) problem in Apache POI's OpenXML parser =
Type: Information disclosure
Description: Apache POI uses Java's XML components to parse OpenXML files produced by Microsoft Office products
(DOCX, XLSX, PPTX, ...). Applications that accept such files from end users are vulnerable to XML External Entity
(XXE) attacks, which allow remote attackers to bypass security restrictions and read arbitrary files via a crafted
OpenXML document that provides an XML external entity declaration in conjunction with an entity reference.
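The mechanism behind CVE-2014-3529 can be shown with the JDK's own XML parser. The Java sketch below is not part of the announcement and is not how POI 3.10.1 implements its fix; it assumes application code that parses untrusted XML itself, and the class `XxeHardening` and its methods are hypothetical. It enables the Xerces feature `http://apache.org/xml/features/disallow-doctype-decl`, so a document that declares an external entity (here pointing at `file:///etc/passwd`) is refused before the entity can be resolved.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of XXE hardening for code that parses untrusted XML.
// It does NOT patch Solr or POI; the real fix ships inside POI 3.10.1.
class XxeHardening {

    // Creates a DOM parser that refuses any DOCTYPE, so no entity can be declared.
    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Xerces feature honored by the JDK's built-in parser: fail on <!DOCTYPE ...>.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth in case the DOCTYPE feature is ever turned off.
        dbf.setFeature("http://xml.org/sax/features/external-general-entities", false);
        dbf.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        dbf.setXIncludeAware(false);
        DocumentBuilder builder = dbf.newDocumentBuilder();
        // DefaultHandler rethrows fatal errors but does not print them to stderr.
        builder.setErrorHandler(new DefaultHandler());
        return builder;
    }

    // Returns true if the hardened parser refuses the document.
    static boolean isRejected(String xml) {
        try {
            newHardenedBuilder().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true; // the parse was aborted, e.g. "DOCTYPE is disallowed"
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xxe = "<?xml version=\"1.0\"?>"
            + "<!DOCTYPE doc [<!ENTITY xxe SYSTEM \"file:///etc/passwd\">]>"
            + "<doc>&xxe;</doc>";
        System.out.println("XXE payload rejected: " + isRejected(xxe));
        System.out.println("plain XML rejected: " + isRejected("<doc>hello</doc>"));
    }
}
```

Rejecting every DOCTYPE is the bluntest setting. It suits formats like OpenXML, whose parts are defined by XML Schema rather than DTDs; code that must accept DTDs would rely on the two external-entity features instead.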

= CVE-2014-3574: XML Entity Expansion (XEE) problem in Apache POI's OpenXML parser =
Type: Denial of service
Description: Apache POI uses Java's XML components and Apache Xmlbeans to parse OpenXML files produced by Microsoft
Office products (DOCX, XLSX, PPTX, ...). Applications that accept such files from end users are vulnerable to XML
Entity Expansion (XEE) attacks ("XML bombs"), which allow remote attackers to consume large amounts of CPU resources.
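An XML bomb nests entity definitions so that each level multiplies the text produced by the level below it: with a fan-out of k and n levels, a document of a few hundred bytes expands to k^n copies of the innermost string. The Java sketch below, which is not part of the announcement, builds such a payload and shows one general mitigation, `XMLConstants.FEATURE_SECURE_PROCESSING`, under which the JDK parser aborts once its entity-expansion limit is exceeded. The class and method names are hypothetical, the exact limit depends on the JDK version and its `jdk.xml.*` system properties, and POI 3.10.1's own fix may work differently.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration of an entity-expansion ("billion laughs") payload
// and of the JDK's secure-processing limit. Not the fix shipped in POI 3.10.1.
class EntityExpansionDemo {

    // Each entity lolN references lol(N-1) fanOut times, so the root element
    // expands to fanOut^depth copies of "lol".
    static String payload(int depth, int fanOut) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n<!DOCTYPE lolz [\n");
        sb.append("  <!ENTITY lol0 \"lol\">\n");
        for (int i = 1; i <= depth; i++) {
            sb.append("  <!ENTITY lol").append(i).append(" \"");
            for (int j = 0; j < fanOut; j++) {
                sb.append("&lol").append(i - 1).append(';');
            }
            sb.append("\">\n");
        }
        sb.append("]>\n<lolz>&lol").append(depth).append(";</lolz>\n");
        return sb.toString();
    }

    // fanOut^depth: the number of "lol" copies after full expansion.
    static long expandedCopies(int depth, int fanOut) {
        long copies = 1;
        for (int i = 0; i < depth; i++) {
            copies *= fanOut;
        }
        return copies;
    }

    // Parses with FEATURE_SECURE_PROCESSING enabled and reports whether the
    // parser aborted (for example because an entity-expansion limit was hit).
    static boolean rejectedBySecureProcessing(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            DocumentBuilder builder = dbf.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // rethrow fatal errors quietly
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            return true;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Six levels of ten: one million copies, enough to exceed the JDK's
        // default limit while staying small if no limit were enforced.
        String bomb = payload(6, 10);
        System.out.println(bomb.length() + " chars expand to "
            + expandedCopies(6, 10) + " copies of \"lol\"");
        System.out.println("rejected: " + rejectedBySecureProcessing(bomb));
    }
}
```

The demo deliberately stops at six levels; the classic payload uses nine or ten, which is what turns a one-kilobyte file into gigabytes of text.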

The Apache POI PMC released a bugfix version (3.10.1) today.

Solr users are affected by these issues if they enable the "Apache Solr Content Extraction Library (Solr Cell)"
contrib module in the "contrib/extraction" folder of the release tarball.

Users of Apache Solr are strongly advised to keep the module disabled if they don't use it. Alternatively, users of
Apache Solr 4.8.0, 4.8.1, or 4.9.0 can update the affected libraries by replacing the vulnerable JAR files in the
distribution folder. Users of earlier versions must upgrade their Solr release first; patching older versions is
not possible.
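This advice amounts to a small decision rule. The Java sketch below, which is not part of the announcement, encodes it for a plain `major.minor.patch` version string; `SolrPoiAdvice`, its `Advice` values, and `adviceFor` are hypothetical names. Releases after 4.9.0 fall into a catch-all, because the announcement only says that upcoming versions will bundle the updated libraries, so the bundled POI version still has to be checked.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not a Solr API: encodes the announcement's advice.
// Expects plain numeric versions such as "4.8.1".
class SolrPoiAdvice {

    enum Advice { NOT_AFFECTED, REPLACE_JARS, UPGRADE_SOLR_FIRST, CHECK_BUNDLED_POI }

    // Releases for which the announcement gives JAR-replacement steps.
    private static final List<String> HOT_FIXABLE = Arrays.asList("4.8.0", "4.8.1", "4.9.0");

    // Parses "major.minor.patch" into three ints; missing parts count as 0.
    private static int[] parse(String version) {
        String[] parts = version.trim().split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < out.length && i < parts.length; i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    // Lexicographic comparison of parsed versions.
    private static int compare(int[] a, int[] b) {
        for (int i = 0; i < a.length; i++) {
            if (a[i] != b[i]) {
                return Integer.compare(a[i], b[i]);
            }
        }
        return 0;
    }

    static Advice adviceFor(String solrVersion, boolean solrCellEnabled) {
        // Only installations with Solr Cell enabled are affected
        // (the announcement still advises keeping it disabled if unused).
        if (!solrCellEnabled) {
            return Advice.NOT_AFFECTED;
        }
        int[] v = parse(solrVersion);
        for (String fixable : HOT_FIXABLE) {
            if (compare(v, parse(fixable)) == 0) {
                return Advice.REPLACE_JARS;
            }
        }
        if (compare(v, parse("4.8.0")) < 0) {
            return Advice.UPGRADE_SOLR_FIRST; // older releases cannot be patched
        }
        // Not covered by the announcement: check which POI version is bundled.
        return Advice.CHECK_BUNDLED_POI;
    }

    public static void main(String[] args) {
        for (String v : new String[] {"4.7.2", "4.8.1", "4.9.0", "4.10.0"}) {
            System.out.println(v + " -> " + adviceFor(v, true));
        }
    }
}
```

Running `main` prints one line per sample version; for 4.10.0 the sketch deliberately does not assume a fix and returns `CHECK_BUNDLED_POI`.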

To replace the vulnerable JAR files, follow these steps:

- Download the Apache POI 3.10.1 binary release: http://poi.apache.org/download.html#POI-3.10.1
- Unzip the archive
- Delete the following files from your "solr-4.X.X/contrib/extraction/lib" folder:
        # poi-3.10-beta2.jar
        # poi-ooxml-3.10-beta2.jar
        # poi-ooxml-schemas-3.10-beta2.jar
        # poi-scratchpad-3.10-beta2.jar
        # xmlbeans-2.3.0.jar
- Copy the following files from the base folder of the Apache POI distribution to the "solr-4.X.X/contrib/extraction/lib" folder:
        # poi-3.10.1-20140818.jar
        # poi-ooxml-3.10.1-20140818.jar
        # poi-ooxml-schemas-3.10.1-20140818.jar
        # poi-scratchpad-3.10.1-20140818.jar
- Copy "xmlbeans-2.6.0.jar" from POI's "ooxml-lib/" folder to the "solr-4.X.X/contrib/extraction/lib" folder.
- Verify that the "solr-4.X.X/contrib/extraction/lib" folder no longer contains any files with version number "3.10-beta2".
- Verify that the folder contains only one xmlbeans JAR file, with version 2.6.0.
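The two verification steps can be automated. The Java sketch below, which is not part of the announcement, lists a directory such as `solr-4.X.X/contrib/extraction/lib` (the path is passed on the command line) and reports any file name containing `3.10-beta2` and any xmlbeans JAR other than a single `xmlbeans-2.6.0.jar`. The class `ExtractionLibCheck` is hypothetical, it checks file names only (not JAR contents), and it targets the replacement path: if the JARs were deleted without replacement to disable Office extraction, it will flag the missing xmlbeans JAR, which is expected in that case.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical checker for Solr Cell's lib folder after the POI hot fix.
// Looks at file names only; it does not open or verify the JARs.
class ExtractionLibCheck {

    // Applies both verification steps to a list of file names and returns
    // one message per problem (an empty list means both checks pass).
    static List<String> problems(List<String> fileNames) {
        List<String> problems = new ArrayList<>();
        List<String> xmlbeansJars = new ArrayList<>();
        for (String name : fileNames) {
            if (name.contains("3.10-beta2")) {
                problems.add("vulnerable POI file still present: " + name);
            }
            if (name.startsWith("xmlbeans-") && name.endsWith(".jar")) {
                xmlbeansJars.add(name);
            }
        }
        if (xmlbeansJars.size() != 1 || !xmlbeansJars.get(0).equals("xmlbeans-2.6.0.jar")) {
            problems.add("expected exactly one xmlbeans-2.6.0.jar, found " + xmlbeansJars);
        }
        return problems;
    }

    // Returns the names of the regular files directly inside dir.
    static List<String> listFileNames(Path dir) throws IOException {
        List<String> names = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    names.add(p.getFileName().toString());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: ExtractionLibCheck <path to contrib/extraction/lib>");
            System.exit(2);
        }
        List<String> found = problems(listFileNames(Paths.get(args[0])));
        if (found.isEmpty()) {
            System.out.println("OK: no 3.10-beta2 files, one xmlbeans-2.6.0.jar");
        } else {
            found.forEach(System.out::println);
            System.exit(1);
        }
    }
}
```

With Java 11 or later it can be run directly from source, e.g. `java ExtractionLibCheck.java solr-4.9.0/contrib/extraction/lib`; it exits with status 1 when either check fails.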

If you just want to disable extraction of Microsoft Office documents, delete the files listed above without replacing them.
"Solr Cell" will automatically detect this and disable Microsoft Office document extraction.

Upcoming versions of Apache Solr will have the updated libraries bundled.

Happy Searching and Extracting,
The Apache Lucene Developers

PS: Thanks to Stefan Kopf, Mike Boufford, and Christian Schneider for reporting these issues!