It looks like the Solr team might be gearing up for a release soon. That means Solr 1.4 is likely right around the corner, and that has me thinking about the new features we can look forward to. I figured I’d highlight some of the new features that I personally find especially interesting:
SOLR-560: Improved Logging
This lets you use Solr with whichever logging implementation you are most familiar with. Very cool.
Use SLF4J logging API rather than JDK logging. The packaged .war file is shipped with a JDK logging implementation, so logging configuration for the .war should be identical to Solr 1.3. However, if you are using the .jar file, you can select which logging implementation to use by dropping a different binding. See: http://www.slf4j.org/
SOLR-561: Pure Java Index Replication
A new replication implementation written in Java. This is a great addition to Solr and brings the simple replication that Unix users have taken for granted to Windows. Good stuff. It’s not as battle-tested as the old scripting/rsync solution, but it has been used in production by a few people and has essentially gone through a strong beta period already. Anyone using replication for horizontal scaling should check this out.
Added Replication implemented in Java as a request handler. Supports index replication as well as configuration replication and exposes detailed statistics and progress information on the Admin page. Works on all platforms.
SOLR-284: Content Detection/Extraction with Tika
Apache Tika is a new Lucene subproject, and it’s looking very promising. Tika is a great content detection and extraction library that supports many popular formats: http://lucene.apache.org/tika/formats.html. This makes it a lot easier to pump most popular file types into Solr.
Added support for extracting content from binary documents like MS Word and PDF using Apache Tika.
SOLR-911: Multi-Select Faceting Support
Multi-select faceting support. Awesome. I’ve been seeing it more and more every day. Solr’s facet support continues to be excellent. Check out our use of multi-select at www.lucidimagination.com/search.
Add support for multi-select faceting by allowing filters to be tagged and facet commands to exclude certain filters. This patch also added the ability to change the output key for facets in the response, and optimized distributed faceting refinement by lowering parsing overhead and by making requests and responses smaller.
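As a rough illustration of the tagging and excluding described above, here is a minimal Java sketch that only assembles the request parameters as strings. The `{!tag=...}`, `{!ex=...}` and `key=...` local params are the Solr syntax for this feature; the field name `doctype`, the tag `dt`, the output key `all_doctypes`, and the class and method names are hypothetical and chosen just for the example. The values are left unencoded for readability, so they would still need URL encoding before being sent.

```java
import java.util.ArrayList;
import java.util.List;

class MultiSelectFacetQuery {

    // Wraps a filter query in a tag so that facet commands can refer to it,
    // e.g. taggedFilter("dt", "doctype:pdf") gives "{!tag=dt}doctype:pdf".
    static String taggedFilter(String tag, String filter) {
        return "{!tag=" + tag + "}" + filter;
    }

    // Builds a facet.field value that ignores filters carrying the given tag.
    // If outputKey is non-null, the facet is reported under that key in the
    // response instead of under the field name.
    static String excludingFacet(String tag, String outputKey, String field) {
        StringBuilder sb = new StringBuilder("{!ex=").append(tag);
        if (outputKey != null) {
            sb.append(" key=").append(outputKey);
        }
        return sb.append('}').append(field).toString();
    }

    // Assembles an example parameter list (unencoded, hypothetical values).
    static List<String> buildParams() {
        List<String> params = new ArrayList<>();
        params.add("q=solr");
        params.add("fq=" + taggedFilter("dt", "doctype:pdf"));
        params.add("facet=true");
        params.add("facet.field=" + excludingFacet("dt", "all_doctypes", "doctype"));
        return params;
    }

    public static void main(String[] args) {
        for (String p : buildParams()) {
            System.out.println(p);
        }
    }
}
```

With parameters like these, the results are still restricted to PDFs, but the doctype facet counts are computed as if that filter were not applied, which is what lets a user see and select the other document types.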
SOLR-906: Buffered Updates With Solrj Over Http
More efficient index construction over HTTP with solrj. If you’re doing it, this is a fantastic performance improvement.
Adding a StreamingUpdateSolrServer that writes update commands to an open HTTP connection. If you are using solrj for bulk update requests you should consider switching to this implementation. However, note that the error handling is not immediate as it is with the standard SolrServer.
SOLR-374: Index Reopen
Index reopening came to Lucene some time ago, and now comes to Solr. This means that when you add a couple of documents to Solr, rather than opening the whole index again, only the one small new segment is opened (subject to segment merging). A lot of work has gone on in Lucene development with reopen recently, and it’s going to be cool to see how Solr is able to take advantage of it all. Progress towards core real-time Lucene/Solr index/search is building.
Use IndexReader.reopen to save resources by re-using parts of the index that haven’t changed.
SOLR-475: Faceting Performance Boost
I haven’t used this firsthand, but the reviews have been stellar. From what I hear, this should be a fantastic performance boost.
New faceting method with better performance and smaller memory usage for multi-valued fields with many unique values but relatively few values per document. Controllable via the facet.method parameter – “fc” is the new default method and “enum” is the original method.
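To make the facet.method switch concrete, here is a small Java sketch that builds the request parameters for faceting on one field with an explicit method. It uses the per-field form `f.<field>.facet.method`; the field name `category`, the `*:*` query, and the class and method names are assumptions made for the example, and the query string is left unencoded for readability.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FacetMethodParams {

    // Returns request parameters that facet on the given field using the
    // given method: "fc" (the new default) or "enum" (the original method).
    static Map<String, String> facetParams(String field, String method) {
        if (!method.equals("fc") && !method.equals("enum")) {
            throw new IllegalArgumentException("facet.method must be fc or enum");
        }
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("facet", "true");
        params.put("facet.field", field);
        // Per-field override, so other facet fields keep the default method.
        params.put("f." + field + ".facet.method", method);
        return params;
    }

    // Joins the parameters as name=value pairs separated by '&' (unencoded).
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toQueryString(facetParams("category", "enum")));
    }
}
```

Falling back to "enum" for a single field this way may be worth trying if that field has only a handful of distinct values, since the new method is aimed at fields with many unique values.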
SOLR-84: New Solr Logo
Check out the new Solr logo. This was the winner of a community contest, and I think we really got a nice logo out of it. There were plenty of options to choose from, so it was a really successful contest.
Use new Solr logo in admin
And then of course, there are a handful of goodies in the latest Lucene libraries that will affect Solr or lead to new Solr features shortly – and tons of other features, bug fixes, and performance improvements.