Today the Apache Lucene and Solr PMC announced the 4.2 release of both projects. Highlights of Solr 4.2:
- A read side REST API for the schema. Always wanted to introspect the schema over http? Now you can. Looks like the write side will be coming next.
- DocValues have been integrated into Solr. DocValues load a lot faster than the field cache, can use different compression algorithms, and can be held either in RAM or on disk. Faceting, sorting, and function queries all get to benefit. How about the OS handling faceting and sorting caches off heap? No more tuning 60 gigabyte heaps? How about a snappy new per-segment DocValues faceting method? Improved numeric faceting? Sweet.
- Collection Aliasing. Got time based data? Want to re-index in a temporary collection and then swap it into production? Done. Stay tuned for Shard Aliasing.
- Collection API responses. The Collections API was still very new in 4.0, and while it improved a fair bit in 4.1, responses were certainly needed but missed the cutoff. Initially, we made the decision to make the Collections API super fault tolerant, which made responses tougher to do. No one wants to hunt through log files to see how things turned out. Done in 4.2.
- Interact with any collection on any node. Until 4.2, you could only interact with a node in your cluster if it hosted at least one replica of the collection you wanted to query/update. No longer – query any node, whether it has a piece of your intended collection or not and get a proxied response.
- Allow custom shard names so that new host addresses can take over for retired shards. Working on Amazon without Elastic IPs? This is for you.
- Lucene 4.2 optimizations such as compressed term vectors.
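To make the schema read API and collection aliasing items above a bit more concrete, here is a minimal Python sketch that builds the HTTP requests involved. Everything specific in it is an assumption for illustration: the base URL `http://localhost:8983/solr`, the collection name `collection1`, the alias and collection names, and the helper function names are all hypothetical, and the `/schema/fields` path and `CREATEALIAS` parameters reflect my understanding of these APIs, so check them against the reference documentation for your version before relying on them.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Assumed base URL of any node in the cluster (hypothetical host and port).
SOLR_BASE = "http://localhost:8983/solr"


def schema_url(collection, resource="fields", base=SOLR_BASE):
    """Build a read-only schema URL such as .../collection1/schema/fields."""
    parts = [collection, "schema"] + resource.split("/")
    path = "/".join(quote(part, safe="") for part in parts)
    return f"{base}/{path}?{urlencode({'wt': 'json'})}"


def create_alias_url(alias, collections, base=SOLR_BASE):
    """Build a Collections API URL that points an alias at one or more collections."""
    params = {
        "action": "CREATEALIAS",
        "name": alias,
        "collections": ",".join(collections),
    }
    return f"{base}/admin/collections?{urlencode(params)}"


def fetch_json(url, timeout=10):
    """Issue a GET and decode the JSON body (needs a running Solr node)."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Printing only; call fetch_json(...) on these URLs against a live cluster.
    print(schema_url("collection1"))
    print(create_alias_url("logs", ["logs_2013_03"]))
```

For the re-index-and-swap case, the idea would be to index into a fresh collection and then issue the alias request again with the same alias name and the new collection, so clients that query the alias never change their URLs.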
Solr 4.2 also includes many other new features as well as numerous optimizations and bugfixes.
Highlights of Lucene 4.2:
- Lucene 4.2 has a new default codec (Lucene42Codec) with a more efficient docvalues format (sorted bytes in FST, less addressing overhead, improved numeric compression) and smaller term vectors (LZ4-compressed terms dictionaries and payloads, delta-encoded positions and offsets using blocks of packed integers).
- The doc values APIs, both the external API and the codec API, and their implementations have been simplified: the codec is no longer responsible for buffering doc values; the numerous types have been consolidated down to only three (NUMERIC, BINARY, SORTED); PerFieldDocValuesFormat lets you set a different format for each field; and the doc values and FieldCache APIs were unified.
- Significant refactoring and performance enhancements to the facet module, resulting in overall ~3.8X speedup in one case (single Date field faceting).
- DrillDownQuery in the facet module now supports multi-select.
- A new DrillSideways class enables computing facet label counts for both hits and near-misses in a single query. See http://blog.mikemccandless.com/2013/02/drill-sideways-faceting-with-lucene.html
- An additional docvalues type (SORTED_SET) was added that supports multiple values.
- FSTs are a bit smaller, and the FST package supports FSTs over 2GB in size.
- A new LiveFieldValues class lets you get live or real-time values for any indexed doc / field. See http://blog.mikemccandless.com/2013/01/getting-real-time-field-values-in-lucene.html
- Added a new classification module.
- Various bugfixes and optimizations since the 4.1 release.
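The term vector item above mentions delta-encoded positions stored in blocks of packed integers. The Python sketch below illustrates that general idea only: it is not Lucene's implementation or on-disk format, all function names are hypothetical, and it packs a single block into one Python integer purely to show why deltas of sorted positions need fewer bits per value than the raw positions.

```python
def delta_encode(positions):
    """Turn ascending positions into gaps between consecutive values."""
    deltas, prev = [], 0
    for pos in positions:
        deltas.append(pos - prev)
        prev = pos
    return deltas


def delta_decode(deltas):
    """Rebuild the original positions with a running sum."""
    positions, total = [], 0
    for delta in deltas:
        total += delta
        positions.append(total)
    return positions


def bits_required(values):
    """Smallest fixed bit width that can hold every value in the block."""
    return max(1, max(values).bit_length()) if values else 0


def pack(values, bits):
    """Pack values into one integer, using `bits` bits per value."""
    packed = 0
    for i, value in enumerate(values):
        packed |= value << (i * bits)
    return packed


def unpack(packed, bits, count):
    """Extract `count` values of `bits` bits each from a packed integer."""
    mask = (1 << bits) - 1
    return [(packed >> (i * bits)) & mask for i in range(count)]


if __name__ == "__main__":
    positions = [3, 7, 8, 15, 16]
    deltas = delta_encode(positions)
    width = bits_required(deltas)
    block = pack(deltas, width)
    assert delta_decode(unpack(block, width, len(deltas))) == positions
    print(f"{len(positions)} positions stored at {width} bits each")
```

In this toy block the largest raw position needs 5 bits while the largest delta needs 3, which is the kind of saving that delta encoding followed by fixed-width packing is meant to capture.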
Apache Solr 4.2 can be downloaded here.
Apache Lucene 4.2 library can be downloaded here.
Note that the mirrors are only beginning to update, so some of them may not yet carry version 4.2 of Lucene and Solr.