[UPDATE] Spatial Search in Apache Lucene and Solr

One of the most frequent things I get asked is “what is the state of spatial in Lucene and Solr?”  So here is my answer as of today:

  1. I just committed SOLR-1568 the other day, which adds automatic filter generation to the various point based Field Types in Solr.  It also has some small refactoring in the underlying Lucene code.  Furthermore, it adds a new LatLonType which can be used to represent latitude/longitude pairs seamlessly.  See http://wiki.apache.org/solr/SpatialSearch for the full details on Solr spatial.  Note, this is only available on trunk.  Volunteers to backport to 3.x would be most welcome.
  2. As part of SOLR-1568, it became increasingly clear to me that the Cartesian Tier stuff in Lucene spatial simply does not work as intended for many, many things.  In my review and attempt at fixing the code, it became more than apparent that it only really works for the Western Hemisphere above the equator, i.e. the United States.  It may also work in the Eastern Hemisphere above the equator, too.  The reason it only really works above the equator is due to a miscalculation in the SinusoidalProjector.  See LUCENE-2519.  It also does not handle edge cases well at all, such as at the poles or the Prime/Anti Meridians, so if you have that case, then don’t bother.  I didn’t fix the SinusoidalProjector because it turned into a very tangled web of broken unit tests.  In discussions with other developers, we decided the whole tier system (and much of Lucene’s spatial) should be deprecated/replaced.
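
For readers unfamiliar with the projection involved, here is a minimal Python sketch of the textbook sinusoidal projection (x = longitude * cos(latitude), y = latitude, both in radians).  This is not Lucene's SinusoidalProjector and it does not reproduce the bug tracked in LUCENE-2519; the function name sinusoidal_project is made up for illustration.  It only shows that the textbook formula treats both sides of the equator symmetrically, which is the behavior one would expect from a correct implementation.

```python
import math

def sinusoidal_project(lat_deg, lon_deg):
    """Textbook sinusoidal projection on a unit sphere.

    Hypothetical helper for illustration only; it is NOT the Lucene code.
    Returns (x, y) in radians.
    """
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    # Longitude is scaled by cos(latitude); latitude passes through unchanged.
    return lon * math.cos(lat), lat

# A point near Philadelphia and its mirror image below the equator.
north = sinusoidal_project(40.0, -75.0)
south = sinusoidal_project(-40.0, -75.0)
print(north, south)
```

Because cos is an even function, the two points get the same x and mirrored y values, so nothing in the formula itself favors the Northern Hemisphere.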

I believe trunk is now in pretty decent shape for spatial search for applications that need:

  1. Sorting by distance
  2. Boosting by distance
  3. Range-query (using Numeric Fields) based bounding box calculations, which should be sufficient for most people
  4. Geohash based calculations
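
To make the bounding box and distance sorting ideas concrete, here is a rough Python sketch of the underlying math.  It is not Solr or Lucene code: the function names, the mean Earth radius constant, and the sample points are all assumptions made for illustration.  It computes a latitude/longitude box around a center point (the kind of box that two numeric range queries could check), keeps the points that fall inside it, and sorts them by haversine distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def bounding_box(lat, lon, radius_km):
    """Return (min_lat, max_lat, min_lon, max_lon) in degrees.

    Simplified on purpose: boxes that touch a pole or cross the
    anti-meridian are rejected rather than handled.
    """
    r = radius_km / EARTH_RADIUS_KM  # angular radius in radians
    dlat = math.degrees(r)
    min_lat, max_lat = lat - dlat, lat + dlat
    if min_lat <= -90.0 or max_lat >= 90.0:
        raise ValueError("box touches a pole; not handled in this sketch")
    # The longitude half-width widens as the center moves away from the equator.
    dlon = math.degrees(math.asin(math.sin(r) / math.cos(math.radians(lat))))
    min_lon, max_lon = lon - dlon, lon + dlon
    if min_lon < -180.0 or max_lon > 180.0:
        raise ValueError("box crosses the anti-meridian; not handled in this sketch")
    return min_lat, max_lat, min_lon, max_lon

def box_search(points, lat, lon, radius_km):
    """Keep (name, lat, lon) points inside the box, nearest first."""
    min_lat, max_lat, min_lon, max_lon = bounding_box(lat, lon, radius_km)
    hits = []
    for name, plat, plon in points:
        # Two range checks, analogous to range queries on numeric fields.
        if min_lat <= plat <= max_lat and min_lon <= plon <= max_lon:
            hits.append((haversine_km(lat, lon, plat, plon), name))
    hits.sort()
    return hits

# Made-up sample data.
stores = [("a", 0.5, 0.5), ("b", 0.1, 0.1), ("c", 5.0, 5.0)]
for dist, name in box_search(stores, 0.0, 0.0, 112.0):
    print(name, round(dist, 1))
```

Note that the box is a superset of the circle: points in the corners can be farther away than the radius, so an exact distance check is still needed when a strict radius matters.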

Trunk does not yet have the ability to:

  1. Add “pseudo” fields to the result set, so it is not yet possible to return the distance in the result set just like other stored fields
  2. Filter using a tier/tile/grid based approach.  These approaches are especially helpful in highly dense areas as they can significantly reduce the number of terms that need to be enumerated
  3. Facet by functions, which can be useful for putting distances into buckets, as in something like: walking, biking, driving
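
As an illustration of the distance bucketing idea in the last item, here is a small Python sketch that does the counting on the client side.  The thresholds (2 km and 10 km) and the function names are made up for this example; nothing here is a Solr API.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cutoffs in km: under 2 is "walking", under 10 is "biking",
# everything else is "driving".
LIMITS_KM = [2.0, 10.0]
LABELS = ["walking", "biking", "driving"]

def bucket_for(distance_km):
    """Map a distance to its bucket label; upper bounds are exclusive."""
    return LABELS[bisect_right(LIMITS_KM, distance_km)]

def facet_counts(distances_km):
    """Count how many distances fall into each bucket."""
    return Counter(bucket_for(d) for d in distances_km)

print(facet_counts([0.5, 1.9, 2.0, 7.0, 25.0]))
```

Doing this in the search engine instead of the client would avoid pulling back every matching document just to count it, which is why faceting by function is on the wish list.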

For a list of all the related Solr/Lucene spatial issues, see SOLR-773.  Again, see http://wiki.apache.org/solr/SpatialSearch for a full accounting of what is in Solr and how to use it.

In summary, I think trunk is in pretty decent shape for spatial, as far as Solr is concerned.  Pure Lucene users will see some upheaval in the coming months, but it is for the better.  Testers are needed and patches are welcome.  And, while the tier stuff feels like a step backward, it is clear to me that we have several committers along with many contributors who are very interested in seeing spatial support live and prosper.
