Zvents aggregates and distributes local content about “things to do” so that users can discover events, entertainment, restaurants, and more, based on time and place. With two-level ranking and cache optimizations that improve both utilization and user experience, Zvents has achieved key innovations with Solr that deliver both better performance and more relevant results.
by Amit Nithian, Ivan Small, and Tony Barreca
Founded about 4 years ago and headquartered in San Mateo, California, Zvents aggregates and distributes local event content, ranging from major rock concerts to kindergarten class reunions. Our company operates as a syndicated Web-services provider to a network of media outlets and other private companies, community organizations, public institutions like schools and universities, and event venues, among others. Besides events, you can find movies, restaurants, and the performers who are coming to town. Advertisers can promote events in two ways: by creating a free listing, or by upgrading that free listing to a “premium listing.” A premium listing carries enhanced content (e.g., more pictures) and has an opportunity to appear in a more exclusive “sponsored” section of the search results page.
From a user’s point-of-view, Zvents is first and foremost a “discovery” site. As such, its search capability is a core functional requirement.
Zvents search functionality consists of custom code layered on top of the Lucene/Solr core; our company has invested significant resources in refining our search techniques and algorithms. At the root of our approach, however, are standard Lucene/Solr capabilities like filtering and faceting, which are then combined with Zvents extensions such as a multi-layered caching subsystem that supports dynamic re-ranking.
Because search is so central and critical to our business success, we have invested a significant amount of time in thinking about and improving these capabilities, starting with a clear grasp of what is needed to succeed with users in the market for local search.
From the outset, the Zvents search system needed some functionality not included in an “out-of-the-box” deployment of Lucene/Solr — namely, support for optimal searching by time and distance, along with improved query caching to support faceting and general discovery.
To really hit the sweet spot of the user experience, our search needed to be fast and reliable, a minimum requirement that is hardly surprising and is largely met by “vanilla” Lucene/Solr. Getting the flexibility and precision in relevance ranking that is required to ensure users are delighted enough to come back again and again turned out to be a little more complicated. Our system uses custom filtering and a multi-layered caching system that treats caches as higher-order constructs in which different layers have different properties.
As mentioned above, Zvents is a “discovery” site, in exactly the sense that it helps users discover the answer to the question, “What is there to do tonight (or whenever)?” It’s also structured to make it easy for a user to get answers to subsidiary questions like, “What’s happening within a 10 mile radius of my home?” and “Is there a good Italian restaurant near the cinema I’m going to this weekend?”
In practice, our approach to search—i.e., to answering such questions—is tightly integrated and largely self-optimizing in key respects. For the purposes of this write-up, however, we break down the essentially unitary structure of the search application set into three logically, but not functionally, independent strands.
Distance Weighting - Secondary Ranking
To ensure a high cache hit rate, Zvents search is built on a two-level ranking scheme. The first layer relies on phrase and document boosting and uses standard Solr caching (with some Zvents optimizations). The secondary layer takes the top one thousand hits and re-ranks them based on distance and other criteria to produce the final document set.
We implemented this by extending the standard DisMaxRequestHandler and injecting the necessary logic to grab the top X hits and re-rank them as necessary. This two-level ranking ensures that the static portion of the search query (primarily the text, time, and categories) stays cached at the primary layer, while the more dynamic portion of the search (e.g., distance weighting) can be handled and cached at the secondary layer.
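The re-ranking step can be illustrated with a minimal Python sketch. It is not the Zvents handler code, which extends Solr's DisMaxRequestHandler in Java; the hit layout (`id`, `score`, `lat`, `lon`), the `half_distance` parameter, and the decay formula are all assumptions chosen for illustration.

```python
import math


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in miles."""
    radius = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def rerank_by_distance(primary_hits, user_lat, user_lon, top_x=1000, half_distance=10.0):
    """Re-rank only the top_x primary hits using a distance-decayed score.

    primary_hits is assumed to be sorted by primary score, descending.
    The weighting below is a hypothetical choice, not the Zvents formula.
    """
    rescored = []
    for hit in primary_hits[:top_x]:
        d = haversine_miles(user_lat, user_lon, hit["lat"], hit["lon"])
        weight = 1.0 / (1.0 + d / half_distance)  # 1.0 at 0 miles, 0.5 at half_distance
        rescored.append({**hit, "distance": d, "final_score": hit["score"] * weight})
    rescored.sort(key=lambda h: h["final_score"], reverse=True)
    return rescored


hits = [
    {"id": "far", "score": 1.0, "lat": 37.77, "lon": -122.42},
    {"id": "near", "score": 0.8, "lat": 37.56, "lon": -122.32},
]
ranked = rerank_by_distance(hits, user_lat=37.56, user_lon=-122.32)
```

In this sketch the nearby hit overtakes the hit with the higher primary score, because the primary score is multiplied by a weight that shrinks with distance. The primary ordering itself is never recomputed.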
Our multi-layered caching scheme supports, at a minimum, primary and secondary caches. Initial relevance ranking may be parameterized to include, for example, all the events in a metropolitan area or all events of a particular type, e.g., jazz performances. All matches are stored in the primary cache, but only the top-ranking ones are placed into the secondary cache. To keep caching at the primary layer as effective as possible, only the top X (about a thousand) hits are re-ranked according to distance and other metrics, and that re-ranking is performed only in the secondary caches, which further refine results based on parameters like “current distance”.
Because only a small number of the top hits returned by the primary ranking layer are re-ranked, Zvents achieves very high update performance; the much larger primary cache is re-ranked much less frequently. In other words, the re-ranking scheme allows us to provide users with optimized results without imposing a perceptible lag in response time.
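The split between the two cache layers can be sketched as follows. This is an illustration under stated assumptions, not the Zvents implementation: the class name `TwoLevelCache`, the callables `primary_search` and `rerank`, and the key layout are hypothetical, and plain dictionaries stand in for Solr's caches (a real cache would also need size limits and eviction).

```python
class TwoLevelCache:
    """Primary cache keyed on the static query; secondary keyed on static query + location."""

    def __init__(self, primary_search, rerank, top_x=1000):
        self.primary_search = primary_search  # hypothetical: (text, time_range, categories) -> ranked hits
        self.rerank = rerank                  # hypothetical: (hits, location) -> re-ranked hits
        self.top_x = top_x
        self.primary = {}    # all matches for the static part of the query
        self.secondary = {}  # re-ranked top hits for a specific location

    def search(self, text, time_range, categories, location):
        static_key = (text, time_range, tuple(sorted(categories)))
        if static_key not in self.primary:
            self.primary[static_key] = self.primary_search(text, time_range, categories)
        hits = self.primary[static_key]

        dynamic_key = (static_key, location)
        if dynamic_key not in self.secondary:
            # Only the top_x primary hits are ever re-ranked.
            self.secondary[dynamic_key] = self.rerank(hits[: self.top_x], location)
        return self.secondary[dynamic_key]
```

Two users who run the same text, time, and category query from different locations share one primary cache entry, and only the inexpensive re-ranking of the top hits is repeated per location.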
Caching Optimizations and Low-Level De-duplication
Zvents search relies heavily on the document boost function feature that Solr’s ranking provides. One of our biggest optimizations to this feature is the ability to cache the result of all the boosting functions. Storing one boost value per document improves memory usage and reduces costly function computation. This level of caching also made it quick to implement new features such as the careful random promotion of lower-ranked documents, which allows us to measure the trade-offs between exploration and exploitation of documents.
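A rough Python sketch of both ideas follows. The names `BoostCache` and `promote_randomly` are hypothetical, combining boosts by multiplication is an assumption, and the promotion rule (with probability `epsilon`, swap one lower-ranked document into the top `window`) is just one simple way to trade exploration against exploitation, not necessarily the one Zvents uses.

```python
import random


class BoostCache:
    """Stores one combined boost value per document id."""

    def __init__(self, boost_functions):
        self.boost_functions = boost_functions  # hypothetical callables: doc -> float
        self._cache = {}

    def boost(self, doc):
        doc_id = doc["id"]
        if doc_id not in self._cache:
            value = 1.0
            for fn in self.boost_functions:
                value *= fn(doc)  # assumption: boosts combine multiplicatively
            self._cache[doc_id] = value
        return self._cache[doc_id]


def promote_randomly(ranked, epsilon, rng, window=10):
    """With probability epsilon, swap one lower-ranked document into the top window."""
    result = list(ranked)  # leave the caller's list untouched
    if len(result) > window and rng.random() < epsilon:
        src = rng.randrange(window, len(result))  # a document below the window
        dst = rng.randrange(window)               # a slot inside the window
        result[dst], result[src] = result[src], result[dst]
    return result
```

Because the boost functions run only once per document, the promotion step can be applied on every request without repeating the expensive computation.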
A unique problem in local search is dealing with repeating events and individual movie show times, each of which is stored as a separate document in the index because each occurrence has a separate time and (in the case of movies) place. However, to prevent duplicate data from dominating the search results, we made a de-duplication modification in the lower levels of Solr to ensure that the top X results sent to the secondary ranking layer are unique with respect to a particular known field (‘parent’, in this case). This protects users from having to wade through page after page of what is essentially the same information. This capability matters, for example, when an event such as a happy hour has many occurrences. Instead of re-ranking each occurrence of this event, we roll up all documents by a common field and re-rank and report a single event with the number of repeats.
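The roll-up can be sketched in a few lines of Python. The actual change lives in the lower levels of Solr; this illustration only shows the logic, and the function name `rollup_by_parent`, the `repeats` counter, and the assumption that every document carries the `parent` field are hypothetical.

```python
def rollup_by_parent(ranked_docs, top_x=1000, field="parent"):
    """Keep the best-ranked document per parent, with a count of its occurrences.

    ranked_docs is assumed to be sorted by primary score, descending, and
    every document is assumed to have the roll-up field.
    """
    groups = {}  # parent value -> representative document (insertion-ordered)
    for doc in ranked_docs:
        key = doc[field]
        if key in groups:
            groups[key]["repeats"] += 1  # another occurrence of a known event
        elif len(groups) < top_x:
            groups[key] = {**doc, "repeats": 1}
    return list(groups.values())


docs = [
    {"id": 1, "parent": "happy-hour"},
    {"id": 2, "parent": "concert"},
    {"id": 3, "parent": "happy-hour"},
    {"id": 4, "parent": "happy-hour"},
]
unique = rollup_by_parent(docs)
```

Here the three happy-hour occurrences collapse into the single highest-ranked one with `repeats` set to 3, so the secondary layer re-ranks two documents instead of four.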
Starting with a standard implementation of baseline Lucene/Solr functionality, Zvents has enhanced the relevancy ranking and performance attributes of the search in some unique and interesting ways to deliver on important business requirements. Performance is at the heart of the Zvents offering because of the very nature of discovery: latency induces impatience, and frustrated users leave. By minimizing processing time with multi-layered caching and related optimizations, and by saving the user time through de-duplication, Zvents search technology has become a key factor in our site’s success. The approaches we’ve pioneered at Zvents can benefit many other installations.
Amit Nithianandan is a Search Engineer at Zvents Inc.
Ivan Small is Director of Search at Zvents Inc.
Tony Barreca is a technologist and freelance writer living in the San Francisco Bay Area.