by Tony Barreca
When ilocal and JTeam connected, it was an ideal fit for a lot of reasons. At the time, ilocal had been using an Enterprise Search (ES) solution based on a commercial ES product, but had hit a wall on multiple dimensions.
As a leading online directory service headquartered in the Netherlands, ilocal had stringent requirements for innovative functionality implemented quickly. They discovered that their commercial provider was often unable to deliver product updates quickly enough.
In addition, like many other commercial search providers, the provider “specialized in directories.” In other words, its search technology was optimized for Web-centric searches, rather than accommodating the additional complexities of wide-ranging enterprise datasets.
Confronted with these limitations, ilocal decided that the time had come to take matters into their own hands. To control their business destiny effectively, they first had to exert control over their core technology. They started to consider alternatives, including an open-source solution based on Lucene/Solr. At this point they engaged with JTeam, a Netherlands-based company specializing in customized open-source software solutions using JEE and related technologies. For ilocal, JTeam architected the new solution and developed it jointly with ilocal developers, providing targeted training and transferring knowledge and skills along the way. The goal was to deliver both a new search platform and an ilocal team fully empowered to maintain and extend it. Mission accomplished.
We spoke with JTeam’s Enterprise Search lead, Uri Boness, to ask what about the work they did was challenging, interesting, and cool.
ilocal is in an exceptionally fast-moving marketplace, with searchable datasets in constant and rapid flux. They were fairly clear on their business requirements from the outset of the engagement, and this, in combination with the flexible, iterative Agile approach the development team used, turned out to be an important factor in the project's eventual success. The new search solution needed to provide:
- Results ranking that combined complete flexibility with exquisite precision
- A scalable solution with top-shelf performance (low latency for users)
- Support for location-based searches (i.e., for geo-tagged data)
These became the basis for defining a set of success metrics, discussed further below. In addition, the new solution needed to be implemented and deployed quickly (it was ready in 4 months) and transparently, i.e., without any visible effect on existing customers and operations.
After an initial discovery stage, the two organizations formed a development team and worked in tandem during short sprints to deliver the required solution, with JTeam acting as the lead developer. The jointly developed solution was built on top of Solr V1.3, a choice that turns out to be illuminating in light of some subsequent developments.
Solutions using Solr
This joint effort resulted in some innovative technology that should interest anyone working on Enterprise Search, not only those in the local search business. The centerpiece of the technology resulting from the ilocal/JTeam collaboration is an approach to ranking that is dynamically re-configurable on a per-query basis. It also turned out that the technologies required to fulfill the requirements for scalability, performance, and location-based search support are closely coupled in their implementation.
Dynamically re-configurable, “per query” ranking
In the ilocal environment, a wide range of variables is key to getting optimal results. These moving targets require the ability to change configurations (weightings) quickly and on the fly. Although the basic ranking capabilities of Lucene and Solr are industrial-strength, they are also fairly generic and were not by themselves up to the more demanding requirements of ilocal’s business.
It soon became clear that to accommodate the full range of user requests ilocal encounters, the ranking scheme needed to be both context-aware and sufficiently rich to support a wide variety of possible contexts, i.e., some with and some without geo-data, some using “categories” and others not, etc.
To accomplish this, the development team introduced a new object type on top of the base Solr functionality, namely, the SearchContext. Search contexts can include whatever parameters are needed; common ones include location, language used (e.g., English or Dutch), and time of day.
In addition to this SearchContext object type, the development team also extended some core Solr functionality to make it all work. They used the Solr standard DisMaxQParserPlugin, but augmented it with custom Query functions. They also implemented a custom ilocalRequestHandler, based on Solr’s StandardRequestHandler, but extended with the ability to resolve the appropriate search context for each request (query).
Now, when a user request hits an ilocal server, its ranking schema (weighting configuration) is computed and assigned dynamically, based on its search context.
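The idea of resolving a weighting configuration per request can be sketched in a few lines. The Python below is only an illustration, not the ilocal/JTeam code (which was built on Solr in Java): the field names, signal names, and weight values are all hypothetical, chosen to mirror the context parameters mentioned above (location, language, time of day).

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SearchContext:
    """Hypothetical per-request context; fields mirror the article's examples."""
    language: str = "nl"             # carried along, not used in this sketch
    location: Optional[str] = None   # resolved place name, or None if absent
    has_category: bool = False       # whether the request names a category
    hour_of_day: int = 12            # 0-23, local time of the request


# Hypothetical ranking signals and their default weights.
BASE_WEIGHTS = {
    "text_match": 1.0,
    "category_match": 0.0,
    "proximity": 0.0,
    "open_now": 0.0,
}


def resolve_weights(ctx: SearchContext) -> Dict[str, float]:
    """Compute the weighting configuration for one request from its context."""
    weights = dict(BASE_WEIGHTS)
    if ctx.location is not None:
        weights["proximity"] = 2.0       # geo-aware requests favor nearby results
    if ctx.has_category:
        weights["category_match"] = 1.5
    if ctx.hour_of_day >= 22 or ctx.hour_of_day < 6:
        weights["open_now"] = 0.5        # late-night requests favor open businesses
    return weights


def score(signals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of a document's signals; missing signals count as zero."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


# Usage: a request with a location and a category.
ctx = SearchContext(location="Utrecht", has_category=True)
weights = resolve_weights(ctx)
doc_score = score({"text_match": 0.8, "category_match": 1.0, "proximity": 0.5}, weights)
```

Because the weights are derived from the context on every call, two requests for the same text can be ranked differently without touching any server-side configuration, which is the property the article describes.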
Multi-core and location-based-search support
ilocal had some stringent performance and scalability requirements in mind from the beginning of the project, and achieving them ultimately required multi-core and location-based-search functionality to be implemented in tandem on top of Solr V1.3. For those unfamiliar with Solr terminology, multi-core in this context refers to the simultaneous utilization of multiple, logically independent Solr indices, with each index having its own configuration. As it turned out, this multi-core capability proved to be essential to the performance of location-based search.
Since location-based search is ilocal’s principal business (a typical query involves, for example, finding a plumber in the city of Utrecht), almost every request they receive is within a location-based search context. Unfortunately, some early testing of the standard location-oriented Lucene and Solr libraries indicated that their performance would not achieve ilocal’s goals. However, the relevant Solr plugin, localsolr, proved adequate as a starting point from which to innovate, with the development team optimizing for performance every step of the way. Although the basic logic of this plugin stayed the same, the implementation was significantly modified and extended.
In the end, performance and scalability objectives were met by:
- Parallelizing computation by using multiple threads
- Optimizing distance computations in terms of ilocal-specific requirements
- Utilizing multi-core functionality
As an example of optimizing distance computations for ilocal, the team discovered that accuracy sometimes is not worth the cost of achieving it. They determined that correcting for the curvature of the Earth’s surface, rather than simply treating it as a Cartesian plane, results in only an approximately 0.2% increase in the accuracy of point-to-point distances. This increase was not worth the hit to performance, noticeable to the user as latency, and so they eliminated this computation from the algorithm.
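To make the accuracy-versus-cost trade-off concrete, here is a small sketch comparing a great-circle (haversine) distance with a cheaper flat-plane approximation. The article does not say which formulas the team actually used, so both functions are assumptions chosen for illustration, the coordinates for Utrecht and Amsterdam are approximate, and the sketch does not attempt to reproduce the 0.2% figure quoted above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance on a sphere; several trigonometric calls per pair."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def planar_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Flat-plane approximation: scale longitude by cos(mean latitude),
    then apply the Pythagorean theorem. One cosine and one square root."""
    mean_phi = math.radians((lat1 + lat2) / 2)
    x = math.radians(lon2 - lon1) * math.cos(mean_phi)
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(x, y)


# Usage: approximate city-center coordinates (assumed for illustration).
utrecht = (52.0907, 5.1214)
amsterdam = (52.3676, 4.9041)

exact = haversine_km(*utrecht, *amsterdam)
approx = planar_km(*utrecht, *amsterdam)
relative_error = abs(approx - exact) / exact
```

Over the short distances typical of a local search, the two results differ by a small fraction of the total distance, while the planar version does less work per candidate document.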
Similarly, the multi-core approach that the team implemented was initially motivated by ilocal’s specific approach to search. As can be seen on the ilocal home page, ilocal supports search for companies, products, and services either by name or by category in one text-entry field, or by geographical proximity in another.
To make this work, two cores were implemented: one holding the main index (products, companies, services), and the other holding a smaller index of locations.
The structure of the location index, and attendant request parsing, is made complex by the need to support a relatively sophisticated algorithm for recognizing user-entered locations. For example, if a user enters “Amsterdam,” the algorithm must be able to discriminate between whether she is referring to the city of Amsterdam, or to a street named “Amsterdamsestraat” in the city where she lives.
Having the location index in a separate core helped in several ways. The complexity of the location index did not add to the complexity of the main index; each index has its own unique and customized structure. And since the indices are separate, it is possible to manage them separately as well (for example, rebuilding the locations index doesn’t affect the main index and vice versa).
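The two-index arrangement can be illustrated with a toy model. The sketch below is not Solr and not the ilocal algorithm: the Core class, the document fields, the sample businesses, and the rule "exact city match first, then street names in the user's home city" are all hypothetical, meant only to show how a small locations index can be queried first and then used to filter the main index, with each index rebuilt independently.

```python
from typing import List, Optional


class Core:
    """Toy stand-in for one logically independent index."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.docs: List[dict] = []
        self.generation = 0  # incremented on every rebuild

    def rebuild(self, docs: List[dict]) -> None:
        self.docs = list(docs)
        self.generation += 1


def resolve_location(text: str, locations: Core,
                     home_city: Optional[str] = None) -> List[dict]:
    """Return candidate locations, best first (hypothetical ranking rule)."""
    needle = text.strip().lower()
    cities = [d for d in locations.docs
              if d["type"] == "city" and d["name"].lower() == needle]
    streets = [d for d in locations.docs
               if d["type"] == "street"
               and d["name"].lower().startswith(needle)
               and (home_city is None or d["city"] == home_city)]
    return cities + streets


def search(query: str, where: str, main: Core, locations: Core,
           home_city: Optional[str] = None) -> List[dict]:
    """Resolve the location in one core, then filter the other core by it."""
    candidates = resolve_location(where, locations, home_city)
    if not candidates:
        return []
    best = candidates[0]
    city = best["name"] if best["type"] == "city" else best["city"]
    wanted = query.lower()
    return [d for d in main.docs
            if d["city"] == city and wanted in d["category"].lower()]


# Usage with made-up sample data.
main = Core("main")
locations = Core("locations")
main.rebuild([
    {"name": "Example Plumbing A", "category": "plumber", "city": "Utrecht"},
    {"name": "Example Plumbing B", "category": "plumber", "city": "Amsterdam"},
])
location_docs = [
    {"type": "city", "name": "Amsterdam"},
    {"type": "city", "name": "Utrecht"},
    {"type": "street", "name": "Amsterdamsestraat", "city": "Utrecht"},
]
locations.rebuild(location_docs)

candidates = resolve_location("Amsterdam", locations, home_city="Utrecht")
plumbers = search("plumber", "Utrecht", main, locations)

# Rebuilding the locations index leaves the main index untouched.
locations.rebuild(location_docs)
```

In this toy version "Amsterdam" yields both the city and the street in the user's home city as candidates, with the city ranked first; a production ranking rule would need far more signals than this.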
As mentioned above, this collaboration between ilocal and JTeam was based on Solr V1.3. In a great illustration of how well and how quickly the development community steps up to real-world needs, most of the customized clustering support that was built for ilocal will come out-of-the-box in the upcoming version – Solr 1.4.
It’s fair to call the ilocal/JTeam collaboration an unusually successful one. All the business and technical objectives were met:
- Average search time dropped by 70 percent, with even the most complex queries not exceeding response times of 500 ms.
- User response time (latency) remains relatively constant even as the number of users on the system increases.
- Budget and time constraints were respected, and the project came in both ahead of schedule and under budget. The new solution costs far less on an annual basis.
- ilocal is in control of its own technology and pace of business innovation, and attained the performance and scalability it needs to take its business to the next level.
Tony Barreca is a technologist and freelance writer living in the San Francisco Bay Area.