Highlights

Lucid Imagination helped a large not-for-profit research institute streamline and standardize its Solr search implementation with:

  • On-site benchmarking and analysis of the Solr search framework and custom code modules
  • Best-practice recommendations on scalable architecture and overall efficiency
  • Guidance and assistance on setting up a replicated search environment

Customer Overview

A large not-for-profit research institute’s online collection offers “one-stop” searching of 2 million records, including nearly a quarter of a million digital items (images, media files, online journals, and other resources) distributed across dozens of archives, databases, museums, and libraries. Both institutional and public researchers use the online search service.

Business Issue

The research institute had developed a broad set of customized search capabilities using Solr, enabling users to search across a very diverse set of information. Recently, however, the IT staff noticed that although most searches were very fast, some were very slow. Over time, the complicated taxonomy they had built had also become difficult to maintain, and new IT employees faced a steep learning curve in working with the custom code that had been developed. In addition, the institute needed to deploy a replicated environment to provide business continuity. Finally, some of the functions the staff had customized were now available out of the box in Solr.

To ensure it was making the best use of its resources, the institute asked Lucid Imagination to review its Solr implementation and recommend changes to bring it in line with industry best practices.

Solution Description

The research institute engaged the following services from Lucid Imagination:

  • Search Health Check
  • Training
  • ExpertLink consulting services

Lucid Imagination consultants performed an in-depth analysis: reviewing Solr caches, identifying better ways to set up configuration files, and removing inefficient and unused code. The consultants reviewed the institute’s custom code modules to find opportunities for efficiency gains, particularly functions that native Solr can now handle, so that those features could be configured according to best practices. The work included reviewing over a dozen customized Java modules and suggesting more efficient database access. The consultants also made recommendations on the Solr server caching environment, including placing an external HTTP caching server in front of Solr to answer common queries and reduce the load on the Solr server. Most importantly, with the transition to standardized Solr capabilities, the institute’s search application is now set up to adopt improvements in future Solr releases more smoothly.
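The external HTTP cache recommended here sits outside Solr, but the idea it relies on, answering repeated queries from memory instead of re-running them against the search server, can be sketched in a few lines. The Java class below is a minimal, hypothetical illustration, not the institute's code or any cache server's API. It assumes queries are normalized by trimming and lower-casing, evicts the least recently used entry when full (using `LinkedHashMap` in access order), and uses a plain function as a stand-in for the call to Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of a least-recently-used (LRU) query-result cache.
 * An external HTTP cache in front of Solr plays this role at the HTTP layer;
 * this class only illustrates the idea in-process.
 */
public class QueryResultCache<V> {
    private final Map<String, V> entries;
    private final Function<String, V> backend; // stand-in for a request to Solr
    private int backendCalls = 0;

    public QueryResultCache(int capacity, Function<String, V> backend) {
        this.backend = backend;
        // accessOrder = true: iteration order runs from least to most recently used
        this.entries = new LinkedHashMap<String, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, V> eldest) {
                // evict the least recently used entry once capacity is exceeded
                return size() > capacity;
            }
        };
    }

    /** Returns a cached result, or asks the backend on a miss and caches the answer. */
    public V search(String query) {
        String key = query.trim().toLowerCase(); // assumed normalization
        V cached = entries.get(key);             // a hit also refreshes recency
        if (cached != null) {
            return cached;
        }
        backendCalls++;
        V result = backend.apply(key);
        entries.put(key, result);
        return result;
    }

    public int backendCalls() { return backendCalls; }

    public int size() { return entries.size(); }

    public static void main(String[] args) {
        QueryResultCache<String> cache =
            new QueryResultCache<>(2, q -> "results for " + q);
        cache.search("fossils");
        cache.search("Fossils ");   // same normalized key: served from the cache
        cache.search("meteorites");
        cache.search("orchids");    // evicts "fossils", the least recently used
        cache.search("fossils");    // miss again, so the backend is called
        System.out.println(cache.backendCalls()); // prints 4
    }
}
```

A reverse-proxy cache would typically key on the full request URL and follow the HTTP cache headers Solr is configured to send, rather than normalizing query text itself; the normalization here just makes the hit/miss behavior easy to see.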

Lucid also trained the institute’s internal IT staff to bring them up to speed on the latest Solr capabilities and to build self-reliance for ongoing search application development and refinement. The training also focused on knowledge transfer, so that the changes made during the analysis and reconfiguration work were well understood by the whole team, and the team could take ownership of them and maintain them effectively going forward.

Lucid also advised the institute on hierarchical faceting, which uses taxonomic information to break search results down into more detailed categories presented as a tree structure. In addition, Lucid introduced the institute to the possibilities of geo-searching. While this capability is not yet in core Solr, Lucid was able to advise the institute on related projects and how best to make use of them.
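One common way to implement hierarchical faceting in Solr is to index each record's taxonomy path as depth-prefixed tokens in a multi-valued string field, then facet on the top level and drill down with the `facet.prefix` request parameter. The sketch below generates those tokens. It is an illustrative assumption about the approach, not a description of the institute's schema, and it assumes taxonomy node names never contain a `/` character.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper: builds depth-prefixed facet tokens for one taxonomy path,
 * e.g. [Animalia, Chordata] -> [0/Animalia, 1/Animalia/Chordata].
 * Each token would be indexed as one value of a multi-valued string field.
 */
public final class HierarchicalFacetTokens {
    private HierarchicalFacetTokens() {}

    public static List<String> tokensFor(List<String> path) {
        List<String> tokens = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (int depth = 0; depth < path.size(); depth++) {
            // extend the path one level, e.g. "/Animalia" then "/Animalia/Chordata"
            prefix.append('/').append(path.get(depth));
            // prepend the depth so each tree level can be faceted separately
            tokens.add(depth + prefix.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokensFor(List.of("Animalia", "Chordata", "Mammalia")));
        // prints [0/Animalia, 1/Animalia/Chordata, 2/Animalia/Chordata/Mammalia]
    }
}
```

For example, a request could facet on a hypothetical `taxonomy_path` field with `facet.prefix=0/` to list the top-level categories, then with `facet.prefix=1/Animalia/` to list the categories directly beneath Animalia, rendering each level as a branch of the tree.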

Outcomes

All custom code developed by the institute’s IT staff was thoroughly reviewed, and unused modules were removed to improve efficiency and performance. Lucid also made comprehensive recommendations for a framework to improve scalability. Because the search index had not been backed up, Lucid provided the institute with a strategy for setting up a redundant environment with a replication server and helped coach the implementation, for example by recommending update intervals that balance performance needs against hardware requirements.

Additional guidance covered a more efficient way to index new records as they arrive, using Java code that transforms the data into XML, associates descriptive tags with it, and then adds it to the search repository.
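The indexing step described here can be sketched as follows. This hypothetical Java helper turns one incoming record into Solr's XML update format (`<add><doc><field name="...">`), repeating the field element for multi-valued descriptive tags and escaping XML special characters. The field names (`id`, `title`, `tag`) are placeholders that would need to match the institute's actual schema.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: converts one record into a Solr XML <add> message.
 * Field names are illustrative and must match the real schema.
 */
public final class SolrXmlBuilder {
    private SolrXmlBuilder() {}

    /** Escapes the five XML special characters for use in text or attribute values. */
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Builds <add><doc>...</doc></add>; each value of a multi-valued field gets its own element. */
    public static String toAddXml(Map<String, List<String>> fields) {
        StringBuilder xml = new StringBuilder("<add><doc>");
        for (Map.Entry<String, List<String>> e : fields.entrySet()) {
            for (String value : e.getValue()) {
                xml.append("<field name=\"").append(escape(e.getKey())).append("\">")
                   .append(escape(value)).append("</field>");
            }
        }
        return xml.append("</doc></add>").toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps fields in insertion order in the output
        Map<String, List<String>> record = new LinkedHashMap<>();
        record.put("id", List.of("rec-001"));
        record.put("title", List.of("Field notes & sketches"));
        // descriptive tags become a multi-valued field
        record.put("tag", List.of("botany", "manuscript"));
        System.out.println(toAddXml(record));
    }
}
```

The resulting message would typically be sent as an HTTP POST with `Content-Type: text/xml` to the core's `/update` handler, followed by a commit (or left to an autoCommit setting) so the new record becomes searchable.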