If you are not a regular reader of Steve Arnold’s blog, you may have missed this item about Solr/Lucene powering the Collections Search Center, the Smithsonian Institution’s new online portal. It offers one-stop searching of more than 4.6 million of the Smithsonian’s museum, archive, library and research holdings and collections, including nearly a half-million images. There’s a nice writeup from the Smithsonian at the ResourceShelf blog:
In implementing this new Collections Search Center, the Smithsonian reviewed a number of commercial and open-source products. The functional requirements included the support of faceted metadata searching, Boolean / simple search logic, synonym/stemming matching, proximity matching, customizable relevance ranking, and highlighting display capability. This system will need to support a wide range of documents and objects from libraries, archives, and museums (LAMs). In the end, the Smithsonian selected the open-source Lucene/Solr indexing software for the project.
The Lucene/Solr search engine has offered the Smithsonian a flexible and scalable indexing environment to support the fast-growing online collections served in the new search center. The Smithsonian has refined and enhanced the online display by programming in a Java environment. MARC records and SQL data from more than 40 data sources and databases were extracted and mapped to a common data format and ingested into the Lucene/Solr index.
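To make the "common data format" idea a little more concrete, here is a minimal sketch in Python. It is not the Smithsonian's code: the field names (`id`, `title`, `creator`, `source`), the SQL column names, and the sample records are all hypothetical, and the MARC input is simplified to a plain dictionary keyed by tag (001 control number, 245 title, 100 main author). The query helper uses only standard Solr request parameters (`q`, `facet`, `facet.field`, `hl`, `hl.fl`, `wt`) to show how faceting, Boolean and proximity matching, and highlighting might be combined in one request.

```python
import json
from urllib.parse import urlencode


def from_marc(record):
    """Map a simplified MARC-like dict (tag -> value) to the common format.

    Assumption: real MARC records have subfields and repeated tags;
    this sketch flattens them to one string per tag.
    """
    return {
        "id": "marc-" + record["001"],
        "title": record.get("245", ""),
        "creator": record.get("100", ""),
        "source": "library",
    }


def from_sql_row(row):
    """Map a row from a hypothetical museum database to the common format."""
    return {
        "id": "sql-" + str(row["object_id"]),
        "title": row.get("object_name", ""),
        "creator": row.get("maker", ""),
        "source": "museum",
    }


def build_query(terms, facet_field="source"):
    """Build a Solr query string with faceting and highlighting enabled."""
    params = [
        ("q", terms),                  # Boolean and proximity syntax go here
        ("facet", "true"),             # turn on faceted counts
        ("facet.field", facet_field),  # facet on the (hypothetical) source field
        ("hl", "true"),                # turn on hit highlighting
        ("hl.fl", "title"),            # highlight matches in the title field
        ("wt", "json"),                # ask for a JSON response
    ]
    return urlencode(params)


# Made-up sample records, one per source type.
docs = [
    from_marc({"001": "12345", "245": "Sample library title", "100": "Sample Author"}),
    from_sql_row({"object_id": 678, "object_name": "Sample museum object", "maker": "Sample Maker"}),
]

# A JSON array of documents like this could be sent to Solr's update handler.
payload = json.dumps(docs)

# Proximity match (two words within 5 positions) combined with a Boolean AND.
query = build_query('title:"sample object"~5 AND source:museum')
```

Once every source is reduced to the same handful of fields, a single facet on `source` is enough to let a searcher narrow results to libraries, archives, or museums, which is roughly the kind of cross-collection browsing the writeup describes.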
In the age of all things smartphone (reading this on your iPhone?), there’s a mobile portal, too, so you can “hold history in the palm of your hand” (pardon the melodrama) with that same URL: http://collections.si.edu/
Like many government and research institutions, the Smithsonian needs to index a very broad range of data and expose it through search. In this case, the goal is curating history; other government agencies, of course, have many other use cases. In any event, the use of open source in the Federal Government is well under way.
And, if you didn’t know already, you can come hear the team from the Smithsonian talk about their Solr implementation (which we at Lucid Imagination were proud to be a part of) at the forthcoming Lucene Revolution conference in Boston, October 7-8. (Early Bird registration is open until September 10th.)