We often get asked, “What can you tell us about moving from [INSERT COMMERCIAL SEARCH SOFTWARE PROVIDER HERE] to Solr/Lucene?” We’ve talked about it in a few different places in different ways (one here and another here, and more here). There are, of course, some good things to take into account no matter what the use case or environment, but in many ways, search application development is more like zoology than physics: there is an infinite variety of species, and close observation and analysis of the naturally diverse world reward the effort. One interesting specimen is a post by David Buchmann of the European Agile development shop Liip, about a project in which they transitioned from the Google Search Appliance to Zend_Lucene. Excerpts worth noting:

… we have binary documents like PDF, Word and so on. There was no way to set the meta information for those documents. requiredfields=gsahintview:group1|-gsahintview should trigger a filter to say either we have the meta information with a specific value, or no meta at all. However, Google confirmed that, this combination of filter expressions is not possible. … Support by Google was a very positive aspect. They answered fast and without fuss, and have been motivated to help. They seemed competent – so I guess when they did not propose alternatives but simply said there is no such feature, there really was no alternative for our feature requests.

Sigh. Closed source is heavy when it’s sheathed in metal. Another observation:

Zend_Lucene worked out quite well for us, although today, I would probably use Apache Solr to save some work, especially reading documents and for stemming.

I am continually struck by cases in which developers opt for Lucene, but then come back to Solr. Of course, we think Lucidworks Enterprise can simplify this decision even further, since it offers a way to build better Solr apps faster.
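As an aside, the filter the Liip team could not express on the appliance ("a specific value, or no meta at all") can be written in standard Solr/Lucene query syntax. Here is a minimal sketch in Python that only builds the query string. It assumes a hypothetical Solr index with a field named gsahintview, borrowed from the GSA example above; the helper name value_or_missing is ours, not part of any library, and nothing here comes from the Liip project itself.

```python
from urllib.parse import urlencode


def value_or_missing(field: str, value: str) -> str:
    """Build a Solr filter: `field` equals `value`, or `field` is absent."""
    # A purely negative clause has nothing to subtract from, so the
    # "field is missing" half starts from all documents (*:*) and then
    # removes every document that has any value in the field.
    return f'{field}:"{value}" OR (*:* -{field}:[* TO *])'


# Hypothetical field and value, taken from the GSA example quoted earlier.
fq = value_or_missing("gsahintview", "group1")

# Query parameters as they might be sent to a Solr select handler.
params = urlencode({"q": "*:*", "fq": fq})

print(fq)
print(params)
```

Whether this fits a given schema depends on how the metadata is indexed, so treat it as a starting point rather than a recipe.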

Read the full post here. YMMV, of course, but I think you’ll find the insights useful.