Box.net, the cloud-based content-management service headquartered here in Silicon Valley, recently flipped the switch and moved to Solr/Lucene for its document search. It’s an interesting development on a couple of fronts. Box.net has 360 million documents online and adds about 1 million per day, each of which must be indexed as it arrives.
First, as Box.net VP of Technology Sam Ghods notes in his blog post a couple of days back:
…you should immediately notice the blazing fast speed of Solr. Quick search results are available in less than half a second, and full search results don’t take much longer. Second, full-text indexing for all your newly uploaded files now happens in under 20 minutes, helping you locate documents even faster. We also switched to using the Apache Tika project for text extraction, allowing for extremely accurate fidelity in the indexing process. As time goes on expect these speeds to improve even further, as we iterate and improve on the architecture.
And most importantly, the new search platform is not only scalable in the sheer quantity of data it indexes, but also in the sophisticated features we can build on top of it. We’re excited to be developing and rolling out some more advanced search options over the next several months.
Perhaps a more significant aspect of the story is the ever-broadening availability of alternatives for organizations centered on Microsoft technologies and content-management strategies. Add to that our announcement earlier today of Lucidworks Enterprise release 1.8, with support for indexing SharePoint ACLs, and the breadth of available solutions is looking pretty good.