Box.net moves to Solr – 10X faster search, indexing 1M docs/day

Box.net, the cloud-based content-management system headquartered here in Silicon Valley, recently flipped the switch and moved to Solr / Lucene for its document search. It’s an interesting development on a couple of fronts. Box.net has 360 million documents online and adds about 1 million per day, each of which is indexed as it arrives.
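To put those figures in perspective, here is a rough back-of-the-envelope sketch. It uses only the two numbers quoted above and assumes documents arrive evenly around the clock, which real upload traffic almost certainly does not; the function names are illustrative and do not come from Box.net.

```python
# Rough sizing from the figures quoted in the article.
# Assumption: uploads are spread uniformly over 24 hours (real traffic is bursty).
DOCS_PER_DAY = 1_000_000      # "about 1 million docs per day"
TOTAL_DOCS = 360_000_000      # "360 million docs"
SECONDS_PER_DAY = 24 * 60 * 60


def average_docs_per_second(docs_per_day: int) -> float:
    """Average indexing rate needed to keep up with daily arrivals."""
    return docs_per_day / SECONDS_PER_DAY


def daily_growth_percent(docs_per_day: int, total_docs: int) -> float:
    """Daily arrivals as a percentage of the existing corpus."""
    return 100 * docs_per_day / total_docs


print(f"{average_docs_per_second(DOCS_PER_DAY):.1f} docs/sec on average")
print(f"{daily_growth_percent(DOCS_PER_DAY, TOTAL_DOCS):.2f}% corpus growth per day")
```

Under that uniform-arrival assumption, the indexer has to sustain roughly a dozen documents per second on average, with peak rates presumably well above that.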

First, as Box.net VP of Technology Sam Ghods notes in a blog post from a couple of days ago:

…you should immediately notice the blazing fast speed of Solr. Quick search results are available in less than half a second, and full search results don’t take much longer. Second, full-text indexing for all your newly uploaded files now happens in under 20 minutes, helping you locate documents even faster. We also switched to using the Apache Tika project for text extraction, allowing for extremely accurate fidelity in the indexing process. As time goes on expect these speeds to improve even further, as we iterate and improve on the architecture.

And most importantly, the new search platform is not only scalable in the sheer quantity of data it indexes, but also in the sophisticated features we can build on top of it. We’re excited to be developing and rolling out some more advanced search options over the next several months.

Perhaps a more significant aspect of the story is the ever-broadening availability of alternatives for organizations centered on Microsoft technologies and content management strategies. Add to that our announcement earlier today of Lucidworks Enterprise release 1.8, with support for indexing SharePoint ACLs, and the breadth of available solutions is looking pretty good.
