As we head toward the official GA launch of Lucidworks Big Data, I thought I would provide an update on our Big Data Beta program and on the progress we've made on the product.

First and foremost, every customer in the beta program has its own business needs. The common theme, of course, is that they want an application development platform that combines search (via Lucidworks Search and Solr) with the large-scale computation and storage capabilities of Hadoop, along with machine learning and analytics for a deeper understanding of their content and their users. We call this the "Search, Discovery and Analytics Virtuous Cycle," and we think it helps those interested in Big Data focus on the problems they want to solve. The customers range from those doing large-scale crawling to those interested in long-term access to and archiving of content. As we get closer to launch, we will share more detailed use cases from the beta program.

As for the product itself, since the beta began we've made a number of enhancements and fixed a number of bugs. Highlights include:

  1. Named entity recognition for English, with indexing support
  2. Large-scale pairwise similarity for tasks like document de-duplication
  3. More search log analysis workflows
  4. Sub-workflow support, enabling more fine-grained workflows
  5. On-premises deployment options (we originally focused on hosted deployments)
  6. Lucidworks Search v2.1.1 support
  7. Performance and failover/fault tolerance improvements

We are also actively working on more real-time processing and indexing capabilities, as well as some interesting classification capabilities that should make it much easier for application developers to build and deploy cutting-edge classification models.

For those interested in participating in the Big Data Beta program, we have a few slots available in our September cohort. Please apply by filling out our beta application.