Yesterday, we looked at the recent surge of growth in the enterprise search market. A big driver of this growth: big data. As we said, both the volume and complexity of big data are creating opportunities and challenges for organizations everywhere. But do not assume that these opportunities and challenges are the exclusive worries of data jockeys inside and outside the enterprise. As four-time CMO Lisa Arthur observed in Forbes this week, a new study by the CMO Council finds that “both CMOs and CIOs believe big data is a key competitive differentiator and will be core to implementing a more customer-centric business culture.” In a press release announcing the study, the CMO Council noted that CMOs and CIOs, who are often seen as being at odds with one another, have now found “common ground in big data.”
The study is newsworthy for at least two reasons. First, it’s the latest in a line of studies demonstrating the shifting dynamics between marketing and IT. In a 2012 Forbes article, global communications consultant Chris Perry asked “Are CMOs the New CIOs?” and cited a Gartner study that “predicts in the next five years, the average chief marketing officer will spend more on IT than his/her company’s CIO.” The new mandate for the CMO, who is increasingly tasked with managing the entire user experience, is to look at IT strategically.
Second, and just as important, the new mandate for the enterprise is to look at data strategically, as nothing less than the foundation for a more competitive, customer-centric approach to business. As the CMO Council notes, “big data has emerged as the critical factor to achieving an enterprise-wide customer-centric culture, according to 40 percent of marketers and 51 percent of IT respondents.”
At this year’s Revolution, we’ll be looking at big data both high and low, from its strategic implications to its tactical implementation. Here’s a look at the Day 2 sessions on big data.
Scaling up Solr 4.1 to Power Big Search in Social Media Analytics
Presented by Timothy Potter, Architect, Big Data Analytics, Dachis Group
My presentation focuses on how we implemented Solr 4.1 as the cornerstone of our social marketing analytics platform. Our platform analyzes relationships, behaviors, and conversations between 30,000 brands and 100M social accounts every 15 minutes. By combining Solr with our Hadoop cluster, we have achieved throughput of more than 8,000 documents per second. Our index currently contains more than 500,000,000 documents and is growing by 3 to 4 million documents per day.
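To put these figures in perspective, here is a back-of-the-envelope calculation using only the numbers quoted above. It assumes, purely for illustration, that the 8,000 documents/second rate could be sustained for an entire run; the abstract does not claim that, and the full-reindex scenario is hypothetical. Because the quoted rate is a lower bound ("more than"), the times it produces are upper bounds under that assumption.

```python
# Back-of-the-envelope arithmetic using the figures quoted in the abstract.
# Assumption (not from the talk): the 8,000 docs/sec rate is sustained for the whole run.
SECONDS_PER_DAY = 24 * 60 * 60

THROUGHPUT_DOCS_PER_SEC = 8_000             # "more than 8,000 documents per second"
INDEX_SIZE_DOCS = 500_000_000               # "more than 500,000,000 documents"
DAILY_GROWTH_DOCS = (3_000_000, 4_000_000)  # "3 to 4 million documents per day"


def seconds_to_index(num_docs: int, docs_per_sec: float) -> float:
    """Time needed to push num_docs through the pipeline at a constant rate."""
    return num_docs / docs_per_sec


def daily_capacity(docs_per_sec: float) -> float:
    """Documents that could be indexed in 24 hours at a constant rate."""
    return docs_per_sec * SECONDS_PER_DAY


if __name__ == "__main__":
    full = seconds_to_index(INDEX_SIZE_DOCS, THROUGHPUT_DOCS_PER_SEC)
    print(f"Hypothetical full reindex: {full / 3600:.1f} hours")
    print(f"Daily capacity: {daily_capacity(THROUGHPUT_DOCS_PER_SEC):,.0f} docs")
    for growth in DAILY_GROWTH_DOCS:
        minutes = seconds_to_index(growth, THROUGHPUT_DOCS_PER_SEC) / 60
        print(f"Indexing {growth:,} new docs: {minutes:.2f} minutes")
```

At that rate, reindexing 500 million documents would take about 17.4 hours, while a full day's growth of 3 to 4 million documents would take roughly 6 to 8.5 minutes of indexing time.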
The presentation will include details about:
- Designing a SolrCloud cluster for scalability and high availability using sharding and replication with ZooKeeper
- Operational concerns, such as monitoring and how to handle a failed node
- How we index big data from Pig/Hadoop, as an example of using CloudSolrServer in SolrJ, and how we manage searchers for high indexing throughput
- Example uses of key features such as real-time get, atomic updates, custom hashing, and distributed faceting
Attendees will come away from this presentation with a real-world use case demonstrating that Solr 4.1 is scalable, stable, and production-ready. (Note: we are in production on 18 nodes in EC2, running a recent nightly build off branch_4x.)
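The "custom hashing" feature refers to document routing: in Solr 4.x, a document ID of the form shardKey!docId lets every document that shares a shard key (one brand, say) land on the same shard. The sketch below models that idea in a simplified way and is not Solr's implementation. It uses CRC32 from Python's standard library as a stand-in for the MurmurHash3 hash Solr uses, treats the hash as unsigned rather than as Solr's signed ranges, and assigns shards from the upper 16 bits only; the brand and post IDs are hypothetical.

```python
import zlib


def hash32(text: str) -> int:
    """Unsigned 32-bit hash. CRC32 is a stand-in here; Solr uses MurmurHash3."""
    return zlib.crc32(text.encode("utf-8")) & 0xFFFFFFFF


def composite_hash(doc_id: str) -> int:
    """For 'shardKey!docId', take the upper 16 bits from the shard key's hash
    and the lower 16 bits from the rest of the ID; plain IDs hash as a whole."""
    if "!" not in doc_id:
        return hash32(doc_id)
    shard_key, rest = doc_id.split("!", 1)
    return (hash32(shard_key) & 0xFFFF0000) | (hash32(rest) & 0x0000FFFF)


def shard_for(doc_id: str, num_shards: int) -> int:
    """Split the hash space into num_shards equal ranges, looking only at the
    upper 16 bits so that all IDs sharing a shard key fall in the same range."""
    top16 = composite_hash(doc_id) >> 16
    return (top16 * num_shards) >> 16


if __name__ == "__main__":
    # Hypothetical IDs: a brand name as the shard key, a post ID after the "!".
    for doc_id in ["brandA!post1", "brandA!post2", "brandB!post3", "post4"]:
        print(doc_id, "-> shard", shard_for(doc_id, num_shards=4))
```

Because the shard is chosen only from bits contributed by the shard key, every post for brandA lands on the same shard whatever the shard count, so a query scoped to one brand can be sent to a single shard. The trade-off is that one very large key can turn its shard into a hot spot.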
Crowd-Sourced Intelligence Built into Search over Hadoop
Search has quickly evolved from being an extension of the data warehouse to being run as a real-time decision-processing system. Search is increasingly being used to gather intelligence on multi-structured data, leveraging distributed platforms such as Hadoop in the background. This session will provide details on how search engines can be abused to use mathematically derived tokens, rather than text, to build models that implement reflected intelligence. In such a system, the intelligent or trend-setting behavior of some users is reflected back at other users. More importantly, the mathematics of evaluating these models can be hidden in a conventional search engine like Solr, making the system easy to build and deploy. The session will describe how to integrate Apache Solr/Lucene with Hadoop. Then we will show how crowd-sourced search behavior can be looped back into analysis and how constantly self-correcting models can be created and deployed. Finally, we will show how these models can respond with intelligent behavior in real time.
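The abstract does not say how its "mathematically derived tokens" are computed. One common way to build this kind of search-based recommender, assumed here for illustration, is to count which items co-occur in users' histories, keep only the co-occurrences that a log-likelihood ratio (LLR) test flags as anomalous, and store the surviving item IDs as indicator tokens on each item's document; a user's recent history then becomes the query. In this minimal sketch a plain dictionary and an overlap count stand in for Solr and its relevance scoring, and the toy histories are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def x_log_x(x: int) -> float:
    return 0.0 if x == 0 else x * math.log(x)


def entropy(*counts: int) -> float:
    """Unnormalized entropy (N * H) as used in Dunning's log-likelihood ratio."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 statistic for a 2x2 table: both, A-only, B-only, neither."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def build_indicators(histories: dict[str, list[str]], max_per_item: int = 2) -> dict[str, list[str]]:
    """For each item, keep up to max_per_item co-occurring items with positive LLR."""
    n_users = len(histories)
    users_of = defaultdict(set)  # item -> users who interacted with it
    for user, items in histories.items():
        for item in items:
            users_of[item].add(user)

    cooc = defaultdict(int)  # (item_a, item_b) -> number of users with both
    for items in histories.values():
        for a, b in combinations(sorted(set(items)), 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1

    indicators = {}
    for a in users_of:  # O(items^2): acceptable for a toy example only
        scored = []
        for b in users_of:
            k11 = cooc.get((a, b), 0)
            if a == b or k11 == 0:
                continue
            k12 = len(users_of[a]) - k11  # users with a but not b
            k21 = len(users_of[b]) - k11  # users with b but not a
            k22 = n_users - k11 - k12 - k21
            score = llr(k11, k12, k21, k22)
            if score > 0.0:
                scored.append((score, b))
        scored.sort(key=lambda pair: (-pair[0], pair[1]))
        indicators[a] = [b for _, b in scored[:max_per_item]]
    return indicators


def recommend(indicators: dict[str, list[str]], recent: list[str], top_k: int = 3) -> list[str]:
    """Query each item's indicator 'field' with the user's recent items.
    Scores are plain overlap counts, a stand-in for a search engine's scoring."""
    query = set(recent)
    hits = []
    for item, tokens in indicators.items():
        if item in query:
            continue
        score = len(query.intersection(tokens))
        if score:
            hits.append((score, item))
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in hits[:top_k]]


if __name__ == "__main__":
    histories = {  # hypothetical toy data; "p" is a popular item everyone touches
        "u1": ["a", "b", "p"], "u2": ["a", "b", "p"], "u3": ["a", "b", "p"],
        "u4": ["c", "d", "p"], "u5": ["c", "d", "p"], "u6": ["c", "d", "p"],
    }
    index = build_indicators(histories)
    print(index)
    print(recommend(index, ["a"]))
```

In the toy data, item p appears in every history, so its LLR score against every other item is zero and it never becomes an indicator; only the genuinely associated pairs (a with b, c with d) survive. In a real deployment, the co-occurrence counting is the part that would run as a batch job on a platform like Hadoop, while the indicator field would be indexed and queried by the search engine.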
Edanz Journal Selector Case Study: A Prototype Based on Solr/Nutch/Hadoop
Presented by Liang Shen, Developer, European Bioinformatics Institute
I’m going to introduce a project I built in 2011: Edanz Journal Selector. It’s a tool that helps scholars find the right journals in which to publish their manuscripts. This will be a typical “How We Did It” development case study.
We built Edanz Journal Selector on Solr/Lucene/Hadoop/Hive and deployed it on Amazon Web Services. I’m going to share what we learned from this project about architecture, the cloud, and more.