Maybe you’ve heard pundits say that in the next year, humans will create more data than in all of human history. The problem with those predictions, Stephen O’Grady of RedMonk said in his keynote on Day 2 of Lucene Revolution, is that they’re true.
Ultimately, he says, that is the reason we have gone from “The answer is a relational database. Now what is the question?” to what he calls “The Cambrian Non-relational Explosion”.
Just as the Cambrian explosion was a period in which many species appeared in the fossil record in a relatively short time, we’re now seeing a large number of so-called NoSQL databases, such as CouchDB, MongoDB, and Cassandra, appear on the scene. O’Grady doesn’t particularly like grouping them all together under the “NoSQL” moniker, both because some of them are now adding SQL back in (hence the change to “Not Only SQL”) and because they are so diverse. They are diverse because each was created to solve a particular problem: some are document stores, some are key-value stores, some are graph databases, and so on.
So you might get the notion that they’re all necessary because relational databases can’t handle what they’re doing. But really, O’Grady says, that’s not true; gifted developers have bent relational databases to their will to do just about anything those specialized databases can do.
Up to a point.
That’s because relational databases make you do two things: define your schema, and load your data. Once you get into “big data”, which is often unstructured, either or both of those may be impractical.
The advantage of search over an RDBMS is that you don’t necessarily have to load your data in order to process it. You can index documents in place, which makes them useful in their base form.
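To make “index documents in place” a little more concrete, here is a toy sketch in Python. It is not how Lucene or Solr work internally, and the names `build_index` and `search` are made up for illustration. It assumes the documents are plain `.txt` files in a directory, and it builds a simple inverted index (term to file paths) without defining a schema or copying the files anywhere.

```python
import re
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical tokenizer: lowercase runs of letters and digits.
TOKEN = re.compile(r"[a-z0-9]+")


def build_index(root):
    """Map each lowercase term to the set of file paths that contain it."""
    index = defaultdict(set)
    for path in sorted(Path(root).rglob("*.txt")):
        # The file stays where it is; only its terms and path are recorded.
        text = path.read_text(encoding="utf-8", errors="ignore").lower()
        for term in TOKEN.findall(text):
            index[term].add(path)
    return index


def search(index, query):
    """Return the paths that contain every term in the query."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    # Small demo on throwaway files; the contents are invented.
    with tempfile.TemporaryDirectory() as root:
        Path(root, "notes.txt").write_text("Search can index documents in place")
        Path(root, "memo.txt").write_text("Relational databases need a schema")
        index = build_index(root)
        for hit in search(index, "index documents"):
            print(hit.name)
```

The point of the sketch is the shape of the workflow: there is no schema step and no load step, only a pass over the files as they already exist. A real search engine adds analysis, ranking, and persistent index storage on top of this idea.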
But the big question really is, “what is useful?” Sure, you can do a search and get back an answer, but is it the “right” answer? O’Grady’s opinion is that the most important answer you can give is the next question. That’s because you’re rarely asking a single question; instead, you’re exploring a topic.
In many ways, that’s what we’re now seeing with big data, he says. On one end of the spectrum, AT&T and CNet build their site navigation off of search results; on the other, companies and organizations do complex analysis to create entirely new information, such as InfoChimps’ TrustRank. We are just beginning to see a new wave of uses for data and, just as importantly, new ways to make it meaningful. For example, later today Anne Veling will demonstrate an application he and his team built for the New York Times, showing a heat map representation of search results over more than 160 years of issues.
And just as RDBMS provided a world of new opportunities for using information in ways we hadn’t considered, search and its related technologies are opening up new worlds and bringing up new questions.
And those 30 or 40 NoSQL projects? O’Grady predicts that only five or six of them (including Lucene/Solr) will survive in the long run.
But it will sure be interesting to see what new things we find to do with them.
Cross-posted with the Lucene Revolution Blog. Nicholas Chase is a guest blogger. This is one of a series of presentation summaries from the conference.