I blogged a few days ago about how open source search is disrupting the relationship between data and data access, and mentioned the talk by Matthew Wall at The Guardian. The slides tell the story quite well.



Two points of particular interest. One: in facing the new media landscape, The Guardian realized that curating and retrieving data was not enough; they needed to “mutualise the news” by supporting the data fabric of the internet (see slide 26). Opening up Solr-based APIs to allow third parties to create innovative news delivery mechanisms unlocks new distribution channels for The Guardian, reaching new customers and markets.

Two: with all the talk around cloud data stores, one might think that MapReduce is really all there is to breaking open the limitations of structured query language. Now that Google has granted The Apache Software Foundation a license to its MapReduce patent, I expect innovation to really blossom on the data side. In fact, alongside our “cousins” at Cloudera, two new Hadoop companies are emerging: Datameer and Karmasphere (rhyming consultants have been busy). Add them to MongoDB, Cassandra, CouchDB, Memcache, Hypertable and the other NoSQL variants, and there’s a rich variety of options for going beyond the relational model. But once you put all the data in there (and there’s plenty of it), what do you do with it? Map, reduce and …?

…index and search, more often than not with Lucene/Solr. It’s good to store the data unbound from the strictures of structured rows and columns; credit the many cloud databases with the “No-S” of “No-SQL”. When it comes to the Language of Queries, perhaps No-QL really means Lucene/Solr? See what The Guardian says, starting at slide 35.
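To make "map, reduce and index" a little more concrete, here is a toy sketch in Python. It is not how Hadoop or Lucene/Solr work internally, and it is not taken from The Guardian's slides. The sample documents and the function names (`map_phase`, `reduce_phase`, `search`) are invented for illustration. The map step emits (term, document) pairs, the reduce step groups them into an inverted index, and the search step answers a query by intersecting the document sets of its terms.

```python
from collections import defaultdict

# Hypothetical sample documents, invented for illustration.
DOCS = {
    1: "open source search",
    2: "search the news",
    3: "open data and open news",
}


def map_phase(doc_id, text):
    """Map: emit one (term, doc_id) pair per word in the document."""
    return [(term, doc_id) for term in text.lower().split()]


def reduce_phase(pairs):
    """Reduce: group the emitted pairs into term -> set of doc ids."""
    index = defaultdict(set)
    for term, doc_id in pairs:
        index[term].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return []
    # index.get avoids creating empty entries for unknown terms.
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


# Run the map step over every document, then reduce into an inverted index.
pairs = [p for doc_id, text in DOCS.items() for p in map_phase(doc_id, text)]
INDEX = reduce_phase(pairs)

print(search(INDEX, "open news"))
```

A real search engine adds analysis, relevance ranking, faceting and much more on top of this, which is exactly the part that storing the data alone does not give you.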