Back from Berlin Buzzwords and finally over the jet lag, so I thought I would put up some feedback. First off, it was a well-organized conference with a nice focus on searching, storage and scaling. Kudos to Isabel, Simon and Jan for all their hard work. It also had great wi-fi coverage, which has been a struggle at every other conference I’ve been to.
As for the talks, I gave the keynote on using open source tools like Apache Solr and Mahout to deliver intelligent applications (slides; really should be a PPT so you can see the animations) first thing Monday morning. I felt it went pretty well, but I’ll let others be the judge (videos should be online soon). I spent the rest of the day going in and out of the various tracks. The Lucene track was very well done, with good talks by Uwe Schindler and Simon Willnauer on the State of Lucene, Robert Muir on Finite State queries in Lucene, Michael Busch on Real Time Search at Twitter, Jukka Zitting on Tika, and Andrzej Bialecki on Nutch. See Berlinbuzzwords: Links To Slides for all the slides (not all are available just yet).
I also went to a variety of the Hadoop and NoSQL talks. Lots of people in the NoSQL talks were making pitches on why their approach is best, which is very helpful in determining which tool to use when. I still, however, can’t shake the feeling that one could take the new Solr Cloud stuff, add a dead simple schema (id and one or two simple fields), and have a large-scale distributed key-value store that overcomes almost all of the limitations of many of the NoSQL technologies (no ad-hoc queries, no range queries, no search within the values, limited extensibility) with minimal indexing overhead (which can be greatly reduced by using either literals or very simple analysis). Not only that, Lucene/Solr is already “document-centric”, and I’ve seen it scale to billions of documents with high availability and high QPS. That was with “real” documents (i.e. articles, etc.), not simple key-value pairs, so I can’t help but feel that simple key-value pairs would be even faster and more scalable. In other words, Lucene isn’t just for text search. Naturally, this is just a thought at this point; I haven’t tried testing it yet. Also, once the new real time stuff is in Lucene, I think it will be even faster.
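To make that thought a little more concrete, here is a minimal sketch in Python of what the client side of such a key-value store might look like. It only builds the HTTP requests and does not send them. The base URL, the collection name `kv`, and the field names `value_s` and `value_i` are all assumptions made up for illustration, and it assumes a reasonably recent Solr that accepts JSON documents on its `/update` handler. Treat it as a starting point for an experiment, not a tested design.

```python
import json
from urllib.parse import urlencode

# Hypothetical Solr collection with a dead simple schema: an "id" field
# plus one or two value fields. URL and names are assumptions.
SOLR_BASE = "http://localhost:8983/solr/kv"


def escape_key(key):
    """Escape backslashes and double quotes so the key is safe inside a quoted term."""
    return key.replace("\\", "\\\\").replace('"', '\\"')


def put_request(key, value):
    """Build (url, body) for storing one key-value pair as a Solr document."""
    url = f"{SOLR_BASE}/update?{urlencode({'commit': 'true'})}"
    body = json.dumps([{"id": key, "value_s": value}])
    return url, body


def get_url(key):
    """Build the query URL for a plain key lookup on the id field."""
    params = {"q": f'id:"{escape_key(key)}"', "wt": "json"}
    return f"{SOLR_BASE}/select?{urlencode(params)}"


def range_url(field, low, high):
    """Build the query URL for an inclusive range query on a value field,
    the kind of lookup a plain key-value store typically cannot do."""
    params = {"q": f"{field}:[{low} TO {high}]", "wt": "json"}
    return f"{SOLR_BASE}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(put_request("user42", "hello"))
    print(get_url("user42"))
    print(range_url("value_i", 10, 20))
```

The body from `put_request` would be POSTed with a `Content-Type: application/json` header, and the other two URLs fetched with a plain GET. The point of `range_url` is that the same store answers range queries for free, which is exactly the kind of thing the key-value pitches tend to leave out.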
At any rate, the best thing about the conference was that it showed the eagerness for new solutions to large-scale problems that cost less money than the sturdy old database.
Again, congrats to Isabel and team for a well-executed conference in a great city and at a great venue. If you are interested in more on the Lucene portion of the conference, make sure you come visit us in Boston for Lucene Revolution!