What a way to start a conference on using data! Stephen Dunn’s keynote for Day 1 of Lucene Revolution covered how the Guardian opened up its content through an API, and how Lucene/Solr was involved in that. The topic was interesting all by itself, but he is also a good speaker who engages the audience. A great way to start the day.

Stephen is from the Guardian, one of the older newspapers in the United Kingdom. If you’ve been following along, you know that newspapers have been having a hard time of it lately, but the Guardian is in a unique (or at least very rare) position: the paper is owned by a trust, so short-term profits can take a back seat to long-term planning. That gave the Guardian the freedom to experiment with its online presence.

That freedom produced the two interesting aspects of his talk. The first is technological: in 1999, when they moved online, they found that their traffic was increasing steeply. That’s good, of course, but it also meant increasing load on their database, which is not so good.

So they started serving searches through Solr instead, and the load on their database leveled off even though traffic kept increasing. So the Guardian thought, “great, what else can we move to search?” The answer turned out to be “everything”, it seems.

Being able to find data using search (as opposed to using an RDBMS) enables the Guardian to go from being a publisher to being a platform. After requesting an API key, developers can do pretty much whatever they want with current Guardian data, past articles, curated data, and information from a separate Politics API that tracks voting records and other political data.
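To give a feel for what “requesting an API key and searching” looks like from the developer’s side, here is a minimal sketch that only builds a search request URL. The endpoint and the parameter names (`q`, `api-key`, `page-size`) are my assumptions based on the publicly documented Content API rather than anything shown in the talk, and `build_search_url` is a hypothetical helper, so check the Open Platform documentation before relying on it.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Guardian Content API search; not taken from the keynote.
BASE_URL = "https://content.guardianapis.com/search"


def build_search_url(query, api_key, page_size=10):
    """Build a search URL (hypothetical helper, parameter names are assumptions)."""
    params = {
        "q": query,              # free-text search terms
        "api-key": api_key,      # the key a developer requests from the Guardian
        "page-size": page_size,  # how many results to return per page
    }
    return f"{BASE_URL}?{urlencode(params)}"


# "YOUR-API-KEY" is a placeholder, not a working key.
url = build_search_url("arab spring", "YOUR-API-KEY", page_size=5)
print(url)
```

Fetching that URL with any HTTP client should return the matching articles as structured data, which is what lets outside applications treat the paper’s archive as a searchable platform.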

Of course, from a business standpoint, the question is, “how can this possibly be a good idea?” The answer turns out to be twofold.

First off, articles come with advertising, so even if they’re being displayed in someone else’s application, the ads are still being displayed, and to a new audience. That’s additional revenue for the Guardian. That one’s obvious.

The second reason isn’t so obvious. It comes down to an acknowledgement that there are people who know more about the topics the Guardian covers than its own staff do. By keeping its data open, rather than locking it behind a paywall (as some of its contemporaries have done or are doing), the Guardian makes it possible to get participation from those people. The example he gave is the Arab Spring, where people in the region mostly followed Al Jazeera and the Guardian. These people also responded to Guardian content on social media, so where competitors might get a dozen comments, all from people in the UK, the Guardian had hundreds of tweets (or more) from people with a variety of perspectives.

The Open Platform, as they call it, also provides another way to get additional perspectives: Guardian data can be built into applications that use their Micro Apps API, which then enables those apps to be integrated back into Guardian sites and platforms (or other sites). Similarly, their API enabled them to create a WordPress plugin. That plugin in turn lets outside subject matter experts, such as scientists, blog for them, and lets other content be added into the Guardian CMS.

So overall, the capabilities that come from using Lucene/Solr to search their data give the Guardian an opportunity to open itself up as a platform, which in turn makes it a better publisher.

Definitely a win, and a great way to start the day.

Cross-posted with Lucene Revolution Blog. Nicholas Chase is a guest blogger. This is one of a series of presentation summaries from the conference.