Back in the 1990s, Carnegie Mellon University developed the Capability Maturity Model, a scale for assessing how mature, and therefore how dependable, a contractor’s software processes are. If you’ve ever written software for anyone but yourself, you’ll recognize some of these definitions, which call to mind the famous characterization of the evolution of software.

Sensis, “the search engine for Australians”, uses a modified version of this model to assess their own search processes. It has five levels:

  1. Unmanaged: Set it and forget it, basically.
  2. Ad Hoc: People work on the process part time, and innovation is led by individuals with an itch to scratch.
  3. Monitored: There’s a defined team responsible for improvements, and it’s monitored for problems.
  4. Managed: Improvements are methodically sought, and there are defined targets and metrics.
  5. Optimized: This is obviously the target level, where machine learning leads to the best possible results.

On day 1 of Lucene Revolution, Craig Rees talked about Sensis’ goal of moving up that maturity ladder as they make their data — millions of white pages and yellow pages listings — available via a developer API. When it comes to search, Sensis is currently at the “Monitored” stage and making the move up to “Managed”.

It turns out that, as much as you might like to, you can’t “jump levels” on the maturity ladder; you’ve got to earn each one the hard way, and that’s what Sensis is doing.

Grant Ingersoll is always saying that if you’re not methodically testing your search results, you’re not testing at all, and Sensis is a good example of doing exactly that. To formulate a “scientific approach”, they created “Gold Sets” of queries paired with their ideal results, which lets them tweak their Lucene settings and then compare the new output against that “perfect” baseline.
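
To make that concrete, here’s a rough sketch of what gold-set scoring can look like. The talk didn’t show code, so everything below (the class names, the precision-at-k metric, the SearchFunction hook) is my own illustration, not Sensis’ implementation:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * A minimal sketch of gold-set evaluation. All names here are
 * hypothetical stand-ins; Sensis' actual tooling was not shown.
 */
public class GoldSetEvaluator {

    /** Maps each test query to the IDs of its known-good ("perfect") results. */
    private final Map<String, Set<String>> goldSet;

    public GoldSetEvaluator(Map<String, Set<String>> goldSet) {
        this.goldSet = goldSet;
    }

    /** Stand-in for whatever runs a query against the index under test. */
    public interface SearchFunction {
        List<String> topK(String query, int k);
    }

    /**
     * Scores a configuration by average precision@k over the gold queries:
     * what fraction of the top k results appear in the gold set?
     */
    public double score(SearchFunction search, int k) {
        double total = 0;
        for (Map.Entry<String, Set<String>> entry : goldSet.entrySet()) {
            List<String> returned = search.topK(entry.getKey(), k);
            long hits = returned.stream()
                                .filter(entry.getValue()::contains)
                                .count();
            total += (double) hits / k;
        }
        return total / goldSet.size();
    }
}
```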

Of course, Sensis has a lot to test for. Context is key, Craig points out; a 12-year-old on his mobile phone in the schoolyard at noon probably shouldn’t get the same results as a 60-year-old at home on his computer at 10pm. And context has a lot of variables: time of day (or even time of year), location, device, and, perhaps hardest to quantify, intent.
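
Just to illustrate how many dimensions that is, here’s a purely hypothetical sketch of those signals bundled into one object a ranking layer could consult; none of these names come from the talk, and it assumes Java 16+ records:

```java
import java.time.LocalTime;

/**
 * An illustrative context record (hypothetical; the talk showed no code).
 * These fields mirror the variables Craig listed.
 */
public record QueryContext(
        LocalTime timeOfDay,  // noon vs. 10pm
        String location,      // schoolyard vs. home
        String device,        // mobile vs. desktop
        int userAge) {        // a crude proxy for intent

    /** One way context might translate into a ranking nudge. */
    public float boostFor(String listingCategory) {
        // e.g., favor takeaway food for a lunchtime mobile search
        if ("mobile".equals(device)
                && timeOfDay.getHour() == 12
                && "takeaway".equals(listingCategory)) {
            return 2.0f;
        }
        return 1.0f;
    }
}
```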

Sensis also has other hurdles to clear. For example, conventional wisdom says that “rare” terms should be worth more (this is the IDF component of Lucene’s scoring), but that heuristic can backfire in a data set like theirs, which is broad but not deep. The term “flowers” is rare in “crematorium” listings, yet if you search for “flowers”, a crematorium is probably NOT what you want. They also have to deal with contextual synonyms: “bow” and “ribbon” are synonyms, unless you’re also looking for arrows.
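
For the curious, one blunt way to neutralize the rare-term effect in Lucene is to flatten the IDF component with a custom Similarity. This is my own illustration against the modern ClassicSimilarity API (the Lucene of the day exposed the same hook on DefaultSimilarity), not necessarily what Sensis did:

```java
import org.apache.lucene.search.similarities.ClassicSimilarity;

/**
 * A sketch, not Sensis' fix: treat every term as equally informative,
 * so rarity alone can't push a crematorium to the top of "flowers".
 */
public class FlatIdfSimilarity extends ClassicSimilarity {
    @Override
    public float idf(long docFreq, long docCount) {
        return 1.0f; // ignore document frequency entirely
    }
}
```

You’d install it with `IndexSearcher.setSimilarity(new FlatIdfSimilarity())`. Of course, flattening IDF globally trades one problem for another, which is exactly why a gold set matters: you can measure whether the change actually helps.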

So Sensis created a tool, the Search Quality Analysis and Testing system, or SQUAT. (And yes, the audience snickered when he said, “All we need is SQUAT,” but at the same time, I think most of us were just a little bit jealous.) The very first question was, “Are you releasing that as open source?” The answer was, “We’re thinking about it, but not at this time.” SQUAT lets Sensis test a variety of terms and know exactly how good the results are. If a change doesn’t have a positive effect on the quality score, it doesn’t go into production.
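
The gating rule itself is simple to express. Here’s a hypothetical sketch, reusing the GoldSetEvaluator stand-in from above:

```java
/**
 * A hypothetical release gate in the spirit of what Craig described:
 * a candidate configuration ships only if it beats the production
 * baseline on the gold-set quality score.
 */
public class ReleaseGate {
    public static boolean shouldShip(GoldSetEvaluator evaluator,
                                     GoldSetEvaluator.SearchFunction baseline,
                                     GoldSetEvaluator.SearchFunction candidate,
                                     int k) {
        double before = evaluator.score(baseline, k);
        double after = evaluator.score(candidate, k);
        return after > before; // no quality gain, no production release
    }
}
```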

And isn’t that what we all want?

Cross-posted from the Lucene Revolution Blog. Nicholas Chase is a guest blogger. This is one of a series of presentation summaries from the conference.