No doubt you’ve already seen the discussions on Google’s recent demotion of JC Penney search results in the NYTimes Sunday article “Dirty Little Secrets of Search”. JC Penney, the retail chain, got slapped for an apparently overzealous exploitation of PageRank and spam sites to hit the top of search results (note that the article is the most frequently emailed from the NYTimes business section as of this moment, which is how I found out about it). I was struck not so much by the unseemly smackdown between the search engine and one of its customers as by what the article told us about the state of search.
- Google processes 1 billion queries per day, according to Matt Cutts, the head of the Webspam team at Google. So does Twitter, using Lucene. A billion here, a billion there…only not all queries are created equal, of course. Processing and content notwithstanding, that adds up to 1 billion attention span units for both Google and Twitter. An interesting indicator of the relationship between social media and ‘classic’ internet.
- Just over half of the clicks on Google search results go to the first result (34%) or the second result (~17%), according to a study last May by Daniel Ruby of Chitika, an online advertising network of 100,000 sites. Your mileage may vary, but your users are picking up their search habits in a ruthless environment where if results are not in the top two, relevance/recall doesn’t matter.
Naturally, if you’re building your own search apps, you probably spend an outsize proportion of your time on relevance (or you should), looking, like Google, for new ways to anticipate the intent behind user queries. But you don’t have to be Google to tap into what your users know.
We’ve just published a new white paper on ‘Applying Social Strategies with Lucidworks Enterprise.’ In it, we talk about several ways to leverage social search factors. Using Lucidworks Enterprise and Solr, you can craft your search app to determine result relevance not only through traditional mechanisms, such as computing the intrinsic similarity of the query to the content of the document, but also through what your users think about that similarity. Lucidworks Enterprise features Click Scoring, which can even track the links users choose for a particular search and adjust the relevance of documents interactively, adapting to changing conditions and user preferences.
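To make the idea of click feedback concrete, here is a minimal Python sketch of click-based re-ranking. It is not the Click Scoring implementation in Lucidworks Enterprise and does not use any Lucidworks or Solr API; the `ClickBooster` class, its `weight` parameter, and the logarithmic boost formula are all assumptions chosen purely for illustration. It only shows the general pattern: start from the engine's intrinsic similarity score, then nudge it upward for documents that users have actually clicked for the same query.

```python
import math
from collections import defaultdict


class ClickBooster:
    """Toy click-feedback re-ranker (hypothetical, for illustration only)."""

    def __init__(self, weight=0.5):
        # weight controls how strongly clicks influence the final score
        # (an assumed tuning knob, not a real product setting).
        self.weight = weight
        # clicks[normalized_query][doc_id] -> number of recorded clicks
        self.clicks = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def _normalize(query):
        # Treat "Red Shoes " and "red shoes" as the same query.
        return query.strip().lower()

    def record_click(self, query, doc_id):
        """Remember that a user chose doc_id in the results for query."""
        self.clicks[self._normalize(query)][doc_id] += 1

    def rerank(self, query, scored_docs):
        """Return (doc_id, score) pairs sorted by click-boosted score.

        scored_docs is a list of (doc_id, base_score) pairs, where
        base_score stands in for the engine's intrinsic similarity.
        """
        seen = self.clicks.get(self._normalize(query), {})
        boosted = [
            # log1p damps the boost so a few popular documents
            # cannot completely swamp the intrinsic score.
            (doc_id, base * (1.0 + self.weight * math.log1p(seen.get(doc_id, 0))))
            for doc_id, base in scored_docs
        ]
        return sorted(boosted, key=lambda pair: pair[1], reverse=True)


# Usage: doc-b starts slightly behind doc-a, then users click it three times.
booster = ClickBooster()
results = [("doc-a", 1.0), ("doc-b", 0.9)]
for _ in range(3):
    booster.record_click("red shoes", "doc-b")
reranked = booster.rerank("red shoes", results)
top_doc = reranked[0][0]  # "doc-b" now outranks "doc-a"
```

The logarithm is one design choice among many: it lets early clicks matter while keeping heavily clicked documents from dominating forever, which is one way to leave room for changing conditions and user preferences.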
And finally, intrinsic similarity is not all it’s cracked up to be. Here’s noted search blogger Steve Arnold:
Fact: There are no objective search results. … No relevance method delivers exactly the same results unless a human intervenes. … Web search has to generate revenue and only be “good enough.” Forget the superlatives and deal with the bias inherent in search, content processing, and indexing.