For many of us, search is really a stand-in for our actual goal when we type a question into a search engine: finding an answer to a problem we are having. After all, when you ask a search engine “Who was the 23rd President of the United States?”, wouldn’t you rather get the answer (Benjamin Harrison) than 10 blue links that might contain it? (To be fair, most Internet search engines do hook in basic fact-based question answering, or QA, these days.) Of course, answering arbitrary questions is no small feat, even with advances like IBM Watson’s ability to play Jeopardy!

It is, however, often possible to get good “answers” out of a system when we make a few assumptions, as my fellow authors Drew Farris and Tom Morton and I do in “Taming Text”, our recently released book. It aims to help software engineers understand the concepts behind search and natural language processing through real, working examples built on open source projects like Solr, Mahout, OpenNLP and more. For instance, in the book we build a fairly simple QA system that uses OpenNLP to mark up content during indexing, Solr for passage retrieval, and a simple sliding-window scoring algorithm to identify small passages that likely contain the answer to a fact-based question entered in natural language (English, in this case). Don’t get me wrong: I’m not claiming that the code in Taming Text is production ready by any stretch, or nearly as sophisticated as it could be, but it is similar in concept to real QA systems the authors have built in the past.
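To make the passage-scoring step a little more concrete, here is a minimal Python sketch of a sliding-window scorer. It is not the Taming Text code. It assumes a passage has already been retrieved (by Solr, in the book’s design) and tokenized, and that named-entity labels were attached to tokens at indexing time (the role OpenNLP plays in the book). The `Token` class, the `ANSWER_TYPES` and `STOPWORDS` tables, the window size of 6 and the scoring rule are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical lookup from a question's leading word to the entity type an
# answer should carry; a real system would use a trained question classifier.
ANSWER_TYPES = {"who": "PERSON", "where": "LOCATION", "when": "DATE"}
# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "the", "of", "is", "was", "in", "did"}


@dataclass
class Token:
    text: str
    entity: Optional[str] = None  # NER label assumed to be added at indexing time


def _normalize(word: str) -> str:
    return word.strip("?.,!").lower()


def expected_answer_type(question: str) -> Optional[str]:
    """Guess the answer's entity type from the question word, if we know it."""
    words = question.split()
    return ANSWER_TYPES.get(_normalize(words[0])) if words else None


def score_windows(tokens: List[Token], question: str,
                  window: int = 6) -> Tuple[List[Token], int]:
    """Slide a fixed-size window over one retrieved passage; return the best span.

    Score = distinct question keywords in the window + number of tokens of the
    expected answer type. Windows with no such token are skipped, since they
    cannot contain the answer. Returns score -1 if every window was skipped.
    """
    answer_type = expected_answer_type(question)
    words = [_normalize(w) for w in question.split()]
    keywords = {w for w in words[1:] if w and w not in STOPWORDS}

    best_start, best_score = 0, -1
    for start in range(max(1, len(tokens) - window + 1)):
        span = tokens[start:start + window]
        candidates = sum(1 for t in span if t.entity == answer_type) if answer_type else 0
        if answer_type and candidates == 0:
            continue  # no candidate of the expected type in this window
        score = len({t.text.lower() for t in span} & keywords) + candidates
        if score > best_score:  # strict '>' keeps the earliest window on ties
            best_start, best_score = start, score
    return tokens[best_start:best_start + window], best_score


def extract_answer(span: List[Token], question: str) -> str:
    """Join the tokens in the winning span that carry the expected answer type."""
    answer_type = expected_answer_type(question)
    return " ".join(t.text for t in span if answer_type and t.entity == answer_type)


if __name__ == "__main__":
    passage = [Token("In"), Token("1889", "DATE"), Token("Benjamin", "PERSON"),
               Token("Harrison", "PERSON"), Token("became"), Token("the"),
               Token("23rd"), Token("President"), Token("of"), Token("the"),
               Token("United"), Token("States")]
    question = "Who was the 23rd President of the United States?"
    span, score = score_windows(passage, question)
    print(" ".join(t.text for t in span), score)  # Benjamin Harrison became the 23rd President 4
    print(extract_answer(span, question))         # Benjamin Harrison
```

Skipping windows that hold no candidate of the expected type is just one simple heuristic; the window size and the weighting of keyword matches against candidates are tuning knobs, and the free chapter 8 describes the approach the book actually takes.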
Now, lest you think this is a completely shameless plug for the book (half shameless?), please note that all of the code is freely available, as is chapter 8 of the book, which explains how to actually find the answers using Solr and OpenNLP. Rather than rehash everything involved here, I hope you will check out the code and the free chapter. Naturally, if you find them useful, I hope you’ll consider supporting the book!