Reading Metadata Between the Lines: Searching for Stories, People, Places and More in Television News
How to implement metadata search with Lucene/Solr’s block join and custom query types, as well as the collection’s position-time data.
As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today’s pick is Kai Chan’s experiments at UCLA with media metadata search.
UCLA’s NewsScape has over 200,000 hours of television news from the United States and Europe. In the last two years, the project has generated a large set of “metadata”: story segment boundaries, story types and topics, named entities, on-screen text, image labels, etc. Including them in searches opens new opportunities for research, understanding, and visualization, and helps answer questions such as “Who was interviewed on which shows about the Ukraine crisis in May 2014?” and “What text or image is shown on the screen as a story is being reported?” However, metadata search poses significant challenges, because the search engine needs to consider not only the content of each metadata instance, but also its position and time relative to other metadata instances, whether search terms are found in the same or different metadata instances, etc. We will describe how we have implemented metadata search with Lucene/Solr’s block join and custom query types, as well as the collection’s position-time data. We will describe our work on using time as the distance unit for proximity search and filtering search results by metadata boundaries. We will also describe our metadata-aware, multi-field implementation of auto-suggest.
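To make the idea of time as the distance unit more concrete, here is a minimal sketch in plain Python. It is not the NewsScape implementation and does not use Lucene/Solr at all; the function names, the timestamps, and the story segments are all hypothetical. It assumes each search term comes with a sorted list of the times (in seconds) at which it occurs, and that story boundaries are given as half-open `(start, end)` intervals.

```python
from bisect import bisect_left, bisect_right


def within_seconds(times_a, times_b, max_gap):
    """Pair up occurrences of two terms that are at most max_gap seconds apart.

    Both lists must be sorted ascending. This is an illustrative helper,
    not part of any Lucene/Solr API.
    """
    pairs = []
    for t in times_a:
        # Locate the slice of times_b that falls inside [t - max_gap, t + max_gap].
        lo = bisect_left(times_b, t - max_gap)
        hi = bisect_right(times_b, t + max_gap)
        pairs.extend((t, u) for u in times_b[lo:hi])
    return pairs


def in_same_segment(pair, segments):
    """Return True if both timestamps fall inside one (start, end) segment."""
    a, b = pair
    return any(start <= a < end and start <= b < end for start, end in segments)


# Hypothetical data: occurrence times of two terms and three story segments.
ukraine_times = [12.0, 95.5, 300.0]
crisis_times = [14.5, 200.0, 304.0]
stories = [(0.0, 120.0), (120.0, 302.0), (302.0, 400.0)]

# Proximity by time: matches no more than 5 seconds apart.
close_pairs = within_seconds(ukraine_times, crisis_times, 5.0)

# Boundary filter: drop matches that straddle two stories.
same_story_pairs = [p for p in close_pairs if in_same_segment(p, stories)]
```

In this made-up data, the pair at 300.0 and 304.0 seconds is close in time but crosses a story boundary at 302.0, so the boundary filter drops it. That is the kind of case where proximity alone would return a misleading hit.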
Kai Chan is the lead programmer for the NewsScape project at the University of California, Los Angeles. He has extensive experience programming with Lucene, Solr, Java, PHP, and MySQL and has been especially involved with the development and programming of video and text search engines for the archive. Other projects that he has worked on are ClassWeb, Moodle, and Video Annotation Tool. He has given numerous presentations regarding his work to faculty and researchers at the university, as well as Lucene and Solr tutorials to the public. Kai earned his B.S. and M.S. degrees in Computer Science from UCLA.
Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr, on October 13-16, 2015, in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…