Reading Metadata Between the Lines: Searching for Stories, People, Places and More in Television News

As we count down to the annual Lucene/Solr Revolution conference in Austin this October, we’re highlighting talks and sessions from past conferences. Today, we’re highlighting Kai Chan’s experiments with media metadata search at UCLA.

UCLA’s NewsScape has over 200,000 hours of television news from the United States and Europe. In the last two years, the project has generated a large set of “metadata”: story segment boundaries, story types and topics, named entities, on-screen text, image labels, etc. Including this metadata in searches opens new opportunities for research, understanding, and visualization, and helps answer questions such as “Who was interviewed on which shows about the Ukraine crisis in May 2014?” and “What text or image is shown on the screen as a story is being reported?” However, metadata search poses significant challenges: the search engine needs to consider not only the content but also its position and time relative to other metadata instances, whether search terms are found in the same or different metadata instances, and so on. We will describe how we have implemented metadata search with Lucene/Solr’s block join and custom query types, as well as the collection’s position-time data. We will describe our work on using time as the distance unit for proximity search and on filtering search results by metadata boundaries. We will also describe our metadata-aware, multi-field implementation of auto-suggest.
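The block-join approach the abstract mentions can be sketched as follows: a story segment is indexed as a parent document with its metadata instances (named entities, on-screen text, etc.) as child documents, and Solr's block-join parent query parser matches segments by their children. This is a minimal illustration, not NewsScape's actual schema — all field names (`doc_type`, `meta_type`, `start_time`, and the rest) are hypothetical.

```
# Index a story segment (parent) with its metadata instances (children)
# as one Solr block. Collection name and fields are illustrative.
curl http://localhost:8983/solr/newsscape/update -d '
[
  {
    "id": "seg-1",
    "doc_type": "segment",
    "topic": "Ukraine crisis",
    "air_date": "2014-05-12T18:00:00Z",
    "_childDocuments_": [
      { "id": "seg-1-ent-1", "doc_type": "metadata",
        "meta_type": "named_entity", "text": "John Kerry",
        "start_time": 120.5, "end_time": 124.0 },
      { "id": "seg-1-ocr-1", "doc_type": "metadata",
        "meta_type": "on_screen_text", "text": "SECRETARY OF STATE",
        "start_time": 121.0, "end_time": 126.0 }
    ]
  }
]'

# Find segments on a given topic that contain a matching named entity,
# using the {!parent} block-join query parser:
q={!parent which="doc_type:segment"}meta_type:named_entity AND text:"John Kerry"
fq=topic:"Ukraine crisis"
```

The `which` clause identifies parent documents; the child query runs against the metadata instances, and matching segments are returned as results. The per-child `start_time`/`end_time` fields suggest how time, rather than token position, could serve as the proximity unit the talk describes.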

Kai Chan is the lead programmer for the NewsScape project at the University of California, Los Angeles. He has extensive experience programming with Lucene, Solr, Java, PHP, and MySQL and has been especially involved with the development and programming of video and text search engines for the archive. Other projects he has worked on include ClassWeb, Moodle, and Video Annotation Tool. He has given numerous presentations regarding his work to faculty and researchers at the university, as well as Lucene and Solr tutorials to the public. Kai earned his B.S. and M.S. degrees in Computer Science from UCLA.

http://www.slideshare.net/lucidworks/reading-metadata-between-the-lines-searching-for-stories-people-places-and-more-presented-by-kai-chan-ucla

Join us at Lucene/Solr Revolution 2015, the biggest open source conference dedicated to Apache Lucene/Solr, on October 13-16, 2015 in Austin, Texas. Come meet and network with the thought leaders building and deploying Lucene/Solr open source search technology. Full details and registration…
