Comcast Corporation is one of the largest providers of entertainment, information and communication products and services, with over 24 million cable customers, 15 million high-speed Internet customers, and 6.5 million phone customers. Its Comcast Interactive Media (CIM) division is chartered to develop and grow the company’s Internet businesses. CIM’s Fancast.com, an extensive online video collection of television shows, movies, trailers and clips, gets over 10 million unique users per month. Users can browse and search across the site’s more than 4 million content items to find the entertainment they want.

Requirements/Challenges

  • Meet performance target of 20ms per query at peak load, and scale to 1 million unique users per day
  • Offer simple search interface, while keeping deep customizability
  • Low fixed & operational costs
  • Deliver complete functional search features

Challenges

Search is critical to Fancast’s business objectives: getting users to all the media content they want, as quickly and intuitively as possible. The search implementation had to meet three key challenges:
1. Offer a simple search interface, ideally a single search box, without sacrificing the deep customizability needed to keep meeting user needs while shielding users from the complexity of the content.
2. Handle massive content scale, literally all TV and entertainment content, while staying responsive under mass-market traffic.
3. Achieve low fixed and operational costs: few dedicated development and support staff, and minimal additional hardware.

Functional and Performance Requirements

Fancast uses metadata from many different third-party sources such as IMDB.com (the Internet Movie Database) and Tribune Media Services. Each of these sources has its own format and its own content refresh schedule, and none provides a comprehensive metadata store with consistent data and descriptions. For example, the official Spider-Man movie titles from Marvel Entertainment use two hyphenated words, but most users enter the name as one word, with no hyphen.
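The Spider-Man mismatch can be illustrated with a small normalization sketch. This is not Fancast’s implementation, which the case study does not describe; the function names and the sample titles are hypothetical. It assumes one simple approach: reduce both the stored title and the user’s query to a key containing only lowercase letters and digits, then compare keys.

```python
import re
import unicodedata


def normalize_title(text: str) -> str:
    """Reduce a title or query to a comparison key (illustrative only)."""
    # Split accented characters into base letter + combining mark, then drop the marks.
    decomposed = unicodedata.normalize("NFKD", text)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then remove everything that is not a letter or digit,
    # so hyphens, spaces and punctuation no longer affect the match.
    return re.sub(r"[^a-z0-9]+", "", folded.lower())


# Hypothetical catalog entries keyed by their normalized form.
titles = ["Spider-Man", "Spider-Man 2", "Spider-Man 3"]
index = {normalize_title(title): title for title in titles}


def lookup(query: str):
    """Return the official title matching the query, or None."""
    return index.get(normalize_title(query))


print(lookup("spiderman"))      # Spider-Man
print(lookup("Spider Man 2"))   # Spider-Man 2
```

Removing all separators is deliberately aggressive and can merge titles that differ only in punctuation or spacing. A search server would normally handle this at analysis time rather than in application code.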

An authoritative index was essential to the user experience and a key differentiator in search quality. Users searching for Jessica Simpson probably don’t want to end up with Homer Simpson.

In terms of performance, the goal was to grow from 50,000 to 1 million peak unique visitors per day over 16 months. To ensure candidate search technologies could meet this goal, CIM defined a clear scaling metric: search query response under 20 ms per query at peak load, the same order of magnitude as other website interactions. Scaling and capacity targets were also set at the application server level, so that a single physical application server could host multiple server instances, each with a similar scaling profile. This made it easier for the operations team to calculate how many servers would be needed for a given number of users.
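The kind of sizing calculation described above might look like the following sketch. Only the 1 million daily users and the 20 ms target come from the case study; the queries per user, the share of traffic in the peak hour, the instances per server, and the assumption that an instance answers one query at a time (so 1000 / 20 = 50 queries per second) are illustrative placeholders, not CIM’s figures.

```python
import math


def servers_needed(daily_users, queries_per_user, peak_hour_share,
                   qps_per_instance, instances_per_server):
    """Estimate peak query rate and the instances/servers required for it."""
    daily_queries = daily_users * queries_per_user
    # Spread the peak hour's share of daily queries evenly over 3600 seconds.
    peak_qps = daily_queries * peak_hour_share / 3600
    instances = math.ceil(peak_qps / qps_per_instance)
    servers = math.ceil(instances / instances_per_server)
    return peak_qps, instances, servers


# Assumption: one query at a time per instance at the 20 ms target.
TARGET_MS = 20
qps_per_instance = 1000 / TARGET_MS  # 50 queries per second

peak_qps, instances, servers = servers_needed(
    daily_users=1_000_000,    # from the stated growth goal
    queries_per_user=5,       # hypothetical
    peak_hour_share=0.10,     # hypothetical: 10% of daily queries in the busiest hour
    qps_per_instance=qps_per_instance,
    instances_per_server=4,   # hypothetical
)
print(f"peak ~{peak_qps:.0f} qps -> {instances} instances on {servers} server(s)")
```

Because each instance is assumed to have the same scaling profile, changing the user count only changes the inputs, which is what makes this style of capacity planning straightforward.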

Testing & Evaluation

CIM shortlisted two search alternatives: Solr, the Lucene search server; and a large, well-known commercial search product. To pick the finalist, they created a test bed with indexes of two million and four million documents, deployed on each of the Sun x64 servers running Red Hat Linux. CIM hired Lucid Imagination to review the results and optimize the Solr/Lucene search infrastructure, and consultants from the commercial vendor did the same for their solution. The CIM team benchmarked query response rates at load levels ranging from 100 to 1500 requests per second, and ran stress tests at failure envelope points.
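As a rough illustration of how benchmark samples could be checked against the 20 ms target, the sketch below summarizes a list of measured latencies. The case study does not say which statistic CIM used, so comparing the mean to the target is an assumption, as are the function name and the sample values; the 95th percentile uses the nearest-rank method.

```python
import math
import statistics


def summarize(latencies_ms, target_ms=20.0):
    """Summarize query latencies (in ms) collected at one load level."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    # Nearest-rank 95th percentile: smallest value covering 95% of the samples.
    rank = math.ceil(0.95 * len(ordered))
    mean = statistics.fmean(ordered)
    return {
        "mean_ms": mean,
        "p95_ms": ordered[rank - 1],
        "meets_target": mean <= target_ms,  # assumption: target applies to the mean
    }


# Hypothetical samples keyed by load level in requests per second.
samples = {
    100: [10, 12, 14, 16, 18],
    1500: [10, 20, 30, 40],
}
for load, latencies in samples.items():
    print(load, summarize(latencies))
```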

The result: Solr outperformed the commercial alternative in both response rates and failure-handling characteristics. There was no question that Solr could meet the performance targets.

CIM also compiled a list of 180 functional features for comparison. In addition to its superior performance, Solr came out ahead on features and on cost of ownership, both of which mattered to CIM’s business objectives.

The Choice For Solr

Solr made the final cut based on:

  • Performance and scalability advantages
  • Required search features
  • Organizational fit
  • Total Cost of Ownership
  • Active Lucene/Solr open source development community
  • Other large organizations that “bet the company” successfully on Solr (CNET, Netflix, MySpace, Orbitz)

In addition to the availability of community and commercial support, CIM benefited from the deep expertise in search offered by Lucid Imagination to configure their Solr implementation in accordance with best practices, and to optimize scalability.

“Hiring Lucid Imagination took a high potential platform that our people liked, and turned it into a reliable, high-performance platform that really satisfied our business leadership.” Ranga Muvavarirwa, Director Product Planning, Comcast Interactive Media