An Interview with Marc Krellenstein
Open source software continues to gain momentum. In part, savvy managers recognize that commercial software can lock an organization into a walled garden, and a locked-in organization loses some flexibility. With mounting economic pressures, open source software can reduce or hold down certain costs, such as annual licensing fees and mandatory certification programs, and can preserve access to third-party software that is not compatible with a proprietary solution. In search and content processing, open source search systems such as Lucene have become worthy challengers to some commercial and proprietary systems. IBM, for example, uses Lucene in its search products.
Last year, Lucid Imagination entered the search wars with its Lucene-based system. Lucid’s business model is designed to support Lucene as an open source search system. A Lucid customer can obtain professional services and support from Lucid. The company’s commitment to open source makes it clear that enterprise search and content processing remains a dynamic and innovative software sector. I spoke with Marc Krellenstein, one of Lucid Imagination’s founders, on March 12, 2009. The full text of my interview with him appears below.
Would you tell me about your company and where it fits in the search and content processing landscape?
We’re a commercial open source company focused on search using open source Apache Lucene/Solr. We provide offerings similar to commercial full text search companies — high quality search software (we develop both open source and value-add software), commercial support, consulting/professional services and training. We think the most important characteristics of Lucene/Solr are that (a) it’s the best technology available for most search needs in terms of scalability/performance and search effectiveness (accuracy), and (b) it’s open source — meaning it has no license fees and includes source code, as well as being actively developed by an open source community trying to meet real end-user needs. (License fees are only one part of the cost of deploying and maintaining a search application, but zero license fees are not a bad place to start.)
We also provide a place to access knowledge and best practices about building search applications with Lucene/Solr and a forum, certification program and showcase for independent consultants building Lucene/Solr applications as well as developers working with Lucene/Solr.
What’s your background and what attracted you to the search sector? What’s the history of your company? What’s the business model for your open source search system?
I’ve been working in search for about 20 years. I was the founding technologist of Northern Light Technology, an Internet search leader when it launched in 1998, and also served as VP of Search and then CTO at Elsevier, whose business relies heavily on search to provide online access to its scientific, technical and medical journals and books. I started Lucid in August, 2007 together with three key Lucene/Solr core developers – Erik Hatcher, Grant Ingersoll and Yonik Seeley – and with the advice and support of Doug Cutting, the creator of Lucene, because I thought Lucene/Solr was the best search technology I’d seen. However, it lacked a real company that could provide the commercial-grade support and other services needed to realize its potential to be the most used search software (which is what you’d expect of software that is both the best core technology and free). I also wanted to continue to innovate in search, and believed it is easier and more productive to do so if you start with a high quality, open source engine and a large, active community of developers.
We expect to be the largest contributor to open source Lucene/Solr and to support the business by selling support, services and value-add software.
Consolidation and vendors going out of business like SurfRay seem to be a feature of the search sector. How will these business conditions affect your company? Will services generate sufficient revenue to make your enterprise viable?
I think most search companies that fail do so because they don’t offer software that is decisively better and more affordable than the competition’s and/or can’t provide high quality support and other services. We aim to provide both and believe we are already working with the best and most affordable software. Our revenue comes not only from services such as training but also from support contracts and from value-add software that makes deploying Lucene/Solr applications easier and makes the applications better.
Microsoft has signaled that it will include the Fast ESP system with some of its server products. Where does your company fit amidst the Microsoft, Autonomy, Endeca, and Google search horse race?
We think Lucene/Solr is better software than any of these for providing scalable, accurate, affordable, platform-independent and easy to customize/deploy/maintain search for most applications. Any of the products you name could be a better choice in some very specific situation if the application in question can exploit particular strengths of the product and live within its limitations. However, we think Lucene/Solr is the best generic technology and better suited to most applications.
Would you give me a couple of use cases – that is, examples – of your software in action? What’s the typical cost of your firm’s software?
We believe about 4,000 organizations currently use Lucene/Solr in production. Publicly visible search sites that use Lucene/Solr include CNET, Comcast’s Fancast site, LinkedIn, Monster, MySpace, Netflix and Wikipedia. Lucene/Solr are also in use at Apple, HP, IBM, Iron Mountain and Los Alamos National Laboratory.
Lucene and Solr are free. We provide free, certified distributions of the latest stable Lucene/Solr code, including additional free components and free ‘getting started’ assistance. We also offer downloads of the Apache Lucene/Solr distributions, which are available directly from Apache as well. We sell support and other services for those users who want them.
What are the benefits of using your version of Lucene?
Our free certified distributions consist of the latest, stable Lucene/Solr releases together with other changes (e.g., performance improvements, bug fixes) contributed to Lucene/Solr that we have tested and believe are ready for widespread use. We update our certified distributions when, and only when, we believe there is a new version of Lucene or Solr we have tested and recommend, and will then provide instructions and assistance for upgrading — much as a commercial company offers regular, stable new releases of its own software. We also include certain free, value-add components in our distributions (e.g., a Solr performance monitoring tool) and provide free ‘getting started’ assistance.
Search is a confused space. A recent consultant report argues that search is stable. What’s your view of this market sector? Why?
I would agree that core search capabilities are today relatively well understood, stable and, when properly executed, fairly effective. Studies indicate that over 60% of Google searches find the information that is sought. A 60% success rate is good enough for many applications, though it still leaves a lot of room for improvement. However, I believe most search applications require and should expect a higher level of success. That’s because most applications do not work with the same data as Google – a lot of valuable Internet data contained within a drastically larger quantity of mostly irrelevant data – or with Google’s limitations with regard to understanding its data (it’s not really “its” data) or Google’s need, despite its size, to conserve resources to handle enormous amounts of data and numbers of users.
If Oracle, Microsoft, or IBM gives away search, won’t that change the attractiveness of your open source solution? If not, would you explain how you see “free” commercial software leaving market space for “free” open source search?
It would mitigate one of the advantages of open source Lucene/Solr – no license fees – but not the other advantages already mentioned: better software for providing scalable and accurate search for most applications; high quality support and services from Lucid; open source code that a customer owns and that makes the customer independent of any vendor; and an active open source community.
What expertise have you assembled to give your open source business strategy clout and market pull?
Our business team has experience in open source business models and enterprise software. Our investors and board of advisors include some of the most respected open source business leaders, including Matt Asay, Mark Brewer, Mary Coleman, and Doug Levin. We work and share ideas with a number of other open source companies.
As you look forward, can you describe in general terms some of the new features that you will be introducing this year?
We will be offering new distributions of Lucene/Solr and additional value-add software components. These components will focus on tools for managing Lucene/Solr applications (e.g., performance monitoring), add-ons for providing specific capabilities, and user interface and other ‘widgets’ to make building Lucene/Solr applications easier. Building a great search application is still something of an art, and we hope to distill some of our experience and expertise into tools that will enable users to build not merely good search applications but great ones.
The economic climate is affecting organizations throughout the world. Aside from the economy, what are the major trends that you see in search and content processing over the next nine to 12 months?
I think there are two trends of particular interest:
- The value of search is widely understood today, and it is nearly ubiquitous — found on nearly every web site (e-commerce sites, company or informational sites, etc.), in almost every product and inside many or most organizations. It was a real accomplishment and of real value to go from no search to a basic search capability in so many situations. However, most search applications are only okay, and often mediocre. One too often has the feeling that the information is there but you’re just not finding it, or not finding the best results. (Google’s accomplishment in so often finding a good enough or even best result is mostly due to a surplus of good enough answers and brilliantly exploiting hypertext links to find the most popular or authoritative answers – conditions that don’t apply to most other search situations.) I think enterprises increasingly view their existing search capabilities as not adequate, and realize they are losing time and money because users too often don’t find what they are looking for. We have already seen a number of customers who complain that their search is just not good enough, and expect to see many more.
- Advanced capabilities such as machine learning for classification and text mining to extract people and places (or diseases, sentiments, events, etc.) have been around for a while, but I think it’s only in the last few years that they have crossed over from bleeding edge to leading edge and can now impart real value to many applications. (Mahout, a sub-project of Lucene, has started to offer tools in this area.) I think this trend will continue and accelerate as more and more customers derive real business value from these tools.