We’re happy to announce another new addition to the Lucidworks team! Trey Grainger has joined as Lucidworks SVP of Engineering, where he’ll be heading up our engineering efforts for open source Apache Lucene/Solr, our Lucidworks Fusion platform, and our other product offerings.
Trey most recently served as the Director of Engineering on the Search & Recommendations team at CareerBuilder, where he built out a team of several dozen software engineers and data scientists to deliver a robust semantic search, data analytics, and recommendation engine platform. This platform contained well over a billion documents and powered over 100 million searches per day across a wide range of consumer-facing websites and B2B Software-as-a-Service products.
Trey is also the co-author of Solr in Action, the comprehensive example-driven guide to Apache Solr (his co-author was Tim Potter, another Lucidworks engineer).
Trey received his MBA in Management of Technology from Georgia Tech, studied Computer Science, Business, and Philosophy at Furman University, and has also completed master’s-level work in Information Retrieval and Web Search from Stanford University.
We sat down with Trey to learn more about his passion for search:
When did you first get started working with Apache Lucene?
In 2008, I was the lead engineer for CareerBuilder’s newly-formed search team and was tasked with looking for potential options to replace the company’s existing usage of Microsoft’s FAST search engine. Apache Lucene was a mature option at that point, and Apache Solr was rapidly maturing to the point where it could support nearly all of the functionality that we needed. After some proof-of-concept work, we decided to migrate to Solr, which enabled us to leverage and extend the best Lucene had to offer, while providing a highly reliable out-of-the-box search server that supported distributed search (scale out with shards, scale up with replicas) along with an extensively pluggable architecture and set of configuration options. We started migrating to Solr in 2009 and completed the migration in 2010, by which time the Lucene and Solr projects had actually merged their code bases into one project. Ever since then, I’ve had the opportunity to help develop, speak about, write about, and run teams pushing forward the tremendous capabilities available in the Lucene/Solr ecosystem.
How has search evolved over the past couple years? Where do you think it’ll be in the next 10?
Over the last decade, the keyword search box has really evolved to become the de facto user interface for exploring data and for navigating most websites and applications. Companies used to pay millions of dollars to license search technology that did little more than basic text search, highlighting, and faceting. As Lucene/Solr came on the scene and commoditized those capabilities, search engineers were able to fully embrace the big data era and focus on building out scalable infrastructure to run their open-source-based search systems. With the rise of cloud computing and virtual machines, Solr likewise developed to scale elastically with automatic sharding, replication, routing, and failover in such a way that most of the hard infrastructure work has now also become commoditized. Lucene/Solr have also become near-real-time systems, enabling an impressive suite of real-time analytics and matching capabilities.
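The text search, highlighting, and faceting mentioned above are all exposed through Solr’s standard query parameters. As a minimal sketch (the core name, host, and field names here are hypothetical, not from the post), a basic request enabling all three might be built like this:

```python
from urllib.parse import urlencode

def build_solr_query(text, facet_fields, highlight_field):
    """Build the query string for a basic Solr search with
    highlighting and faceting enabled (field names are illustrative)."""
    params = [
        ("q", text),                 # full-text query
        ("hl", "true"),              # enable highlighting
        ("hl.fl", highlight_field),  # field(s) to return highlighted snippets for
        ("facet", "true"),           # enable faceting
    ]
    # one facet.field parameter per field to compute counts for
    params += [("facet.field", f) for f in facet_fields]
    return urlencode(params)

# Sent against a hypothetical "articles" core as:
#   http://localhost:8983/solr/articles/select?<query string>
qs = build_solr_query("open source search", ["category", "author"], "body")
```

The point of the historical shift described above is that these parameters, which once cost millions in licensing, now ship in every Solr install.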
With all of these changes, I’ve seen the value proposition for search shift significantly from “providing a keyword box” to “scalable navigation through big data,” and another massive shift is now underway. Today, more companies than ever are viewing search not just as infrastructure to enable access to data, but instead as the killer application needed to provide insights and highly-relevant answers to help their customers and move their businesses forward.
I thus anticipate seeing an ever growing focus on domain-driven relevance over the coming years. We’re already seeing industry-leading companies develop sophisticated semantic search capabilities that drive tremendous customer value, and I see the next decade being one where such intelligent capabilities are brought to the masses.
What do you find most exciting in the current search technology landscape?
The current frontier of search relevancy (per my answer to the last question) is what most excites me right now in the search technology landscape. Now that core text search, scaling, and cluster management have become much more commoditized, we’re beginning to see increased focus on relevancy as a key competitive differentiator across many search applications. Doing relevancy well includes adding capabilities like query intent inference, entity extraction, disambiguation, semantic and conceptual search, automatic classification and extraction of knowledge from documents, machine-learned ranking, using clickstream feedback for boosting and collaborative filtering, per-user personalization and recommendations, and evolving search to provide answers instead of just lists of documents as a response to natural language questions. Many of these capabilities require external systems to support sophisticated workflows and feedback loops (such as those already built into Lucidworks Fusion through the combination of pipelines with Solr + Spark), and Lucidworks is at the forefront of pushing this next generation of intelligent search applications.
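To make one of those capabilities concrete, clickstream feedback for boosting can be sketched in its simplest form: documents that users previously clicked for a query get a score boost via Solr’s `bq` (boost query) parameter under the `edismax` parser. The field names, scaling, and document IDs below are illustrative assumptions, not anything from the interview:

```python
from urllib.parse import urlencode

def clickstream_boost_params(query, click_counts, max_boost=10.0):
    """Build Solr query params that boost documents previously clicked
    for this query, weighted by each document's share of total clicks.
    (A naive popularity signal; real systems normalize and decay clicks.)"""
    total = sum(click_counts.values()) or 1
    # one boost query per clicked document, e.g. id:job42^7.50
    boosts = [
        "id:{}^{:.2f}".format(doc_id, max_boost * clicks / total)
        for doc_id, clicks in sorted(click_counts.items())
    ]
    params = [("q", query), ("defType", "edismax")]
    params += [("bq", b) for b in boosts]
    return urlencode(params)

params = clickstream_boost_params("java developer",
                                  {"job42": 30, "job7": 10})
```

Boost queries are additive signals, so unclicked documents still match on text relevance alone; that design choice is what keeps a feedback loop like this from locking out fresh content entirely.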
Where are the biggest challenges in the search space?
Some of the most fun challenges I’ve tackled in my career have been building systems for inferring query intent, recommendation systems, personalized search, and machine-learned relevancy models. There’s one key thing I learned about search along the way: nothing is easy at scale or in the tail. It took me years of building out scalable search infrastructure (with mostly manual relevancy tuning) before I had sufficient time to really tackle the long tail of relevancy problems using machine learning to solve them in an optimal way.
What’s particularly unique about the search space is that it requires deep expertise across numerous domains to do really well. For example, the skillsets needed to build and maintain scalable infrastructure include topics like distributed systems, data structures, performance and concurrency optimization, hardware utilization, and network communication. The skills needed to tackle relevancy include topics like domain expertise, feature engineering, machine learning, ontologies, user testing, and natural language processing. It’s rare to find people with all of these skillsets, but to really solve hard search problems well at scale and in the tail, all of these topics are important to consider.
What attracted you to Lucidworks?
Interesting problems and a shared vision for what’s possible. What attracted me to Lucidworks is the opportunity to work with visionaries in the search space building search technology that will help the masses derive intelligence from their data, both at scale and in the tail. Search is a really hard problem, and I’m excited to be in great company trying to solve that problem well.
What will you be working on at Lucidworks?
As SVP of Engineering, I’ll be heading up our engineering efforts around open source Lucene/Solr as well as Lucidworks Fusion and our other exciting product offerings. With Lucidworks employing a large percentage of Lucene/Solr committers, we take good stewardship of the open source project very seriously, and I’m excited to be able to work more on the strategic direction of our open source contributions. Additionally, I’ll be working to drive Fusion as the next-generation platform for building search-driven, intelligent applications. I’m incredibly excited to be working with such a top-notch team at Lucidworks, and am looking forward to building out what will be the most scalable, dependable, easy-to-use, and highly relevant search product on the market.