A colleague recently said to me, “I have the impression that different people mean different things when they talk about semantic search. What do we at Lucidworks mean when we say semantic search?”

The simplest definition of ‘semantic search’ is searching by meaning. In the context of digital commerce, my perspective is that semantic search refers to a set of techniques for finding products by meaning, as opposed to lexical search, which finds products by matching words and their variants.

Others may argue that the meaning of semantic search depends on a particular technique, such as an ontology, a knowledge graph, or a semantic vector space. The inconsistency in the use of the phrase ‘semantic search’ is not surprising given the rapid evolution of the techniques to understand meaning over the past 15 years or so. Let’s first consider that history to help ground us in our current use of the phrase.

A Quick History of Semantic Search

According to the Wayback Machine, in 2007 the Wikipedia entry about semantic search began with the following:

  • Semantic search attempts to augment and improve traditional research searches by leveraging XML and RDF data from semantic networks to disambiguate semantic search queries and web text in order to increase relevancy of results.

There was a clear focus on the semantic web and linked data. By 2009 the Wikipedia entry had been changed to include a reference to ontologies and the semantic web:

  • Other authors primarily regard semantic search as a set of techniques for retrieving knowledge from richly structured data sources like ontologies as found on the semantic web.

In 2010 the first sentence was changed to include the concepts of searcher intent and contextual meaning:

  • Semantic search seeks to improve search accuracy by understanding searcher intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the web or within a closed system, to generate more relevant results.

By 2019 the first sentence had been simplified:

  • Semantic search denotes search with meaning, as distinguished from lexical search where the search engine looks for literal matches of the query words or variants of them, without understanding the overall meaning of the query.

We can see the evolution of consensus on the meaning of semantic search from a focus on ontologies, RDF, and the semantic web to the more general “search with meaning.” Google’s approach to search evolved over the same period to focus more on meaning, introducing the Google Knowledge Graph (“things not strings”) in 2012, conversational search in 2013, RankBrain (ML-based ranking) in 2015, and BERT and “neural matching” in 2019.

Rather than taking a stand on whether or not semantic search has to include the use of a knowledge graph or a particular type of ML model, I think it’s more useful to focus on the effectiveness of a set of semantic search techniques in solving specific problems.

Each Semantic Search Technique Solves Specific Problems

Even though we say that lexical search is about finding by matching words and their variants, and semantic search is about finding by matching meaning, both approaches have the same goal in the context of ecommerce: to find products that match the shopper’s intent. In other words, the goal is to respond to a query with products that are relevant to the task or interest implicit in the query.

I often hear “understanding query intent” discussed as the ultimate goal of search. But the searcher’s query often reflects just one part of a goal. If a DIY shopper on an auto parts ecommerce site searches for shop rags, are they really just thinking about shop rags? The site can offer more relevant products once the reason for needing shop rags is understood – what job the DIY shopper has in mind. Are they doing an oil change or some other messy job? Maybe it makes sense to also include other cleanup products.

And when a shopper searches for organic lemonade on a grocery ecommerce site, why not include products relevant to an interest in organic juices and snacks?

There are ecommerce sites that do a good job of recommending products related to a goal, but the recommendations usually appear in a “you may also like” section of the page. The idea of focusing with high precision on query intent and then relegating other goal-relevant products to recommendation zones seems counterintuitive to me. Lexical search can be tuned to achieve a more goal-oriented relevance, but it usually involves query-specific rules that require constant curation to keep up with changing product assortments, shopping trends, and seasons. There are machine learning approaches to suggest such rules, but the suggestions are often of mixed quality and require vetting by the ecommerce team before being deployed – another type of curation.

Let’s focus on two specific semantic search techniques: semantic vector search for better recall based on a goal-oriented perspective of relevance, and semantic query parsing for better precision when queries include specifications such as dimensions and price range.

Semantic Vector Search

Semantic vector search is a deep learning approach in which a model learns from shopper behavior to encode products and queries in a shared vector space – sort of like the way groceries are organized in aisles and shelves in a physical store. The organic lemonade is next to the organic orange juice. The flour is next to other common baking ingredients. The grocery store staff make adjustments in how products are shelved based on shopper behavior. The semantic vector search model continues to learn over time as product assortments and shopper behavior change.
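
To make the idea of a shared vector space concrete, here is a minimal Python sketch. It is not a trained model: the three-dimensional vectors, the product names, and the function names are all made up for illustration, standing in for embeddings that a real model would learn from shopper behavior. It only shows the lookup step, where a query vector is compared to product vectors by cosine similarity and the closest products are returned.

```python
import math

# Hypothetical hand-written embeddings. A real model would learn these from
# shopper behavior and would use far more than three dimensions.
PRODUCT_VECTORS = {
    "organic lemonade": [0.9, 0.8, 0.1],
    "organic orange juice": [0.8, 0.9, 0.1],
    "all-purpose flour": [0.1, 0.1, 0.9],
    "baking powder": [0.2, 0.1, 0.8],
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def nearest_products(query_vector, k=2):
    """Return the names of the k products closest to the query vector."""
    ranked = sorted(
        PRODUCT_VECTORS.items(),
        key=lambda item: cosine(query_vector, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Assume the query "organic lemonade" was encoded to this (made-up) vector.
print(nearest_products([0.9, 0.8, 0.1]))
```

In this toy space the juices sit close together and the baking ingredients sit close together, much like the aisles in the grocery store analogy.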

Semantic Vector Search Enables More Intuitive Relevance

Semantic vector search is much better than lexical search at predicting relevance based on what shoppers tend to buy given a specific query, and it does so without curation by merchandisers and search managers. The shopper searches for organic lemonade; they get to see organic orange juice and a variety pack of organic juices for kids following the organic lemonade. Furthermore, it is able to accomplish this for queries it hasn’t seen before.

Semantic Vector Search Slashes Zero Results

Fixing zero-results queries is an ornery burden for search managers. It’s another curation task without end, often involving a double-digit percentage of searches. Search managers end up focusing on the most frequent zero-results queries, meaning that a large set of long-tail queries is not addressed – missed sales and money left on the table.

Semantic vector search produces far fewer zero-results outcomes, without curation. When organic lemonade is out of stock, the shopper still sees organic orange juice and the variety pack of organic juices for kids. Similarly, if I search for “pumpernickel crackers” and no such product exists, I’m served a mix of similar products. One of the world’s top five retailers deployed semantic vector search and decreased null results by 91% compared to the previous year, which translates into hundreds of millions in sales.

Start With Specific Targets

Semantic search delivers a more intuitive implementation of relevance, so why not send all queries to semantic search?

Ecommerce companies have invested years of effort in tuning lexical search. Many queries are handled pretty well by lexical search without constant curation. We can think of this as relevance equity.

I recommend starting with queries that are not performing well based on KPIs such as average order value (AOV) and click-through rate (CTR). The risk of damaging relevance equity is lower for these queries, and the opportunity to improve KPIs and save the time of merchandisers and search managers is higher.
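
One way to picture this kind of targeting is a simple routing rule. The sketch below is an assumption on my part rather than a description of any product: the `QueryStats` fields, the thresholds, and the engine labels are hypothetical, and CTR is taken here as clicks divided by searches. Queries that fall below either threshold go to semantic search, and everything else stays on the tuned lexical path.

```python
from dataclasses import dataclass


@dataclass
class QueryStats:
    """Hypothetical per-query KPI counters collected from search logs."""
    query: str
    searches: int
    clicks: int
    orders: int
    revenue: float

    @property
    def ctr(self) -> float:
        # Click-through rate, taken here as clicks per search.
        return self.clicks / self.searches if self.searches else 0.0

    @property
    def aov(self) -> float:
        # Average order value, taken here as revenue per order.
        return self.revenue / self.orders if self.orders else 0.0


def choose_engine(stats: QueryStats, min_ctr: float = 0.20, min_aov: float = 25.0) -> str:
    """Send underperforming queries to semantic search; keep the rest lexical.

    The thresholds are illustrative defaults, not recommended values.
    """
    if stats.ctr < min_ctr or stats.aov < min_aov:
        return "semantic"
    return "lexical"


weak = QueryStats("shop rags", searches=1000, clicks=50, orders=20, revenue=400.0)
strong = QueryStats("motor oil", searches=1000, clicks=400, orders=150, revenue=6000.0)
print(choose_engine(weak), choose_engine(strong))
```

Keeping the rule this explicit makes it easy to widen the set of routed queries gradually while watching the same KPIs.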

Semantic Query Parsing

Another semantic search technique is semantic query parsing, which is a type of word sense disambiguation (WSD). WSD research has been going on for decades, and various machine learning techniques continue to improve and compete for state-of-the-art status. “Knowledge-based” approaches to WSD utilize an ontology or knowledge graph (or both). In ecommerce, a knowledge graph can be derived from product data and shopper behavior. There is often some level of curation involved in maintaining the knowledge graph.

The goal of semantic query parsing in ecommerce is to identify mentions of concepts in a query, and then to find products relevant to those concepts. The concepts could be named entities such as brands, designers, and manufacturers, or specifications, such as size, color, and price range.

Specifications can be further classified into negotiable and non-negotiable categories. This negotiable/non-negotiable classification is specific to each ecommerce vertical, and sometimes specific to individual companies. If I search for womens maroon pumps size 9, the gender and size are probably not negotiable. (Size might be if a specific shoe is known to fit small or large.) On the other hand, the mention of the color maroon may be negotiable.

Semantic Query Parsing Improves Precision

Semantic query parsing can be used to route queries for concept-specific processing. Different concepts can be associated with different models or logic for normalizing concept mentions.

Non-negotiable specifications such as size can be used to filter search results.
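
As a rough illustration of tagging concept mentions and turning the non-negotiable ones into filters, consider the sketch below. It uses a tiny hand-written vocabulary and a regular expression, which is a deliberate simplification: a production parser would more likely draw on a knowledge graph or a learned model. The vocabulary, the negotiable flags, and the function names are all hypothetical.

```python
import re

# Hypothetical vocabulary: concept -> (known mentions, is it negotiable?)
CONCEPTS = {
    "gender": ({"womens", "mens"}, False),
    "color": ({"maroon", "black", "rust", "burgundy"}, True),
}
# Matches explicit size mentions such as "size 9" or "size 9.5".
SIZE_PATTERN = re.compile(r"\bsize\s+(\d+(?:\.\d+)?)\b")


def parse_query(query):
    """Return (tags, remaining_text) for a raw query string."""
    text = query.lower()
    tags = {}
    size_match = SIZE_PATTERN.search(text)
    if size_match:
        # Size is treated as non-negotiable in this toy example.
        tags["size"] = {"value": size_match.group(1), "negotiable": False}
        text = SIZE_PATTERN.sub(" ", text)
    remaining = []
    for token in text.split():
        for concept, (mentions, negotiable) in CONCEPTS.items():
            if token in mentions:
                tags[concept] = {"value": token, "negotiable": negotiable}
                break
        else:
            # Token matched no known concept, so keep it as free text.
            remaining.append(token)
    return tags, " ".join(remaining)


def hard_filters(tags):
    """Only non-negotiable specifications become filters."""
    return {name: tag["value"] for name, tag in tags.items() if not tag["negotiable"]}


tags, text = parse_query("womens maroon pumps size 9")
print(text, hard_filters(tags))
```

Here the gender and size end up as filters, the color stays available as a softer signal, and the leftover text (the product type) can go on to lexical or vector search.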

But what if the specification is ambiguous, as in the query mens jeans 30? Does 30 refer to inseam or waist? Once semantic query parsing determines that a query mentions a size, the size part can be tagged and the query can be routed to a “size resolver”.
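
What a size resolver does internally is not specified here, so the following Python sketch is only one possible interpretation. It assumes, hypothetically, that we have counts of which size attribute shoppers ended up buying on after similar queries. The counts, the confidence threshold, and the function name are invented for illustration. When one attribute clearly dominates, the resolver picks it; otherwise it returns both candidates so that the caller can match either.

```python
# Hypothetical purchase counts: (category, size value) -> attribute -> purchases
PURCHASES_BY_ATTRIBUTE = {
    ("mens jeans", "30"): {"waist": 720, "inseam": 280},
    ("mens jeans", "36"): {"waist": 450, "inseam": 50},
}


def resolve_size(category, value, min_confidence=0.8):
    """Return the size attributes a bare number most plausibly refers to.

    One attribute is returned when its share of purchases reaches
    min_confidence; otherwise all candidates are returned, most likely first.
    An empty list means there is no behavioral data for this combination.
    """
    counts = PURCHASES_BY_ATTRIBUTE.get((category, value), {})
    total = sum(counts.values())
    if total == 0:
        return []
    best, best_count = max(counts.items(), key=lambda kv: kv[1])
    if best_count / total >= min_confidence:
        return [best]
    return sorted(counts, key=counts.get, reverse=True)


print(resolve_size("mens jeans", "30"))
print(resolve_size("mens jeans", "36"))
```

With these made-up counts, 36 resolves to waist alone, while 30 stays ambiguous and both attributes are kept as candidates.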

Semantic Query Parsing Facilitates Concept-Specific Models

Negotiable specifications can be used in a variety of ways depending on the concept. For example, a query that mentions a color could be routed to a color encoder trained on shopper behavior. Which colors convert when the query is for maroon pumps? A color encoder might learn that rust and burgundy pumps are seen by shoppers as relevant to a query for maroon pumps.
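
The sketch below is a count-based stand-in for such a color encoder, not a trained model. It assumes a hypothetical table of conversions keyed by the color in the query and the color of the purchased product, and it turns those counts into shares that could be used as boost weights. The numbers, the `min_share` cutoff, and the function name are all invented.

```python
# Hypothetical conversions: queried color -> purchased product color -> count
CONVERSIONS = {
    "maroon": {"maroon": 500, "burgundy": 220, "rust": 130, "black": 40, "white": 10},
}


def related_colors(query_color, min_share=0.1):
    """Map a queried color to the product colors shoppers actually buy.

    Returns {color: share of conversions}, dropping colors whose share is
    below min_share. Unknown query colors yield an empty dict.
    """
    counts = CONVERSIONS.get(query_color, {})
    total = sum(counts.values())
    if total == 0:
        return {}
    return {
        color: count / total
        for color, count in counts.items()
        if count / total >= min_share
    }


print(sorted(related_colors("maroon")))
```

Because color is negotiable in this example, the shares would be used to boost or reorder results rather than to filter them.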

Some queries include mentions of vague concepts such as inexpensive. Inexpensive kids snow boots and inexpensive adult snow boots likely imply two different price ranges. A price range model might learn to predict price ranges in the context of a query, based on shopper behavior.
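
As a very rough sketch of the idea, “inexpensive” could be read as the lower quartile of what shoppers actually pay in the category implied by the query. This is my own simplification, not a description of a trained price range model, and the price lists and function name below are hypothetical.

```python
from statistics import quantiles

# Hypothetical purchase prices observed for each product category.
PURCHASE_PRICES = {
    "kids snow boots": [25, 30, 35, 40, 45, 50, 60, 70],
    "adult snow boots": [60, 80, 90, 110, 130, 150, 180, 220],
}


def inexpensive_ceiling(category):
    """Upper price bound for "inexpensive" in a category.

    Uses the first quartile of observed purchase prices, so the bound
    adapts to what shoppers pay in that category.
    """
    first_quartile, _, _ = quantiles(PURCHASE_PRICES[category], n=4)
    return first_quartile


for category in PURCHASE_PRICES:
    print(category, inexpensive_ceiling(category))
```

The same word yields a lower ceiling for kids boots than for adult boots, which is the context sensitivity described above.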

Semantic Search at Lucidworks

Our strategy is about consumer goals, not consumer queries. We’re building on the foundation of our existing semantic vector search capability, adding semantic query parsing capabilities initially to act as a set of “precision guardrails” for vector search, and ultimately as a tool for routing queries to concept-specific models.

Interested in learning more about how we’re using semantic search for product discovery? Check out our recent webinar, “The Case for Semantic-Based Approaches to Product Discovery.”