A customer walks into a store, looking for a French press coffee maker. A store associate guides them to the coffee section, where there are automatic coffee makers, pour-overs, AeroPresses, and French presses. The customer is presented with options, but also with exactly what they’re looking for. This is easy in a store – the associate can point at the French press, and the customer immediately sees all of the other related products.

But how do you recreate that experience online? Or should you?

The online equivalent of asking an associate for help is clicking into the search bar and typing “french press.” This is where our expectations suddenly change versus the in-store experience. People who work on search spend countless hours and dollars trying to make all the French presses, and only the French presses, come back as search results. As search people, if we are able to do this, we say we’ve got perfect precision on the query. And, from a user experience perspective, there are a lot of advantages to perfect precision – the user is able to use facets and sorts to home in on the exact French press she wants.

Let’s set aside a much bigger and more important question: whether we’ve really met the user’s goal. The customer who walked into the physical store may have walked out with an AeroPress, and the physical store would have looked at this interaction as successful. But if the same thing happened in the online store, the search team might or might not be as pleased. We will explore the concept of lexical versus semantic query understanding and how those ideas relate to consumer intent in a future article.

The reality of what happens in many online stores is that the search platform returns some French presses along with all kinds of other unrelated stuff. We might see French cuff shirts, we might see a Mr. Coffee, we might see K-Cup pods, we might see a dress by Hot Off The Press! made of French fabric, we might see a Le Creuset French pâté terrine with a press for ensuring the pâté has no air bubbles.

The French-made Le Creuset pâté terrine has a press in it to improve the texture of pâté. But it’s not a French press!


These kinds of results are aggravating for customers and merchants and are commonly caused by something called “diffusion.”

Text in a category name is considered diffuse when two or more distinct ideas are grouped together, usually with an ampersand or the word “and.” When a diffuse category is present, it is often handled improperly by search. That is, searching for a product on one side of the ampersand will return products on the other side when it shouldn’t. For example:

  • A search for oil returns vinegar because the category is “Oil & Vinegar”
  • A search for potassium returns magnesium because the category is “Potassium – Magnesium” 
  • A search for a cami returns a slip because the category is “Lingerie, Slips and Camis”

GNC has a broad assortment of health products. Their assortment is broad in potassium and magnesium supplements, but not so broad that they would want to break these into two distinct categories.
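The “Oil & Vinegar” case above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular search platform works: the product names, the record layout, and the `search` helper are all hypothetical, and matching is reduced to simple token overlap.

```python
import re

def tokenize(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

# Hypothetical records; names and field layout are illustrative only.
records = [
    {"name": "Extra Virgin Olive Oil", "category": "Oil & Vinegar"},
    {"name": "Aged Balsamic Vinegar", "category": "Oil & Vinegar"},
]

def search(query, records, fields):
    """Return names of records where every query token appears in the given fields."""
    query_tokens = tokenize(query)
    hits = []
    for record in records:
        indexed = set()
        for field in fields:
            indexed |= tokenize(record[field])
        if query_tokens <= indexed:
            hits.append(record["name"])
    return hits

# With the diffuse category searchable, the vinegar matches a query for "oil".
print(search("oil", records, fields=["name", "category"]))
# With only the product name searchable, it does not.
print(search("oil", records, fields=["name"]))
```

The vinegar is returned in the first call only because the token “oil” sits in its category name, which is the diffusion described above.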

This is part one of a three-part series of posts on what you should consider when looking at your product catalog to obtain lexical precision and ensure you’re not subjecting customers to diffusion confusion.

Diffuse Categories

A search index is made up of records. A record is like a row in a database and usually there is one record for every SKU in the catalog. Each record is made up of product information such as the product name, its description, brand, price, and so forth. The category name is useful information to put into the searchable fields on a product record because the category name usually describes what, at the most fundamental level, all of the things in that category actually are. It sounds obvious, but let’s explore why it is important to put the category name on a product record.

If your business is known for its great sweaters, and you have a lot of breadth in sweaters, you might not put the word “sweater” on every product name. You might abbreviate a bit and name a product based on the features that make it different from other products in the category. Instead of calling it the Abigail Scoopneck Sweater you might call it the Abigail Scoopneck.

But if the word “sweaters” is not searchable on the product record for the Abigail Scoopneck, how will the search engine retrieve it when someone searches for “sweater”? This is where the category (and subcategory and supercategory and so forth) comes in handy.

Here Ann Taylor has a sweaters category, but they haven’t put the word ‘sweater’ on the title of the Boatneck perfect pullover. Will this come back when someone searches for sweater?

It is at this point that companies face an important decision when setting up a search platform. Do they index category names and risk bringing back sweatshirts when someone searches for a sweater, or do they leave category names out and fail to return the Abigail Scoopneck? Either way, it’s a devil’s bargain, but in our opinion, it is always better to over-recall products than under-recall products, so our base recommendation is to include the category names. It’s not ideal, but it’s better to force a user to sort or filter or otherwise hunt for a product in a result set than to not return a product at all and risk them leaving the site because they assume you don’t have it.
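The trade-off can be made concrete with a small sketch. Everything here is an assumption for illustration: the “Sweaters & Sweatshirts” category, the second product, the crude plural stripping (a stand-in for a real stemmer), and the `include_category` switch.

```python
import re

def normalize(token):
    """Crude plural stripping, standing in for a real stemmer."""
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def tokenize(text):
    """Lowercase, split on non-alphanumerics, and normalize each token."""
    return {normalize(t) for t in re.findall(r"[a-z0-9]+", text.lower())}

# Hypothetical catalog: neither product name contains the word "sweater".
records = [
    {"name": "Abigail Scoopneck", "category": "Sweaters & Sweatshirts"},
    {"name": "Harbor Crewneck Sweatshirt", "category": "Sweaters & Sweatshirts"},
]

def search(query, records, include_category):
    """Match the query against the name, and optionally the category name."""
    query_tokens = tokenize(query)
    hits = []
    for record in records:
        indexed = tokenize(record["name"])
        if include_category:
            indexed |= tokenize(record["category"])
        if query_tokens <= indexed:
            hits.append(record["name"])
    return hits

# Category indexed: the sweater is found, but so is the sweatshirt (over-recall).
print(search("sweater", records, include_category=True))
# Category left out: nothing comes back at all (under-recall).
print(search("sweater", records, include_category=False))
```

In this toy catalog, leaving the category out produces an empty result set, which is the outcome the recommendation above is meant to avoid.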

To mitigate the problem of diffuse category names, a good general rule for search is to prioritize matches between the query and the product-specific content over matches between the query and the product’s category. There will be exceptions.
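One way to picture this rule is as per-field weighting, where a match in the product name counts for more than a match in the category name. The sketch below assumes made-up weights and hypothetical product names; real search platforms typically expose similar per-field boosts, but their scoring is considerably more sophisticated than this token count.

```python
import re

# Assumed weights: product-specific fields outrank the category name.
FIELD_WEIGHTS = {"name": 3.0, "description": 2.0, "category": 1.0}

def tokenize(text):
    """Lowercase the text and split on anything that is not a letter or digit."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def score(query, record):
    """Sum the field weight once for each query token found in that field."""
    query_tokens = tokenize(query)
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        field_tokens = tokenize(record.get(field, ""))
        total += weight * len(query_tokens & field_tokens)
    return total

def rank(query, records):
    """Return matching product names, highest score first."""
    scored = [(score(query, r), r["name"]) for r in records]
    matched = [pair for pair in scored if pair[0] > 0]
    return [name for _, name in sorted(matched, key=lambda pair: -pair[0])]

# Hypothetical records sharing the diffuse category from the example above.
records = [
    {"name": "Magnesium Citrate Capsules", "category": "Potassium – Magnesium"},
    {"name": "Potassium Gluconate Tablets", "category": "Potassium – Magnesium"},
]

# Both records match on category, but the name match lifts the potassium product.
print(rank("potassium", records))
```

The magnesium product is still recalled, in keeping with the preference for over-recall, but it ranks below the product whose own name matches the query.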

If you’re interested in learning more about how Lucidworks solves diffusion confusion for retailers like REI, Goop, and more, drop us a line.

Meet Peter Curran, General Manager of Digital Commerce
