As one of the world’s largest retailers, Target can’t afford slow or off-base search results. As it adds products (and their data), how does Target maintain and improve speed and accuracy at the same time? It uses a combination of deep learning models and custom Solr components to deliver highly accurate search results at scale, and fast: imagine searching two million product SKUs with a response time of 250 ms.

The Technology

Strong machine learning models are the key to scalable performance. Target built its models on product title, category, type, and description, which are also the key search attributes you need when querying Solr.

Neural network models identify the different search intents and attributes expressed in a query.

It’s a challenge for applications to accurately and fully understand what users want. What, for example, does “c9 running shoes for boys” mean? This is a classification problem: the classifier should determine that C9 Champion is the brand and that male is the gender. Retailers can develop a classification framework that generates, for each product attribute, a model that can accurately classify any query.

To classify the query, Target:

  • Gathers abundant training data from user search queries and user behavior, and from product attributes
  • Trains machine-learned models for each attribute with prepared lists of query/attribute value pairs, for example: shoes/athletic shoes, shoes/sneakers
  • Outputs a list of attribute values that are predicted to be related to the original request, with probabilities for each value
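The three steps above can be sketched in a few lines. This is only an illustration, not Target's implementation: a small naive Bayes scorer stands in for the neural network, and the training pairs, the function names `train` and `predict`, and the "product type" attribute are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict

# Hypothetical query/attribute-value pairs for one attribute (product type).
TRAINING_PAIRS = [
    ("running shoes", "athletic shoes"),
    ("c9 running shoes", "athletic shoes"),
    ("trail running shoes", "athletic shoes"),
    ("canvas sneakers", "sneakers"),
    ("white sneakers for boys", "sneakers"),
    ("leather dress shoes", "dress shoes"),
]


def train(pairs):
    """Count how often each query token co-occurs with each attribute value."""
    token_counts = defaultdict(Counter)
    for query, value in pairs:
        token_counts[value].update(query.lower().split())
    return token_counts


def predict(token_counts, query, top_n=5):
    """Return up to top_n (attribute value, probability) pairs, best first."""
    tokens = query.lower().split()
    vocab = {tok for counts in token_counts.values() for tok in counts}
    scores = {}
    for value, counts in token_counts.items():
        total = sum(counts.values())
        # Log-likelihood of the query tokens with add-one smoothing.
        scores[value] = sum(
            math.log((counts[tok] + 1) / (total + len(vocab))) for tok in tokens
        )
    # Normalize the scores into probabilities that sum to 1.
    best = max(scores.values())
    weights = {value: math.exp(s - best) for value, s in scores.items()}
    norm = sum(weights.values())
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    return [(value, w / norm) for value, w in ranked[:top_n]]


model = train(TRAINING_PAIRS)
predictions = predict(model, "c9 running shoes for boys")
for value, probability in predictions:
    print(f"{value}: {probability:.2f}")
```

In a real pipeline one such model would be trained per attribute (brand, gender, product type, and so on), and the ranked values would feed the Solr query.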

Evaluation

After training the neural models, Target evaluates them with these metrics:

  • Precision: the number of correctly predicted attribute values divided by the total number of predictions the classifier makes for a test query. The higher the precision, the more accurate the predictions are.
  • Recall: the number of correctly predicted attribute values divided by the total number of attribute values listed for that query in the test set. The higher the recall, the more of those attribute values the classifier covers.
  • Top-N accuracy: for a query, if any of the top N predictions is relevant, the query scores a 1; otherwise it scores a 0.
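The three metrics can be computed for a single query as in the sketch below. The function names and the sample attribute values are assumptions made for illustration; they are not taken from Target's evaluation code.

```python
def precision_recall(predicted, relevant):
    """Precision and recall for one query, given predicted and true values."""
    predicted, relevant = set(predicted), set(relevant)
    correct = len(predicted & relevant)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall


def top_n_accuracy(ranked_predictions, relevant, n=5):
    """Score 1 if any of the first n ranked predictions is relevant, else 0."""
    return 1 if any(p in relevant for p in ranked_predictions[:n]) else 0


# Hypothetical classifier output and test-set labels for one query.
ranked = ["athletic shoes", "sneakers", "sandals"]
truth = {"athletic shoes", "sneakers", "running shoes", "kids' shoes"}

precision, recall = precision_recall(ranked, truth)
hit = top_n_accuracy(ranked, truth, n=5)
print(precision, recall, hit)
```

Here two of the three predictions are correct (precision 2/3) and two of the four true values are found (recall 1/2). Averaging these per-query scores over the whole test set gives the overall figures.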

Target controls recall and precision through a combination of:

  • Category/attribute classification to relate items within a set
  • Filtering for specificity within a set
  • Customized elevation to promote the most popular items
  • Precision components to filter out product SKUs based upon a threshold

Precision and recall usually trade off against each other: tuning for higher precision tends to lower recall, and vice versa. Retailers therefore need to find the right balance between the two, alongside the accuracy they are aiming for, measured here as the proportion of queries for which at least one correct attribute value is in the top five predictions.
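A probability threshold is one simple way to see this trade-off. The sketch below uses made-up probabilities and labels and applies the cutoff to predicted attribute values; it is an assumption for illustration and not a description of how Target's precision components filter SKUs.

```python
def precision_recall_at(scored, relevant, threshold):
    """Keep predictions at or above the threshold, then score them."""
    kept = {value for value, prob in scored if prob >= threshold}
    correct = len(kept & relevant)
    precision = correct / len(kept) if kept else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical classifier output (value, probability) and true values.
scored = [
    ("athletic shoes", 0.62),
    ("sneakers", 0.21),
    ("sandals", 0.10),
    ("running shoes", 0.05),
    ("boots", 0.02),
]
relevant = {"athletic shoes", "sneakers", "running shoes"}

results = {t: precision_recall_at(scored, relevant, t) for t in (0.01, 0.15, 0.50)}
for threshold, (precision, recall) in results.items():
    print(f"threshold={threshold:.2f} precision={precision:.2f} recall={recall:.2f}")
```

With the lowest threshold every prediction is kept, so recall is 1.0 but precision is only 0.6. Raising the threshold to 0.15 lifts precision to 1.0 while recall drops to 2/3, and at 0.50 recall falls to 1/3.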

For Target, classifiers can achieve precision and recall above 90 percent, and top-5 accuracy above 96 percent. With a proper classification pipeline, a new model can be generated automatically for any attribute within 18 hours. Retraining happens daily.

By using state-of-the-art neural network techniques in conjunction with customized Solr components, Target has improved its search relevancy by more than 20 percent, which increases sales and reduces the time and cost users spend searching.