# Clustering and Classification in Ecommerce

With 2018 in the books, ecommerce’s share of retail sales was pushing 13%, according to Mastercard SpendingPulse. This growth in sales has been accompanied by a surge in customer behavior data. Advances in AI are giving retailers unprecedented, detailed insights into that behavior, allowing them to improve the customer experience in a variety of ways: product recommendations, personalized search, customer support, and dynamic pricing.

But when we say AI, we are really talking about *machine learning*, a sub-field of AI that teaches machines to learn and derive insights from input data. This article will introduce two well-known machine learning techniques, *classification* and *clustering*, that have been influential in the ecommerce domain. We’ll also introduce you to some statistical models that your data scientists may use to help train the machine.

Being aware of these various models will help you to understand the types of technology capabilities you’ll need to have if precision is crucial to satisfying your customer. Being an expert in statistical models is less important than having a technology that can support you. Think of it this way: If you are a bakery and want to offer gluten-free bread, you need to know every ingredient going into your product.

## Supervised vs Unsupervised Learning

Before we get into statistical modeling, let’s go through a few terms. Customers want the results they see to be relevant (quality); the share of returned results that are actually relevant is called *precision*. They also want choice (quantity); the share of all relevant items that actually get returned is called *recall*. So the dance is between giving them lots of options and then homing in on what is most relevant to them. If you are a merchandiser, you need to ensure your data scientists have control over the models in order to get the most *precision*.
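As a minimal sketch, here is how the two measures could be computed for a single search query. The product IDs, the query, and the `precision_recall` helper are hypothetical and exist only for illustration.

```python
def precision_recall(returned, relevant):
    """Compute precision and recall for one search query.

    returned: product IDs the search engine showed (hypothetical data)
    relevant: product IDs the customer would consider relevant
    """
    returned_set = set(returned)
    relevant_set = set(relevant)
    hits = returned_set & relevant_set
    # Precision: what share of the results shown were relevant?
    precision = len(hits) / len(returned_set) if returned_set else 0.0
    # Recall: what share of the relevant items were shown?
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query "handbag": 4 results shown, 5 relevant items in the catalog.
returned = ["bag-1", "bag-2", "shoe-9", "bag-3"]
relevant = {"bag-1", "bag-2", "bag-3", "bag-4", "bag-5"}

precision, recall = precision_recall(returned, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.75 recall=0.60
```

Showing more results would tend to raise recall, but every irrelevant result added along the way lowers precision, which is the trade-off described above.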

Some of this precision and recall can be achieved with a strong search engine alone. But on top of that, we can look at what an individual user, and other users in aggregate, have done historically. That lets you assess the probability that customers will buy one thing over another, and that gives you the ability to recommend.

In order to teach the machine which recommendations work and which don’t, we have two methods to employ: supervised and unsupervised learning. In supervised learning, we first specify a *target variable* and then ask the machine to learn from our data.

For example, let’s say we have a bunch of product photos that contain different fashion items such as shoes, shirts, dresses, jeans, jackets, etc. We can train a supervised learning model on these photos to recognize the items in each photo, and then use that model to recognize those same items in new photos. Those items are essentially the target variables that we want the model to learn. In order to have supervised learning, you must have clearly labeled data.

But what if you don’t? Or what if only some of your data is clearly labeled? We eliminate the idea of having a target variable and call this unsupervised learning. More on both shortly.

## Importance of Classification in Ecommerce

Purse or handbag? Sneakers or athletic shoes? Outerwear or coats? People use different names for the same thing, and in retail one of the worst things you can do is have a search engine return nothing because the customer typed an alternative word or synonym. These types of outputs are called discrete *output variables*, and we use a method called “classification” to train the computer from a series of inputs.

Teaching the machine to find all items under a specific class requires your training data to be clearly labeled. Once the data is cleaned, you can apply a few different machine learning algorithms to train a model on it. Here are a few:

1 – The **k-Nearest Neighbors** (KNN) algorithm is very simple and very effective. Predictions are made for a new data point by searching through the entire training set for the K most similar instances (the neighbors) and summarizing the output variable for those K instances. A common use case of k-Nearest Neighbors is recommender systems: if we know a user likes a particular item, we can recommend similar items to them.

In retail you use this method to identify key patterns in customer purchasing behavior, and subsequently increase sales and customer satisfaction by anticipating customer behavior.

2 – **Decision Trees** are another important classification technique used for predictive modeling. The decision tree model is commonly represented as a binary tree. Each internal node represents a single input variable (x) and a split point on that variable (assuming the variable is numeric). The leaf nodes of the tree contain an output variable (y) which is used to make a prediction.

Predictions are made by “walking the splits of the tree” until arriving at a leaf node and outputting the class value at that leaf node. Decision trees have a wide range of real-world applications, from selecting which merchandise to shop for to choosing what outfits to wear at an office party.

3 – **Logistic Regression** is the go-to method when our target variable is categorical, most often with two levels (binary); multinomial variants handle more than two. Some examples are the gender of a user, the outcome of a sports game, or the political affiliation a person has.

4 – The **Naive Bayes** model comprises two types of probabilities that can be calculated directly from your training data: 1) the probability of each class, and 2) the conditional probability of each x value given each class. Once calculated, the probability model can be used to make predictions for new data using Bayes’ Theorem.

Naive Bayes can be applied in various scenarios: marking an email as spam or not spam, forecasting the weather to be sunny or rainy, checking whether a customer review expresses positive or negative sentiment, and more.
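To make three of these algorithms more concrete, here is a small standard-library sketch of k-Nearest Neighbors, a decision tree walk, and Naive Bayes. Everything in it is an assumption made for illustration: the shoe measurements, the review words, the hand-written (not learned) tree, and the add-one smoothing in the Naive Bayes step. In practice your data scientists would more likely reach for an established machine learning library.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled shoes: (heel height cm, sole thickness cm) -> class.
training = [
    ((9.0, 0.5), "high heels"), ((10.0, 0.6), "high heels"), ((8.5, 0.4), "high heels"),
    ((2.0, 3.0), "sneakers"), ((2.5, 3.5), "sneakers"), ((3.0, 2.8), "sneakers"),
]

def knn_predict(training, point, k=3):
    """k-NN: majority vote among the k training examples closest to `point`."""
    by_distance = sorted(training, key=lambda ex: math.dist(ex[0], point))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# A hand-written binary tree: internal nodes hold an input variable (by index)
# and a split point; leaf nodes hold the class to output.
tree = {
    "feature": 0, "split": 5.0,                      # heel height below 5 cm?
    "left": {"feature": 1, "split": 1.5,             # sole thinner than 1.5 cm?
             "left": {"label": "flats"},
             "right": {"label": "sneakers"}},
    "right": {"label": "high heels"},
}

def tree_predict(node, point):
    """Walk the splits of the tree until a leaf is reached, then return its class."""
    while "label" not in node:
        branch = "left" if point[node["feature"]] < node["split"] else "right"
        node = node[branch]
    return node["label"]

# Hypothetical customer reviews, already split into words, with a sentiment label.
reviews = [
    (["great", "fit"], "positive"), (["love", "great"], "positive"),
    (["poor", "fit"], "negative"), (["poor", "quality"], "negative"),
]

def nb_train(examples):
    """Count what Naive Bayes needs: class frequencies and per-class word frequencies."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in examples:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def nb_predict(model, words):
    """Pick the class maximizing P(class) * product of P(word | class)."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        log_prob = math.log(count / total)               # probability of the class
        n_words = sum(word_counts[label].values())
        for w in words:
            # P(word | class), with add-one smoothing (an assumption of this
            # sketch) so an unseen word does not zero out the whole product.
            log_prob += math.log((word_counts[label][w] + 1) / (n_words + len(vocab)))
        scores[label] = log_prob
    return max(scores, key=scores.get)

model = nb_train(reviews)
print(knn_predict(training, (9.5, 0.5)))        # high heels
print(tree_predict(tree, (2.0, 3.0)))           # sneakers
print(nb_predict(model, ["great", "quality"]))  # positive
```

Note that the k-NN part compares raw distances, so it only behaves sensibly when the features are on comparable scales, as the two centimeter measurements are here.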

## Training a Data Set With Statistical Models

Say your data scientists have decided on a machine learning algorithm to use for classification. What we need to do next is to train the algorithm, or allow it to learn. To train the algorithm, we feed it quality data known as a *training set*, the collection of examples the algorithm learns from. The *target variable* is what we’ll be trying to predict with our machine learning algorithms.

In a training set, the target variable is known. The machine learns by finding some relationship between the features and the target variable. In the classification problem, the target variables are also called *classes*, and there is assumed to be a finite number of classes.

To test machine learning algorithms, we need a separate dataset from the training set known as the *test set*. Initially, the program is fed the training examples; this is when the learning happens. Next, the program is fed the test set.

The class for each example from the test set is not given to the program, and the program decides which class each example should belong to. The class that each test example actually belongs to is then compared to the predicted value, and we can get a sense of how accurate the algorithm is.
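The train-then-test procedure can be sketched in a few lines. The data, the 80/20 split ratio, and the use of a 1-nearest-neighbor classifier are all assumptions chosen to keep the example short; they are not a recommendation for any particular setup.

```python
import math
import random

# Hypothetical labeled examples: (heel height cm, sole thickness cm) -> class.
examples = (
    [((8.0 + i * 0.3, 0.5), "high heels") for i in range(10)]
    + [((2.0 + i * 0.2, 3.0), "sneakers") for i in range(10)]
)

random.seed(0)                      # fixed seed so the split is repeatable
random.shuffle(examples)

split = int(0.8 * len(examples))    # 80% training, 20% test (an arbitrary but common ratio)
training_set, test_set = examples[:split], examples[split:]

def predict(point):
    """1-nearest-neighbor: copy the class of the closest training example."""
    _, label = min(training_set, key=lambda ex: math.dist(ex[0], point))
    return label

# The true class of each test example is hidden from `predict` and only used
# afterwards to check the prediction.
correct = sum(predict(features) == actual for features, actual in test_set)
accuracy = correct / len(test_set)
print(f"accuracy on {len(test_set)} held-out examples: {accuracy:.0%}")
```

This toy data is cleanly separable, so the classifier gets every held-out example right; real catalog data is rarely that tidy, which is exactly why a separate test set is worth keeping.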

## Importance of Clustering in Ecommerce

The clustering task is an instance of unsupervised learning that automatically forms clusters of similar things. The key difference from classification is that in classification, we know what we are looking for. That is not the case in clustering. Clustering is sometimes called unsupervised classification because it produces the same kind of result as classification, groups of items, but without having predefined classes.

We can cluster almost anything, and the more similar the items are in the cluster, the better our clusters are. This notion of similarity depends on a similarity measurement. Because we don’t have a target variable as we did in classification, we call this *unsupervised learning*. Instead of telling the machine “Predict Y for our data X,” we are asking “What can you tell me about X?”

For instance, things that we can ask an unsupervised learning algorithm to tell us about a customer purchase dataset may include, “Based on their ZIP code, what are the 20 best geographic groups we can make out of this group of customers?” or “What 10 product items occur together most frequently in this group of customers?”

One widely used clustering algorithm is *k-means*, where k is a user-specified number of clusters to create. The k-means clustering algorithm starts with k random cluster centers known as centroids.

Next, the algorithm computes the distance from every point to the cluster centers. Each point is assigned to the closest cluster center. The cluster centers are then re-calculated based on the new points in the cluster. This process is repeated until the cluster centers no longer move. This simple algorithm is quite effective but is sensitive to the initial cluster placement.
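The loop described above can be sketched as follows. This is a bare-bones illustration that uses Euclidean distance and picks the initial centroids from the data points; the customer points and the `kmeans` function name are hypothetical.

```python
import math
import random

def kmeans(points, k, seed=0, max_iter=100):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)       # k random points as starting centroids
    for _ in range(max_iter):
        # Assignment step: each point joins its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(c) / len(cluster) for c in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centroids == centroids:      # centroids no longer move: done
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical customers: (orders per month, average basket size in items).
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

centroids, clusters = kmeans(points, k=2)
for center, members in zip(centroids, clusters):
    print(center, members)
```

Changing `seed` changes the starting centroids. On data less cleanly separated than this toy set, that can change the final clusters, which is the sensitivity to initial placement mentioned above.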

To provide better clustering, a second algorithm called *bisecting k-means* can be used. Bisecting k-means starts with all the points in one cluster and then splits the clusters using k-means with a k of 2. In the next iteration, the cluster with the largest error is chosen to be split. This process is repeated until k clusters have been created. In general, bisecting k-means creates better clusters than the original k-means does.
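A rough sketch of bisecting k-means follows, using the sum of squared distances to the cluster mean as the "error". That error measure, the sample points, and all function names are assumptions made for illustration. The sketch also assumes the points are distinct and that k does not exceed the number of points.

```python
import math
import random

def mean(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))

def sse(cluster):
    """Error of a cluster: sum of squared distances to the cluster mean."""
    center = mean(cluster)
    return sum(math.dist(p, center) ** 2 for p in cluster)

def two_means(points, rng, max_iter=100):
    """k-means with k = 2; returns the two resulting groups of points."""
    centroids = rng.sample(points, 2)
    for _ in range(max_iter):
        halves = ([], [])
        for p in points:
            closer = 0 if math.dist(p, centroids[0]) <= math.dist(p, centroids[1]) else 1
            halves[closer].append(p)
        if not halves[0] or not halves[1]:
            break
        new_centroids = [mean(halves[0]), mean(halves[1])]
        if new_centroids == centroids:      # converged
            break
        centroids = new_centroids
    return halves

def bisecting_kmeans(points, k, seed=0):
    rng = random.Random(seed)
    clusters = [list(points)]               # start with all points in one cluster
    while len(clusters) < k:
        # Choose the cluster with the largest error and split it in two.
        worst = max(clusters, key=sse)
        clusters.remove(worst)
        clusters.extend(half for half in two_means(worst, rng) if half)
    return clusters


# Hypothetical 2-D points forming three loose groups.
points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
    (8.0, 8.0), (9.0, 8.5), (8.5, 9.0),
    (1.0, 9.0), (1.5, 8.5), (2.0, 9.5),
]

clusters = bisecting_kmeans(points, k=3)
for members in clusters:
    print(members)
```

Because each split is only a two-way decision, a poor random start does less damage than it can when all k centroids are placed at once.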

## Ecommerce Use Case

In one of our previous posts, we suggested how Amazon could fix its recommendations by incorporating clustering to segment customers in order to determine if they are likely to buy something similar again. Let’s see how that can happen with a hypothetical scenario:

**Clustering:** Let’s say Amazon has a dataset of all the purchase orders for 500,000 customers in the past week. The dataset has many features that can be broadly categorized into customer profile (gender, age, ZIP code, occupation) and item profile (types, brands, description, color). By applying the k-means clustering algorithm to this dataset, we end up with 10 different clusters. At this point, we do not know what each of these clusters represents; so we arbitrarily call them Cluster 1, 2, 3, and so on.

**Classification:** Okay, it’s time to do supervised learning. We now look at cluster 1 and use a Naive Bayes algorithm to predict the probabilities of ZIP code and item type features for all the data points. It turns out that 95% of the data in cluster 1 consist of customers who live in New York and frequently buy high-heel shoes. Awesome, let’s now look at cluster 2 and use logistic regression to do binary classification on the gender and color features for all the data points. As a result, the data in cluster 2 consist of male customers who are obsessed with any items that are black. If we keep doing this for all the remaining clusters, we will end up with a very detailed description for each of them.

**Recommendation:** Finally, we can recommend items to the customer, knowing that they are highly relevant according to our prior segmentation analysis. We can simply use the k-Nearest Neighbor algorithm to find the items to recommend. For example, customers in cluster 1 are recommended a pair of Marc New York high heels, customers in cluster 2 are recommended a black razor from Dollar Shave Club, and so on.
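To give a feel for the cluster-description step, the sketch below profiles each cluster by its most common feature values. Note the substitution: it uses plain frequency counts rather than the Naive Bayes or logistic regression models mentioned in the scenario, and the orders, cities, and items are invented for illustration.

```python
from collections import Counter

# Hypothetical output of the clustering step: (cluster id, customer city, item type).
orders = [
    (1, "New York", "high heels"),
    (1, "New York", "high heels"),
    (1, "New York", "sandals"),
    (1, "Boston", "high heels"),
    (2, "Chicago", "razor"),
    (2, "Austin", "razor"),
    (2, "Chicago", "jacket"),
]

def profile_clusters(orders):
    """For each cluster, report the most common city and item type with their shares."""
    by_cluster = {}
    for cluster_id, city, item in orders:
        by_cluster.setdefault(cluster_id, []).append((city, item))

    profiles = {}
    for cluster_id, rows in by_cluster.items():
        top_city, city_n = Counter(city for city, _ in rows).most_common(1)[0]
        top_item, item_n = Counter(item for _, item in rows).most_common(1)[0]
        profiles[cluster_id] = {
            "city": (top_city, city_n / len(rows)),
            "item": (top_item, item_n / len(rows)),
        }
    return profiles


profiles = profile_clusters(orders)
for cluster_id, profile in profiles.items():
    print(cluster_id, profile)
```

A profile like "75% New York, 75% high heels" is the kind of description that the recommendation step can then act on.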

Supervised and unsupervised learning are two of the main machine learning approaches that power most of the AI applications currently deployed in ecommerce technology. The core task behind each is *classification* for supervised learning and *clustering* for unsupervised learning.

You can see there is a fair amount of tweaking or tuning that could be done to make sure you have that optimal balance of recall and precision. And despite all the math, tuning is more akin to art. Having access to the algorithms so they can be continuously refined is key.

In my next article, we will look at learning to rank, a key information retrieval technique that uses machine learning and that many web services rely on to improve their search results.

*James Le is a Rochester-based writer who is working on his master’s degree in Computer Science, specializing in Artificial Intelligence, at RIT. He has had professional experience in data science, product management, and technical writing.*