The Power of Signals

Users today expect fast, reliable, and personalized search experiences from the companies they choose to interact with. To surface the customer experience users crave, start by analyzing and understanding user interaction data, or signals. These signals contain powerful insight into what information is most relevant to users, how and where users expect to find it, and the language customers are using to signify their intent. Taking clues from this data, one can then start to make targeted, data-led improvements to search relevancy with the overall goal of increasing user satisfaction and improving key business metrics.

Intended Audience

Technology and business leaders interested in advancing their analytics techniques to promote more data-led feature selection for solving search problems and improving business KPIs.

Audience Takeaway

A renewed understanding of why analyzing user interaction data (signals) is so impactful for fostering user satisfaction, and a refresh on how to use ML techniques in a data-led, purposeful manner.

Speaker

Jill Rosow, Data Scientist, Lucidworks


[Jill Rosow]

Welcome. My name’s Jill Rosow. I’m a data scientist at Lucidworks, and really excited to talk to y’all today about some of the insights that are hidden within your user interaction data or your signals. 

I like to treat data like a story, so that’s how I’ll be talking about it today. You’ll see that I’ll break things down into topics, and then investigate those topics using signal analysis. Throughout this investigation, you’ll see examples of how to uncover clues that’ll help you understand how your users are interacting with your application, and find out where there’s room for improvement. Using those clues, we’ll talk about how to translate your insights into action by reaching for solutions that match up with the opportunities reported in that data. 

Before we dive in, I want you to think about where you sit on this analytics maturity curve. Everyone listening today is likely following at least some rudimentary type of analysis to understand how your app is performing. A common form of analysis is tracking business KPIs, things like your conversion rate, your case deflection rate, maybe your average basket value, all depending on your domain, and it’s super important to track these metrics, make sure that your business is hitting its targets. However, these are quantitative measures. They’re gonna be strongly influenced by outliers, they’re not gonna show you contextual specifics, and they really don’t give any direct information about who we care about most, which is our users. 

So while these are great, they mostly show us overviews of how we’re performing generally, you know, are things good, are they okay, are they bad? But to truly understand the why’s behind what’s happening, we need to advance along this analytics maturity curve. We want to go from asking what happened, with this hindsight perspective, into why did it happen, looking for insight there, and finally making our way up to, what will happen and what should I do about it? And this is a foresight perspective that we wanna take on.

Climbing up this curve is gonna help promote our goal of data-led decision making, and this is a goal I’m hoping all of you will share with me and take away after today’s presentation. So think about where you are today as I talk, and as I present various forms of unstructured text data from the customer service and eCommerce domain. Think about how your own user interaction data might help you uncover clues to solving problems, and get you where you wanna be on this curve in the future. 

The first topic that can help you start centering your analysis around your users is that we know, both as business professionals and as consumers ourselves, users wanna feel understood. To accomplish this, we need to stop thinking about centering our search experience around individual queries or keywords. Instead, we really wanna position ourselves to think about centering our search experience around individual users, each with differing affinities and different goals. We wanna make each user feel like a unique individual, not just another query coming through the system or part of some statistic that you’re tracking.

So ask yourself, how can I learn from the users that are already interacting with my platform every single day? One of the most basic things you can do to start understanding your users is ask the simple question of what. What are my users asking? What are they searching about the most? And you may be asking this in different perspectives. For some, it might mean what category are my users asking about most? For others, it may be what kind of problems are my users reporting the most? So always remember to frame your analysis within your own domain. So here, we’re gonna be looking at this public eCommerce dataset. We know that their customers are looking for all different kinds of products, but is there one specific brand that is dominating the rest? 

Knowing this is gonna help offer up all different types of opportunities to prioritize your business actions. For example, your marketing, your training, and your education content could all be optimized to align with the users’ interests. Here’s an example showing the search distribution for the top 25 brands in this dataset. Searches in this case are queries, but they could come from other sources that could be support cases, emails, phone calls. You can analyze any and all sources of data. 

Most of the users here are searching for products without a specified brand. And this may indicate that you could have a data quality problem, and since this is the largest category of searches, I would likely dig into this segment individually and try to further define the data quality problem, measure the impact to our search relevancy. Then behind our unbranded category, we’ve got Hampton Bay followed by GE and Everbilt. Since these are our top categories, we should ensure these areas are running optimally, as they’re the most frequently encountered, and they’re gonna account for a lot of our search opportunities. 
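A brand distribution like the one described above could be computed with a few lines of Python. This is only a minimal sketch: the `brand` field name, the `(unbranded)` label, and the sample records are assumptions made for illustration, not the schema or data of the dataset shown in the talk.

```python
from collections import Counter

def brand_distribution(searches, top_n=25, unbranded_label="(unbranded)"):
    """Count searches per brand, bucketing missing or empty brands together.

    Returns a list of (brand, count, share_of_all_searches) tuples,
    most frequent first.
    """
    counts = Counter(
        (s.get("brand") or "").strip() or unbranded_label for s in searches
    )
    total = sum(counts.values())
    return [(brand, n, n / total) for brand, n in counts.most_common(top_n)]

# Hypothetical sample records, not taken from the dataset in the talk.
sample_searches = [
    {"query": "ceiling fan", "brand": "Hampton Bay"},
    {"query": "water heater", "brand": None},
    {"query": "door hinge", "brand": "Everbilt"},
    {"query": "gas water heater", "brand": ""},
    {"query": "patio set", "brand": "Hampton Bay"},
    {"query": "tankless water heater"},
]

for brand, count, share in brand_distribution(sample_searches):
    print(f"{brand}: {count} searches ({share:.0%})")
```

Keeping the unbranded bucket visible, rather than dropping those rows, is what makes a possible data quality problem show up in the first place.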

Then you might find yourself asking more specifically, what type of language are my users using? And here, I’m talking about the diction, right? What are the literal words that my consumers use? As a first step, you might drill down into the top 20 words in your search queries, but predictably, you probably don’t learn much from this. You’re probably gonna have a lot of really common English words that don’t carry a lot of meaning, or stop words, and if you really wanna drill down into the text, you’ll wanna remove these stop words. So what you see here on this screen is the top 20 trigrams after that stop word removal. And here, we’re looking at trigrams, or sequences of three words, but if your consumers typically ask longer questions, you could also look at longer strings of text.
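The trigram count described here can be sketched in plain Python. The stop-word list below is a tiny illustrative one and the sample queries are made up; a real analysis would likely use a fuller stop-word list and the actual query logs.

```python
import re
from collections import Counter

# A small illustrative stop-word list (an assumption, not a standard list).
STOP_WORDS = {"a", "an", "the", "for", "of", "in", "to", "and", "with", "i", "my", "is"}

def tokenize(query):
    """Lowercase the query, split into word tokens, and drop stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", query.lower()) if t not in STOP_WORDS]

def top_ngrams(queries, n=3, k=20):
    """Return the k most common n-grams (as tuples of tokens) across all queries."""
    counts = Counter()
    for q in queries:
        tokens = tokenize(q)
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts.most_common(k)

# Hypothetical sample queries for illustration only.
queries = [
    "the gas water heater",
    "gas water heater for a garage",
    "tankless water heater",
    "hot water heater",
    "a hot water heater",
]

for ngram, count in top_ngrams(queries):
    print(" ".join(ngram), count)
```

Changing `n` gives bigrams or longer sequences, which is useful if your users tend to write longer questions.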

So when we look at these top 20 trigrams, we can start to see some themes emerging. So we see a lot of mentions of water heaters in various places, but we can also see that when a user searches for a water heater, they’re typically going to specify even further the exact type of water heater they’re looking for. We see hot water heater, gas water heater, tankless, electric, so we can start to see that our users typically get very specific with modifiers when they’re looking for a predetermined product, and that begs the question, does the language of our consumers change based on what product category they’re searching? 

So if you do have predefined categories or some sort of hierarchy, it’s a great idea to break down into it to do some further analysis. So here we see two insurance categories, annuities and home insurance. Just by looking at these top bigrams, we can see how the language really changes depending on the category of insurance you’re discussing. You can see that when consumers are curious about annuities, they typically ask questions around investments, deferrals, equity, and concepts like fixed versus variable. And this is very different from what we see in the home insurance category. These users tend to wonder about things like damage, water coverage, costs, and concepts like owners versus renters. Knowing these things and finding these insights can help us understand how our users’ intent changes depending on where they land. This can be very useful for segmenting your content, for providing recommendations, or even training some custom machine learning algorithms to help us recognize what category that user’s likely speaking about. 

Okay, so we talked about consumers wanting to be known. We wanna understand the language they use. Importantly, users also want seamless answers. So regardless of which channel or platform I choose to interact with, I expect that search experience to be congruent and contiguous. If your users typically try to self-service first, maybe they query a knowledge base, or maybe your users are coming from social media, we should know where they came from so that we have all the information possible to be able to address their ask quickly. 

So first thing to understand this problem is looking at where your users are coming from. Right, are they asking for information on mobile, in-app, desktop, right, if our customers are mostly asking for personalized help on desktop for example, we might wanna ensure chat bots are optimized for desktop. We also importantly want to ensure that the experience is the same across all of these entry points. Right, a poor in-app experience that doesn’t align to a great desktop experience could cause us to lose a mobile native customer, and maybe cause us to lose business in a growing area. Similarly, if a lower number of users on mobile is seen, it might inform us, maybe the experience there is subpar. Right, so it’s important to look at and analyze these interactions, and question what this data means. And remember that we wanna test our assumptions here. 

You might assume that most of our consumers are looking for products using search, or maybe you’re assuming that they go to support via phone calls. However, our data may actually show they prefer to get guided help through a browse experience or through social media. So investing in those solutions can contribute positively to KPIs if that were the case. As an example, social media is a channel that’s growing at an exponential rate, and it may not be a place where most of your customers try to resolve queries right now, but that will likely change in short order. So using this user interaction data to help you identify when it’s time to make that right decision is a great way to guide your investments.

Finally, seamless conversation’s gonna require you to recognize what your consumers do both before and after submitting a ticket or getting a zero search result. Did they browse first? Did they try to revise their query? Does the data tell you that you aren’t doing a good job of surfacing relevant information for a consumer looking for a product that you really do carry, or knowledge that you really do have indexed? Could these zero result queries or support cases potentially have been deflected, had our search engine really understood what the user was asking? Remember that by connecting the interactions our users have with various touchpoints, we can get an overview of their journey. And here we wanna look for points of interception before things go wrong. 

As an example, if you saw some browse action with no clicks, they likely didn’t find exactly what they were looking for. If we saw a lot of clicks but they still moved on to pursue other options, they likely found relevant information, but may be looking for something more specific. Similarly, if someone were to requery or revise their query after an initial unsuccessful query, then we get more information on exactly what that user’s looking for. Did that requery contain a corrected spelling, a synonym, maybe an additional term? If that requery led to a successful interaction, how can we think about tying the relevant search results to that initial zero result query so we can prevent a similar issue in the future?
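One rough way to ask whether a requery contained a corrected spelling, an additional term, or a different word is to compare the two queries token by token. The sketch below uses Python's standard `difflib`; the category names and the 0.8 similarity threshold are assumptions chosen for illustration, and a substituted term still needs a human or a synonym list to confirm that it is really a synonym.

```python
from difflib import SequenceMatcher

def classify_requery(first, second, spelling_threshold=0.8):
    """Label how a follow-up query differs from the query that preceded it."""
    a, b = first.lower().split(), second.lower().split()
    if a == b:
        return "repeat"
    if set(a) < set(b):
        return "added_term"
    if set(b) < set(a):
        return "removed_term"
    if len(a) == len(b):
        # Same number of tokens: inspect only the positions that changed.
        changed = [(x, y) for x, y in zip(a, b) if x != y]
        if all(SequenceMatcher(None, x, y).ratio() >= spelling_threshold
               for x, y in changed):
            return "spelling_correction"
        return "term_substitution"  # possibly a synonym
    return "rewrite"

print(classify_requery("watr heater", "water heater"))
print(classify_requery("water heater", "gas water heater"))
print(classify_requery("full coverage", "comprehensive coverage"))
```

Counting these labels over all pairs of a zero-result query followed by a successful requery gives a first hint about which fix, such as spell-check or synonyms, would pay off most.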

And then the third topic, and I love this one, is that more than anything else, consumers wanna feel like empowered problem solvers. And this is great because it means empowering your users to solve their own problems and make purchases without involving support is gonna make them happier. And as we know, happy users are loyal users. So they want all the knowledge available at their fingertips, as well as the ability to quickly sort through this information and find relevant results. As we know, striking that balance between recall and precision is not that easy. So here are some questions that you can ask to help determine if your consumers are set up to be empowered problem solvers. 

First is, do our users have access to the information they need? Often times, the most relevant data’s gonna live in various silos spread across multiple data sources. Can your users access all of this differing information from the same interface? Right, we want them to be able to access everything all from the same place. When performing a complex search, the best answer often will come from integrating various information across these silos. Make sure that you have the ability to connect to different data sources, things like Salesforce, websites, and maybe even just a simple file upload, so that we have all of that information readily available to our user within the same spot. And that information should also be prioritized in a personalized manner to the current user, that way the user feels known, and it’s gonna make their search experience smoother and faster. All of that signal data that we collected from search queries, browse activity, and help centers is gonna play a key part in personalizing this vast amount of data. And if they do have access to all of that data, are they able to actually find the relevant information?

Upon analyzing your user queries, do you find yourself saying, I know we have the relevant results for this, y’know, we carry this product, we have information on this problem, but they weren’t returned to a user, they’re not being surfaced. If you see that pattern, having a sophisticated search engine that can power things like similar query suggestions, recommended products or articles, and even semantically decipher the meaning of your queries instead of looking for keyword matches, these things are gonna be incredibly important for empowering your consumers to find those results efficiently. And of course that’ll have implications for many of those important KPIs like your click-through rate, your conversion rate, your bounce rate, and even your call center costs. So it will be worth the investment to empower your users.

Now, only once we understand our users as individuals, we know what their pain points are, what their affinities are, and the story that it’s all telling us in relation to our KPIs, that’s when we can move forward with solutioning or acting on these insights. So I’ll start by showing some effective but more statistical-based solutions. These are gonna address very specific problems that you may see in your data. However, as our goal is to optimize our search experience using that foresight mindset, it’s important to start thinking about how we might transition from addressing the very specific individual pain points after they’re reported to utilizing more mature and advanced deep learning techniques to address both our old and our new problems as they surface, when they surface, right? Get rid of the lag time and stop fixing things after they’ve happened.

So let’s imagine you’re analyzing the content of your tickets or your searches, and you notice some common misspellings that are specific to your domain. In this example, we see water misspelled by a single letter, you know, a common mistake if you’re typing too fast, and one that you would expect the search engine to reconcile. So being able to recognize this occurrence shows us we’re consistently missing opportunities to return information on water heaters, even though we have it. One solution could be to consider a spell-check feature that helps you align to your users’ vocabulary. Or maybe after reviewing your search data, you find some queries or support issues that perform really well in terms of converting or resulting, while other, more specific questions have a harder time finding the right content. If you see this type of distribution, doing some head-tail analysis may be a good option.
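To make the spell-check idea from the water heater example concrete, here is a minimal sketch that corrects each query token against a vocabulary drawn from your own content, using `difflib.get_close_matches` from the Python standard library. The vocabulary and the 0.8 cutoff are assumptions for illustration; this is not how any particular search product implements spell-check.

```python
from difflib import get_close_matches

def correct_query(query, vocabulary, cutoff=0.8):
    """Replace tokens missing from the vocabulary with their closest known term."""
    known = sorted(vocabulary)  # sorted list keeps the behavior deterministic
    corrected = []
    for token in query.lower().split():
        if token in vocabulary:
            corrected.append(token)
            continue
        matches = get_close_matches(token, known, n=1, cutoff=cutoff)
        # Leave the token untouched when nothing is similar enough.
        corrected.append(matches[0] if matches else token)
    return " ".join(corrected)

# Hypothetical domain vocabulary, e.g. terms extracted from product titles.
vocabulary = {"water", "heater", "gas", "tankless", "electric", "hot"}

print(correct_query("watr heater", vocabulary))
```

Building the vocabulary from your own index and signals, rather than a general dictionary, is what aligns the corrections to your users' domain language.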

Using that user interaction data, again, our signals, and understanding how both those long and short-tail queries perform, we can use statistics and NLP techniques to try to map our low-performing queries to our high-performing queries, and enable our users to find the right answer, even when they enter a typically low-performing search. Or perhaps after evaluating both the content of your user queries and the responses from your agents or search results, you may realize your consumers are using slightly different language than your knowledge base does, so consider these examples here.

On the left, we see questions in the critical illness insurance category, and in the questions, we see users using the term full coverage. On the right, in our knowledge base, we have comprehensive coverage taking up a little bit more stake than full coverage. Think about which of these terms is gonna hit more relevant results. Should we include them both if a user enters one or the other? Recognizing these patterns in your user data can help you convert queries to ones that will find the right answers. If you see things like full versus comprehensive, you know, these slight synonyms, or even things like users using abbreviations when your index spells things out, you may wanna consider a synonym-based approach to promote better recall. And then think about some of your repetitive queries or cases. You know, if agents are having to solve the same cases or related problems over and over again, we can speed that up. Show them the similar cases when a new one opens up. If that same or similar case has been filed before, show them how it was solved the last time.
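The synonym-based approach for cases like full versus comprehensive coverage can be sketched as a simple query expansion step. The synonym map below is a hand-written assumption for illustration; in practice it might be curated or mined from signals, and most search engines offer their own synonym configuration instead of code like this.

```python
import re

# Hypothetical synonym map, written in both directions on purpose.
SYNONYMS = {
    "full coverage": ["comprehensive coverage"],
    "comprehensive coverage": ["full coverage"],
}

def expand_query(query, synonyms):
    """Return the normalized query plus variants with known phrases swapped."""
    q = " ".join(query.lower().split())
    variants = [q]
    for phrase, alternatives in synonyms.items():
        # Word boundaries avoid matching inside longer words.
        pattern = re.compile(rf"\b{re.escape(phrase)}\b")
        if pattern.search(q):
            for alt in alternatives:
                variant = pattern.sub(lambda m: alt, q)
                if variant not in variants:
                    variants.append(variant)
    return variants

print(expand_query("Does full coverage include cancer", SYNONYMS))
```

Searching with all returned variants trades a little precision for better recall, which is the trade-off described above.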

And similarly, if you had a subset of queries that were highly popular amongst your users, it’s likely that those queries are associated with relevant results, and a large percentage of your user base may be interested in these. In that case, you may wanna consider presenting some popular or trending query suggestions to users to help enrich their shopping experience from the get-go. Finally, let’s talk briefly about how you might address all of the previous use cases, right, query correction, expansion, augmentation, and recommendations, using just a single solution. And this is semantic vector search. Semantic vector search can be summed up in this way. Instead of showing me what I said, show me what I want. 

There’s no need to adapt your query over and over again, trying to hit those keywords that might bring you to your desired result. Instead, search the way you naturally speak, and let semantic vector search figure it out. Well, this means that we’re no longer gonna need that exact match in the index on keywords. Instead, we’re gonna utilize deep learning algorithms to capture semantic meaning from both our user queries and our index text. And user interaction data, those signals, can be great training data for the algorithms after some proper cleansing. You can train your algorithm to align with your users’ language and your domain content, as well as address new low-performing searches in real time, again, moving us further up that maturity curve in the process.
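At its core, the retrieval step of vector search ranks indexed items by how close their vectors are to the query vector, instead of by keyword overlap. The sketch below shows only that ranking step with cosine similarity. The three-dimensional vectors are hand-made stand-ins for embeddings that a trained deep learning model would produce, and nothing here reflects how Lucidworks' offering is implemented.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def vector_search(query_vector, doc_vectors, top_k=3):
    """Rank documents by similarity to the query vector, best first."""
    scored = [
        (doc_id, cosine_similarity(query_vector, vec))
        for doc_id, vec in doc_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy vectors standing in for learned embeddings (assumed values).
doc_vectors = {
    "tankless water heater": [0.9, 0.1, 0.0],
    "ceiling fan": [0.0, 0.2, 0.9],
    "home insurance guide": [0.1, 0.9, 0.1],
}

# Pretend embedding of a query that shares no keywords with the best match,
# such as "something to warm up my shower".
query_vector = [0.8, 0.2, 0.1]

for doc_id, score in vector_search(query_vector, doc_vectors):
    print(f"{doc_id}: {score:.3f}")
```

The quality of the results depends almost entirely on the model that produces the vectors, which is where cleansed signal data comes in as training data.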

Remember to consider these solutions in the context of your analysis. What feature will truly serve your users and address the problems that you’ve reported and predicted for the future? If you’re interested in learning more about Lucidworks’ semantic vector search offering, please check out my coworker Sava’s talk. It’s gonna be directly after this one. 

So now that you’ve asked questions, you’ve hypothesized, uncovered new information about user pain points, and begun testing and acting on those clues, you’ve implemented data-aligned features, and you may think, well, what do I do with my signal data now? Is this the end of the story? And the answer’s no. Not done there, right? This is a feedback loop where users are gonna show you the why’s of the current state. You’re gonna take that information, iterate, and improve, and then your continued analysis is gonna let you know if it’s working. Your users are ultimately the ones interacting with your new solutions, so let them show you why it is or isn’t working. Now that you’ve built these practices, you can breathe a little easier knowing that your users are gonna continuously let you know if what you’re doing is working. So stay curious, stay observant, and stay in that loop with an open mind. Just build better from here.

Thank you guys, and again, don’t forget to check out the next talk by Sava on the power of semantic vector search.
