The Path from Consumer Analytics to Data Science

Presented at Activate Product Discovery 2021. This talk describes the journey commerce teams should take from conducting relatively simple consumer analytics, to extracting themes and business intelligence from those analytics, to constructing use cases and objectives to attack from a data science perspective.

Speaker:
Om Kanwar, Lucidworks Data Scientist


Transcript

Om Kanwar: Hi there. Welcome to Activate Product Discovery. Thank you for joining this session. My name is Om Kanwar, and I’m a data scientist on the digital commerce team here at Lucidworks. Today, I’m going to be talking about the relationship between analytics and data science. I started at Lucidworks, working as a data scientist on the PS team, focused on making our clients’ data science desires a reality.

One thing I noticed through the numerous client engagements I’ve participated in is that there seemed to be a lack of emphasis on, or understanding of, the relationship between data science and analytics. Today, I’m going to talk about why analytics is essential to the adoption of data science, and some strategies for success, obviously with a focus on product discovery initiatives.

Every year at Lucidworks, we go through basecamp, where we gather as a company to discuss and align on our company and product vision for the year ahead. One of the activities we do as part of this basecamp is to craft individual mission statements with the help of our peers. My mission statement stemming from the activity is very topical for our discussion today, so I thought I would present it. And it’s gonna double as our agenda.

To deliver our clients unique and differentiating data science features, which propel both their business forward and the expansion of the data science footprint within their organization.

I’d like to key in on two words here – differentiating and expansion – and discuss what it means to create differentiating data science features, and how you can expand data science across your organization.

I’ll posit that the most valuable data science features are not the ones that use the fanciest technology, like a hot new neural network framework or something akin. Instead, the most valuable data science features are the ones that you can readily adopt into your strategy and actually execute on. Your digital commerce experience is not R&D. It’s live right now, and it runs around the clock, 365 days a year.

Therefore, you should be here not for data science theories but for actual data science execution strategies. Having teams grounded with this in mind is how you can create limitless applications for data science methodologies, both in product discovery and across other avenues in your organization.

Let’s talk about what could be limiting your organization or department today from adopting data science methodology successfully. One thing that I have frequently seen while working with our clients is that oftentimes data science teams are siloed into a singular development or engineering focus. When you do this, you’re inherently limiting the possible impact that data science team can have, because they just lose exposure to other areas of the organization where they could have an impact.

Directly relating to that, let’s talk about collaboration between analytics and data science teams. Again, you don’t want to hide potential opportunities for your data science teams to be involved. One way you can limit that possibility is by not having proper avenues of collaboration. I’ll share an anecdote there.

I was recently on a client call with data scientists from the client side and also people from the merchandising team. Someone on the merchandising team said, you know, “We’re struggling with finding the appropriate weighting for different attributes on our search result page. We don’t know if the color attributes should be weighted more, or the activity attributes, or something like that.” And before I could even answer, a data scientist from the client side exclaimed, “Why do you need to manually decide the weighting of such attributes? I have a model that can learn what the optimal layout is. Why don’t we go with that?”

And I share that as an example of how impacts could be hidden within your organization without you even knowing. With a more collaborative structure between the merchandising and data science teams in that specific scenario, you could have arrived at that solution much sooner, and without any friction involved in that process.

Lastly, let’s talk about how understanding what artificial intelligence and machine learning can and cannot do is also paramount to a successful data science adoption. That means understanding that it’s not magic, and there’s no easy button for turning all of this on all at once. Adopting and executing these kinds of methodologies is a process that requires cross-functional collaboration and an understanding of the actual capabilities.

So I’m going to describe a happy path to data science adoption, and what that looks like. As I’m describing this, think about how this applies to you and your role. This is not something just for leaders to buy into. There’s a part to play for every spot on your org chart.

So we’re going to separate this into four phases here: beginning the adoption journey, expanding that footprint, getting to a peak of operational AI, and then never stopping. Let’s start with beginning the adoption. The adoption actually begins without any data science at all. It’s about turning raw data into informative reports that you can use to drive decision making. Think about that as really spinning straw into gold. As we look into the expansion of the AI footprint, this involves bringing a data science mindset to more and more different kinds of conversations. So this is where you open up avenues for increased collaboration between your data science personnel and other areas of your organization. You’ll know that you’re expanding because you’ll see that more and more parts of your product discovery journey are being influenced by data science methodologies or capabilities.

When we get to that operational AI peak, this is a goal that all organizations should shoot for. You should have cross-functional fluidity and alignment on these analytics-driven initiatives with data science in mind. And this is where it should be very apparent that your data science personnel are active across many different kinds of projects beyond just an engineering focus. They’re working collaboratively across the department.

And lastly, you should never stop. You know, that peak isn’t meant to be a finish line. There is no finish line in adopting data science. You should continue to iterate and explore and innovate on your data science desires and put those desires into motion. You should always have a laundry list of data science initiatives that you want to tackle next. That’s how you can ensure that you won’t stagnate or stop innovating: you can always have another initiative to start once one finishes.

All right, let’s get into the weeds here and talk about what each of these steps looks like. The bottom line here is always just to improve product discovery. That’s the goal that we’re all after.

The first step to do that is to take raw data that looks like this. It’s meaningless, and no one can make sense of such information, so you turn it from straw into gold. This doesn’t require any TensorFlow or anything fancy. Just by spinning that raw data into an informative report like this, you can uncover some glaringly obvious problems and solve for them. Often they have very simple solutions, and you just need to be exposed to the problem.

So you can see here something very, very simple. No TensorFlow is required to find out that we’ve got a lot of low performing queries with typeahead in mind. Typeahead might be something to look at in this scenario. Let’s now follow this product discovery journey at a more granular level.
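As a rough illustration of spinning raw data into a report, here is a minimal sketch in plain Python. The event format (a query string plus a flag for whether the shopper clicked through to a product page) and all sample values are assumptions made for the example, not data from the talk.

```python
from collections import defaultdict

# Hypothetical raw search log: (query, clicked_through_to_pdp)
raw_events = [
    ("gift card", False),
    ("gift card", False),
    ("hiking boots", True),
    ("hiking boots", False),
    ("ab-1234", False),
]

def query_report(events):
    """Aggregate raw events into per-query search counts and click-through rate."""
    searches = defaultdict(int)
    clicks = defaultdict(int)
    for query, clicked in events:
        searches[query] += 1
        clicks[query] += int(clicked)
    rows = [
        {"query": q, "searches": n, "ctr": clicks[q] / n}
        for q, n in searches.items()
    ]
    # Lowest click-through first; among ties, highest search volume first
    rows.sort(key=lambda r: (r["ctr"], -r["searches"]))
    return rows

report = query_report(raw_events)
for row in report:
    print(f'{row["query"]:<15} {row["searches"]:>3} {row["ctr"]:.0%}')
```

Sorting by click-through rate and then by volume puts the frequent, non-converting queries at the top, which is usually where the simple fixes are.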

Let’s consider a consumer who is coming to your website, submitting a search, and coming to the search result page. They’re not converting to the product detail page, or PDP, so they get stuck on the search result page, and then they abandon to some other activity. You want to lay that information out. What kinds of queries are low performing? Then dig a little deeper into why they are low performing. We can see we’ve got some part numbers at the top here that don’t seem to be converting. That might be a reason to look at your configuration to see how you’re addressing part number searches.

What about redirects? You’ve got common searches like gift card and return here which are not converting at all. You want to make sure you have the proper redirects in place. Those are instances where a search result page isn’t really appropriate, and they should be redirected perhaps somewhere else on your website.
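One simple way to picture a redirect layer is a lookup that runs before the search itself. This is only a sketch: the `REDIRECTS` table, the paths in it, and the `resolve_redirect` helper are hypothetical names for illustration, not a description of any particular product’s configuration.

```python
# Hypothetical redirect table: normalized query -> landing page path
REDIRECTS = {
    "gift card": "/gift-cards",
    "return": "/help/returns",
}

def resolve_redirect(query):
    """Return a landing page for known non-product queries, or None to run a normal search."""
    normalized = " ".join(query.lower().split())
    return REDIRECTS.get(normalized)

print(resolve_redirect("  Gift   Card "))
print(resolve_redirect("hiking boots"))
```

Queries that resolve to a page skip the search result page entirely, and everything else falls through to regular search.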

Think about other kinds of underlying themes of problems, like a specific category of queries that isn’t performing well. Or maybe there’s lots of vocabulary that is foreign to your catalog. The vernacular your consumers are searching with doesn’t match what’s in your catalog, and therefore you’re finding it hard to find appropriate and relevant results to return.

And you want to make sure that you segment these problems so that you’re handling these different themes accordingly. It’s not a one-size-fits-all solution in most cases.

Let’s continue to follow that consumer journey. We’re getting consumers to the PDP, but they’re not adding to cart after they get there. Let’s take a specific example of alpaca, right? So as a merchandiser, you might say, actually, for a query of alpaca, these are the items I would actually want to be returning. I think the consumers are getting to the right PDP page, but from there, why are they not adding to cart? It seems like they’ve found what I think they’re looking for. Are these items out of stock, or are there specific sizes or colors that are out of stock, which is preventing them from adding them to cart?

The point here is that you might look at this as a search problem, but when you dig into the details, you’ll find out that not all such problems are actually search problems. In fact, this is where you need to open up other avenues of collaboration. Maybe it’s your inventory management team, to tell them, “Hey, we need more stock of the Blue Interceptor, because that’s a common product that people are getting to, but they’re not adding it to cart because of color or size availability.” So it’s about arming yourself with compelling data evidence for why a solution is needed and then taking that evidence to those different teams.
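To make that kind of evidence concrete, here is a minimal sketch that cross-references PDP behavior with stock availability. The input structures, the thresholds, the second product name, and all the numbers are hypothetical and only serve the illustration.

```python
# Hypothetical inputs; names and numbers are illustrative only.
pdp_stats = {
    # product: (PDP views, add-to-cart events)
    "Blue Interceptor": (500, 4),
    "Trail Runner": (300, 45),
}
out_of_stock = {
    # product: variants (color / size) currently unavailable
    "Blue Interceptor": ["blue / M", "blue / L"],
}

def likely_inventory_issues(pdp_stats, out_of_stock, min_views=100, max_rate=0.02):
    """Flag products shoppers reach but rarely add to cart while variants are out of stock."""
    flagged = []
    for product, (views, adds) in pdp_stats.items():
        rate = adds / views if views else 0.0
        missing = out_of_stock.get(product, [])
        if views >= min_views and rate <= max_rate and missing:
            flagged.append(
                {"product": product, "add_to_cart_rate": rate, "missing": missing}
            )
    return flagged

issues = likely_inventory_issues(pdp_stats, out_of_stock)
for issue in issues:
    print(issue)
```

A list like this is the sort of thing you could bring to an inventory team, because it separates an availability problem from a relevance problem.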

Let’s now talk about understanding query intent, and specifically how to ascertain purchase intent just from looking at queries and the statistics associated with them. Here we’re looking at the colored column. You’re looking at the total revenue that you get from a query, divided by the total number of instances where that query was seen. So that number is revenue per total queries. Notice the themes of the low performing queries here – we’re getting a lot of generic, vague product searches – hats, women’s shorts, and here at the bottom, just pants and shorts. There’s no specific product that we’re identifying there, or even in the middle. We’re looking at general branded terms, not a specific product by any means. You can think about this as the window shopper, right? I grew up in Chicago, so I’m thinking about walking down Michigan Avenue and passing the Eddie Bauer shop and just thinking, “I wonder what kind of backpacks they have in there?” and walking in. It’s pretty unlikely that I’m going to buy something if all I’m wondering is what kind of backpacks they have in there – that’s a very vague intent. There’s no particular product I’m looking for.
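The metric in that column is simple enough to sketch directly. The function name and the sample figures below are assumptions for illustration; only the definition (total revenue from a query divided by the number of times the query was seen) comes from the talk.

```python
def revenue_per_query(total_revenue, times_seen):
    """Total revenue attributed to a query divided by how often the query was seen."""
    return total_revenue / times_seen if times_seen else 0.0

# Hypothetical numbers: query -> (total revenue, times the query was seen)
query_stats = {
    "hats": (120.0, 400),
    "pants": (90.0, 600),
    "renegade short": (900.0, 60),
}

# Lowest revenue per query first
ranked = sorted(
    ((q, revenue_per_query(rev, n)) for q, (rev, n) in query_stats.items()),
    key=lambda item: item[1],
)
for query, value in ranked:
    print(f"{query:<16} {value:.2f}")
```

With numbers like these, the vague queries sink to the low end and the specific product query rises to the top, which mirrors the window shopper pattern.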

Now let’s compare this with the high performing queries and the themes that you see in them. The thing that sticks out to me is the specificity in all these queries. Think about the item numbers that you see here. Those are for very specific products. They express a high degree of intent. Same thing with queries like renegade short and burr jacket. It’s very clear what products consumers are looking for there.

Coming back to the window shopper analogy, when I pass Niketown in Chicago, I wonder if they have the Jordan 3 in the White Cement colorway. That’s a very specific intent. When I go in that store looking for that shoe, if they have it, I’m going to buy it. So it’s about leveraging that information: when people are searching for something very specific, how can I cash in on that expressed intent?

So we consider such information when thinking about uplift modeling or propensity modeling for predicting the purchase behavior of consumers. You should use this kind of information with your data science teams to influence the building of such models. So you’re using your consumers’ indicators much more explicitly in your model building processes.

Let’s dig a little deeper into attribute-specific searches. These are searches that mention a particular kind of attribute that may or may not be present in your product catalog – things like colors, or activities, or occasions.

Let’s look at queries that mention camo. The camo queries are not performing very well. You’re going to wonder why that is. Is camouflage a foreign vernacular to my catalog? Is that why I’m having a tough time returning relevant results? Let’s think about the color khaki. We’re kind of struggling to return appropriate results for khaki on a variety of different kinds of searches – for hats, for pants, even for slightly vaguer searches like mountain khaki. We’re not exactly sure what they’re looking for. Think about why you’re struggling with such kinds of searches. Then also look at what kinds of searches are actually performing pretty well. If you look at this example of queries that mention hiking, these are actually performing quite well. If you compare the 0% conversion rates in the graphics we showed before with this one, we’re very quickly getting to some high-converting numbers when queries mention hiking versus when they mention camo or khaki.
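A comparison like this can be produced by grouping query statistics by the attribute term they mention. The sketch below assumes a hypothetical input of (query, searches, orders) tuples with made-up numbers, and it matches terms on whole words only.

```python
def conversion_by_term(query_stats, terms):
    """Pool searches and orders over every query that mentions each term."""
    result = {}
    for term in terms:
        searches = orders = 0
        for query, n_searches, n_orders in query_stats:
            if term in query.lower().split():
                searches += n_searches
                orders += n_orders
        # None means the term never appeared in any query
        result[term] = orders / searches if searches else None
    return result

# Hypothetical data: (query, searches, orders)
query_stats = [
    ("camo hat", 50, 0),
    ("camo pants", 30, 0),
    ("khaki pants", 40, 1),
    ("mountain khaki", 10, 0),
    ("hiking boots", 80, 8),
    ("winter hiking pants", 20, 4),
]

rates = conversion_by_term(query_stats, ["camo", "khaki", "hiking"])
for term, rate in rates.items():
    print(term, f"{rate:.0%}")
```

Pooling by term rather than by exact query makes the theme visible even when each individual query has low volume.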

So you should think about this in the context of query classification data science initiatives. Can we create a solution that will actually identify when something from my catalog is mentioned? And can I then boost that category of products from my catalog appropriately? So when someone searches winter hiking pants, they’re not just getting back pants, but that hiking attribute is weighted much more appropriately. Think about classifying queries in that context. Also think about an occasion or activity predictor, or whatever you want to call it, that kind of project predictor where you’re trying to ascertain and predict what kind of occasion this consumer is shopping for. So are they planning for a hiking trip? In the camo example, are they getting ready for a hunting trip? If you can guess and predict that end goal or occasion that the consumer is shopping for, you can tailor their whole experience on your website on a much more personal level and definitely increase engagement with the kinds of products you’re surfacing to them.
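To show the shape of the idea, here is a deliberately simple sketch that uses a dictionary lookup as a stand-in for a real query classification model. The attribute vocabulary, the function names, and the boost weight are all hypothetical; a production approach would typically learn these from data rather than hard-code them.

```python
# Hypothetical catalog attribute vocabulary
CATALOG_ATTRIBUTES = {
    "activity": {"hiking", "hunting", "running"},
    "color": {"khaki", "camo", "blue"},
}

def classify_query(query):
    """Return {attribute_name: value} for each catalog attribute value the query mentions."""
    found = {}
    for token in query.lower().split():
        for attribute, values in CATALOG_ATTRIBUTES.items():
            if token in values:
                found[attribute] = token
    return found

def boosts_for(query, weight=2.0):
    """Turn recognized attributes into (field, value, weight) boosts for the search request."""
    return [(attr, value, weight) for attr, value in classify_query(query).items()]

print(boosts_for("winter hiking pants"))
```

The output is a list of field boosts that a search layer could apply, so that products tagged with the hiking activity rank above generic pants.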

Let’s wrap up these technical applications and informative reports by talking about some common themes and challenges I have noticed when talking to our clients.

We’ve already touched on this, but low performing queries, right? There could be many reasons why they’re low performing, zero results, for example. I think everyone agrees that the most poorly performing queries are the ones that return zero results. Also think about how you’re actually solving for that, right? A lot of times we have customers tell us, “We’re handling this on a rule level, on a very manual level, and that’s a big operational burden for our merchandisers to be hunting down all these low performing queries.”

Our solution there is semantic vector search, and I invite you to listen to the talk by my colleague Eric Redman on semantic vector search for more details on this. Similarly, you can think about low performing products. The same kind of analysis we did on a query level, you can do on a product level to find what products are low performing and why. You can think about how they are being placed on the website. What kinds of recommender zones or types of recommenders are they being shown in, or not being shown in? To hear more about this topic, I can recommend Garrett Schwegler’s Activate session on AI powered recommenders.

And then think about precision search. This goes back to that kind of attribute search that I was just mentioning, when a consumer is expressing a particular intent. How can you actually leverage that intent and tailor their experiences appropriately? So things like size and dimension searches or attribute weighting, as I mentioned earlier, identifying the concepts behind what consumers are searching for. We have a solution around semantic query parsing for that. If you’re interested, we can definitely talk more about that.

Switching gears. Let’s zero in on the other half of this, and that’s how to expand the data science footprint effectively in your organization. The way I think about this is organizing a data science strike team and thinking about the different personas that are involved in that strike team.

Starting with the product discovery manager, you can think of these as your product manager for search, but generally these are the people who develop your product discovery strategy on a business and tactical level. They’re also responsible for developing the framework of how you’re going to measure the success of your product discovery experience.

These people work closely with a product discovery tech lead, commonly called a lead search architect or lead search engineer. This persona is fully versed in your technology stack, and they lead the technical implementation of the product discovery technology. Again, they are responsible for the technical implementation of zones such as site search, browse, recommendations, navigation, facets, things to that effect.

Lastly, let’s think about the complementary roles of this strike team: the data scientists, merchandisers, and data engineers. You might think it’s odd that a data scientist is considered a complementary role as part of this strike team. The reason why I have it here in this complementary role is because it’s not absolutely necessary for every project to have a dedicated resource. It’s definitely advantageous to you and your organization overall to have personnel like this at your disposal for strike teams such as this. You also have the opportunity to partner with organizations like us, and we can bring that data science flavor to such strike teams to help you with your initiatives.

All in all, you know, whether it’s an organization like us or some internal personnel that you have around data science, their main service in the strike team is to articulate a business problem and translate that into a technical solution.

I also want to spotlight merchandisers and data engineers here. These are the people who are in the weeds, and these are people who are definitely essential to the execution and success of your projects.

These are the people who are in the weeds identifying pain points and challenges. And oftentimes, they’ve got solutions ready that they’ve been thinking about to tackle some of the problems that you’ve highlighted through your analytics initiatives.

That’s where the data scientist then comes in and says, “Okay, I can see we have a problem with attribute search. Let me try to design a query classification strategy to solve for this.” And that’s where all the personas involved in the strike team can assist that data scientist in understanding the business problem and understanding the technical limitations and capabilities that they’re searching for, so that the solution can be designed and implemented accordingly.

Now, moving into the objectives of these data science strike teams. Some of these items are kind of stolen straight off of the data science adoption chart that I showed earlier. But they should have objectives like creating a long pipeline of AI-related initiatives to tackle. When we get to that innovation level of sophistication of data science in your company, this is where that laundry list of AI-related initiatives is important, so that you continue to have the ball rolling and continue to innovate and iterate.

This pipeline should stay grounded in the analytics, right? So you should always have information in the vein of the reports that I was showing earlier to drive your initiatives forward, and have data-backed evidence for why you’re beginning an initiative. I’ll highlight that here because it’s easy to fall into a trap and stray away from these analytics. You might see a wicked video of a demo showing off some crazy machine learning technology. Always ask yourself when you see those kinds of videos, “How am I actually going to apply this to my product discovery experience?” The bells and whistles are definitely great to have, but if there’s no practical application for them, then they’re not much help for you.

These teams again should always be iterating on initiatives, not implementing a solution and then saying it’s good. Asking how can we make that solution that we implemented better, continuously improving in that manner, and experimenting with all different kinds of strategies across the product discovery experience.

Finally, these groups should also be focused on democratizing that data science mindset across the organization, starting with all the avenues that connect to product discovery and then going beyond that to see in what other areas of the business a data science mindset would be beneficial.

I’ll come back to the slide here to finish with an overview of the landscape of the clients that I engage with today. Most companies are falling in the two middle zones here with probably a slight bias towards the beginning half of this adoption journey.

If you jumped the gun in your data science journey, and if you’re thinking right now, “Hey, I’m not sure what analytics are driving our AI initiatives,” definitely take a step back, go back to those analytics, and then climb the mountain from there, because I think you’ll see that you’ll be iterating much faster. You’ll see much faster time to value with your AI initiatives, and the value that you’re getting will be much more concrete because it is sourced from those analytics reports.

So, you know, think about your organizations and where you’re sitting now. If you’re at the beginning of this journey, think about what kind of low-hanging fruit you can start with to build some momentum, so you can get accustomed to this kind of adoption strategy and these kinds of methodologies.

If you’re somewhere in the middle, think about what’s preventing you from climbing higher and higher to the operational AI peak, or to the never-stop phase, where you’re just churning out AI initiatives and iterating on the currently implemented ones. Think about what’s preventing you from getting to that kind of state of being.

We, of course, always invite you to come talk to us. We may be officially labeled as software vendors, but we’d rather you consider us as partners, so we can come along with you in this product discovery journey and help you succeed in data science adoption as best we can.

I thank you for your attention. I hope you learned something today or you found something interesting. Please join me for a live Q and A discussion. You can find details of that in the chat.

Thanks.
