Presented at virtual Activate 2020. Recent advances in Deep Learning bring improvements to almost any domain, and search engines are no exception. Semantic search, visual search, "zero results" queries, recommendations, chatbots, etc. – this is just a shortlist of topics that can benefit from Deep Learning based algorithms. But more powerful methods are also more expensive, so they come with a variety of scalability challenges. In this talk, we will go through the details of how we implement a Deep Learning search engine at Lucidworks: what kind of techniques we use to train robust and efficient models, as well as how we tackle scalability difficulties to get the best query-time performance. We will also demo several use cases showing how we leverage semantic search capabilities to tackle challenges such as visual search and "zero results" queries in eCommerce.
Sava Kalbachou, AI Research Engineer, Lucidworks
Ian Pointer, Senior Data Engineer, Lucidworks
Engineers, Data Scientists, Product Owners, and Machine Learning enthusiasts who want to enrich their products with DL-powered semantic search capabilities. Prerequisite knowledge isn't necessary, although it might be useful for understanding some concepts in depth.
You will learn how DL-based semantic search solutions can drastically improve the search experience for you and your users while remaining scalable and applicable in production.
Ian: Hi, I’m Ian, Senior Data Engineer at Lucidworks.
Today Sava and I are going to talk to you about how we’re integrating a new dense vector search engine into Fusion 5.
Just to review some of our semantic search offerings that we have in Fusion 5 already. We’ve got multiple ways for you to enrich and surface your data using some of our new features, including our recommenders.
We have our user-item interaction-based recommenders. This includes our classical ALS system, and we also have a new BPR recommender that was recently introduced in Fusion 5.2. We have content-based recommenders and similar-query recommenders, and we are currently in active development on some session- and image-based recommenders, which will be coming in future releases.
Over on Smart Answers, this is our Deep Learning powered natural language search solution. It provides supervised training on question-answer pairs or signals data, but we also provide a cold-start solution if you don't have any of that information immediately to hand.
We offer custom, small, and efficient Deep Learning models. We also support larger, richer models like BERT, and Smart Answers can be used in a cross-lingual fashion, which Sava will be showing off later in the talk. We'll soon be adding short-answer extraction to our Smart Answers feature.
Some of the technology that we show off in today's demo will be used to implement image search. Let's take a look at how some of that is implemented at the moment, in our current version of Fusion, 5.2. As you can see on the left, we have our usual standard store of Solr; in the middle we have our recommenders and Smart Answers; and over on the right, we have our index and query pipelines.
The recommenders and Smart Answers today read all the information they use from Solr. In the case of the recommenders, they generate their recommendations and then write that information back into Solr, in different collections.
In the case of Smart Answers, they read the data in from Solr and train a Deep Learning model on that data. That trained model is then deployed into the Fusion cluster via Seldon Core and registers with the ML service. On the pipeline side of things, at index time for Smart Answers, we call out to the model with the document and get an encoding of that document from the model. Then we compress that vector representation and the clusters and add them to fields in the document, which then gets indexed into Solr.
At query time for Smart Answers, we have to go off to the model again to encode and clusterize the query. That call goes over gRPC to the ML service, which talks to the Seldon Core deployment.
Once we have that information, we then get potential candidates from Solr, which gives us a list of documents, each of which has a compressed candidate vector that was stored during the indexing phase.
We have to decompress those candidate vectors and compute the vector similarity between those candidates and our incoming query. Then finally we ensemble the vector-similarity scores with the Solr scores, rerank, and pass that down through to the rest of the pipeline.
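As a rough sketch of that query-time flow, here is a hypothetical Python version of the compress, decompress, and rerank steps. All function names, the base64-packed float32 storage format, and the score weights are illustrative assumptions, not Fusion's actual implementation.

```python
# Illustrative sketch of the Fusion 5.2-era query-time rerank described above.
# The storage format (base64-encoded little-endian float32) and the ensemble
# weights are assumptions for the sake of the example.
import base64
import math
import struct
from typing import List, Tuple

def compress_vector(vec: List[float]) -> str:
    """Pack a float vector into a base64 string suitable for a Solr string field."""
    return base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode("ascii")

def decompress_vector(blob: str) -> List[float]:
    """Invert compress_vector: base64 string back to a list of floats."""
    raw = base64.b64decode(blob)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rerank_candidates(query_vec: List[float],
                      candidates: List[Tuple[str, float, str]],
                      solr_weight: float = 0.3,
                      vector_weight: float = 0.7) -> List[Tuple[str, float]]:
    """candidates: (doc_id, solr_score, compressed_vector) tuples.
    Decompress each candidate vector, compute similarity to the query,
    ensemble with the Solr score, and sort by the blended score."""
    scored = []
    for doc_id, solr_score, blob in candidates:
        sim = cosine(query_vec, decompress_vector(blob))
        scored.append((doc_id, vector_weight * sim + solr_weight * solr_score))
    return sorted(scored, key=lambda t: t[1], reverse=True)
```

Doing all of this per-query in an unaccelerated service is exactly the cost Ian describes next.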
For recommenders, it's a bit easier. We just get other item IDs by item ID from Solr. For each of those items, we then go back to Solr and get the information for those items, and again pass that down the pipeline.
This approach works, but it does introduce some significant challenges, which we've had to try to overcome in our current versions of Fusion. Most of our problems come down to the fact that Solr itself does not have any native dense vector search support. That means that for the past couple of versions of Fusion, we have essentially rolled our own approach.
We have these compressed vectors, we decompress them and then we do all the similarity distance calculations, but we do it all within the Fusion query microservice. Everything is happening in Java. Everything is completely unaccelerated. As you can imagine, that means it’s a little slow.
It's a problem for us in that, if you want decent query-time execution, we have to limit the number of candidates that we retrieve from Solr, so the potential matching that you're getting is going to be somewhat limited. It's also difficult for us to improve things like our cross-lingual support, or to add new features like image search, with this approach, because we would have to rewrite an awful lot from scratch, and it would be pretty slow when we got there.
We've decided to take a different approach in upcoming versions of Fusion. What that means is that we are going to be adding a new part to our Fusion stack, which is Milvus. This is an open source, Apache-licensed product. It's part of the Linux Foundation's LF AI project, which is designed to promote and incubate prominent open source artificial intelligence and Deep Learning applications and solutions. Milvus itself is a highly scalable, GPU-enabled dense vector search system. Behind the scenes it uses performant vector search libraries such as Facebook's FAISS, NMSLIB, and Annoy. It comes out of the box with Java, Python, Go, C++, and REST APIs, so it was fairly easy to programmatically integrate with Fusion. It's also been developed with Kubernetes in mind.
Getting a very simple Milvus install integrated with Fusion is basically one line of Helm. A more resilient setup is a bit more than that, but it's fairly easy to integrate with our current Helm charts. It also comes out of the box with integrations for things like Prometheus and Grafana.
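As a sketch of what that one-line install might look like, the commands below assume the public Milvus Helm chart; the repository URL, chart name, and release name are assumptions, so check the Milvus Helm documentation for the current values.

```shell
# Assumed repo URL and chart/release names; verify against the Milvus Helm docs.
helm repo add milvus https://milvus-io.github.io/milvus-helm/
helm repo update
# The "one line" standalone install into the current Kubernetes context:
helm install milvus-release milvus/milvus
```

A production setup would layer on persistent volumes, resource limits, and the Prometheus/Grafana integrations mentioned above.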
It'll plug into most people's existing Kubernetes infrastructure. Here is a slide of the internals of Milvus.
This is a very, very high-level overview. As you can see on the left, we can either insert objects or query, and we get back a top-k result from the API. If you look over at the processing engine, you can see that, as I said on the previous slide, you can use FAISS or Annoy or other libraries for your processing; which one depends on what index type you set up within Milvus. Metadata for the vectors is stored in a MySQL database, while the vectors themselves are stored in the storage tier in massive index files for performance purposes.
It says it runs on various processors. I am a little skeptical of that, because currently the main webpage for Milvus says it only runs on x86, but the important thing for our purposes is that it runs on Intel hardware and with GPU acceleration, which is great for us. Out of the box it supports Euclidean distance, Jaccard distance, Hamming distance, and inner product for similarity calculations, and it can run at a scale of around a billion vectors, which is much better than, say, the 500 candidates that we're getting back from Solr in our current implementation.
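To make those metrics concrete, here are minimal pure-Python versions of the similarity calculations just listed. These are illustrative only; Milvus computes them natively over its indexes, and the function names are my own.

```python
# Pure-Python illustrations of the distance metrics Milvus supports:
# Euclidean and inner product for float vectors, Hamming and Jaccard
# for binary/set-valued data.
import math
from typing import List, Set

def euclidean(a: List[float], b: List[float]) -> float:
    """L2 distance between two equal-length float vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def inner_product(a: List[float], b: List[float]) -> float:
    """Dot product; on normalized vectors this ranks like cosine similarity."""
    return sum(x * y for x, y in zip(a, b))

def hamming(a: List[int], b: List[int]) -> int:
    """Number of differing positions between two equal-length bit vectors."""
    return sum(x != y for x, y in zip(a, b))

def jaccard_distance(a: Set[int], b: Set[int]) -> float:
    """1 - |intersection| / |union| over sets of feature indices."""
    union = a | b
    return (1.0 - len(a & b) / len(union)) if union else 0.0
```

Which metric you pick depends on how the vectors were produced; inner product is the usual choice for learned embeddings.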
The other thing which is rather useful, and which we will be making use of in future versions of Fusion, is that it can be used in a near-real-time capacity. It doesn't have hard guarantees, but it aims to ensure that within one second of inserting a new vector into Milvus, you should be able to see it surface in queries.
That’s quite useful.
How does all this affect Fusion in the future? Let's just go to the next slide and we will see the future Fusion 5 stack. On the left, we have Solr again, and Milvus has been added to our stack, roughly in the middle. Our recommenders have changed slightly so that they can now run in an online fashion. As before, they can read from Solr and perform operations on Solr collections.
We are also going to be adding cloud storage read and write support in Fusion 5.3, so you will be able to take Solr out of most of the loop altogether. For the recommenders, instead of writing back to a specific Solr collection or into cloud storage, you can see that they now use gRPC to write their output into Milvus.
Over on Smart Answers, the story hasn't changed very much. Data either comes from the Solr collections or from cloud storage. The model is trained on that data and is then deployed via Seldon Core, registering with the ML service.
Over on the index and query pipelines, however, things have changed a little. For Smart Answers indexing, all we do now when indexing a new document is get the model's encoding of that document and write it into Milvus. Then at query time, we encode the incoming query and query Milvus itself for relevant document IDs. We use those document IDs to look up documents in Solr for pushing down to the rest of the pipeline.
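A minimal, self-contained sketch of that new query path (encode, Milvus top-k, then a Solr lookup by ID) might look like the following. A brute-force inner-product search stands in for the Milvus call, a plain dict stands in for Solr, and all names and data are illustrative.

```python
# Sketch of the Fusion 5.3-era query path: vector search first, then a
# metadata lookup by document ID. Brute force stands in for Milvus here;
# in production Milvus would answer top_k from its ANN indexes.
from typing import Dict, List, Tuple

def top_k(query_vec: List[float],
          index: Dict[str, List[float]],
          k: int = 3) -> List[Tuple[str, float]]:
    """Brute-force inner-product search standing in for a Milvus top-k query."""
    scores = [(doc_id, sum(q * v for q, v in zip(query_vec, vec)))
              for doc_id, vec in index.items()]
    return sorted(scores, key=lambda t: t[1], reverse=True)[:k]

def lookup_metadata(doc_ids: List[str],
                    solr_store: Dict[str, dict]) -> List[dict]:
    """Stand-in for fetching document fields from Solr by ID."""
    return [solr_store[d] for d in doc_ids if d in solr_store]

# Index time: each document's encoding goes into the vector index.
vector_index = {
    "doc1": [0.9, 0.1],
    "doc2": [0.1, 0.9],
    "doc3": [0.7, 0.7],
}
solr_store = {
    "doc1": {"title": "60-inch TV"},
    "doc2": {"title": "TV stand"},
    "doc3": {"title": "55-inch TV"},
}

# Query time: encode the query, search the vector index, then look up fields.
hits = top_k([1.0, 0.0], vector_index, k=2)
docs = lookup_metadata([doc_id for doc_id, _ in hits], solr_store)
```

Note that Solr is only consulted after the vector search, which is the reversal of the candidate-first flow in Fusion 5.2.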
As you can see, at this point we're no longer obtaining candidates from Solr; we're actually querying against the entire embedding space, which hopefully improves our quality there. For the recommenders, it's a similar story to how we were working before, but instead of getting the similar item IDs from Solr, we now get them from Milvus, and for each item ID we then look up the other metadata in Solr, which again gets passed down the pipeline.
Then finally we have our ensemble step, where we can blend the results from Solr and Milvus together. That, again, will be configurable at this stage, and we then rerank the results by the final score.
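One plausible way to implement that blending step is sketched below; the min-max normalization and single blend weight are illustrative assumptions, not necessarily Fusion's exact formula.

```python
# Hypothetical ensemble stage: normalize each score source to [0, 1],
# blend with a configurable weight, and rerank by the blended score.
from typing import Dict, List, Tuple

def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Min-max normalize a score map; a constant map normalizes to all 1.0."""
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {k: ((v - lo) / span if span else 1.0) for k, v in scores.items()}

def ensemble(solr_scores: Dict[str, float],
             milvus_scores: Dict[str, float],
             milvus_weight: float = 0.7) -> List[Tuple[str, float]]:
    """Blend two score sources over the union of their document IDs.
    Documents missing from one source contribute 0 from that source."""
    ns, nm = min_max(solr_scores), min_max(milvus_scores)
    ids = set(ns) | set(nm)
    blended = {i: milvus_weight * nm.get(i, 0.0)
                  + (1 - milvus_weight) * ns.get(i, 0.0) for i in ids}
    return sorted(blended.items(), key=lambda t: t[1], reverse=True)
```

Normalizing before blending matters because raw Solr BM25 scores and vector similarities live on very different scales.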
That's how things are going to look in the future. I will hand over now to Sava, who's going to give you a quick demonstration of what that might look like in Fusion 5.3 and beyond.
Sava: Thank you, Ian. Hi everyone, my name is Sava.
Let me show you a few new cool examples that we can now support with smart answers and Milvus integration.
Let's start with eCommerce. Ecommerce is very different from question answering or document search: eCommerce queries are usually very short, there are tons of ways to describe the same things, and they are very noisy, with tons of misspellings and similar problems. Another big problem that eCommerce is struggling with is zero-search-result queries. Usually about 5-10% of all queries return zero results, and in some cases this number can actually rise up to 40%.
Traditionally merchandisers have to manually create tons of rules to handle misspelled words, handle synonyms, different phrasing and so on.
For example, here we are searching the Best Buy product collection. On the right side of the screen, you can see results from the semantic search pipeline; on the left side, results from a default Solr pipeline, which is basically just classical token-matching search. We are searching for the query "60TV". As you can see, Solr cannot return anything here because there is no token overlap.
The semantic search pipeline, however, can understand the meaning of the query and returns different products that are all about televisions of this size. But you might think, well, "60TV" is a very easy query to fix.
You just need to tokenize it and add a space, right?
Which is fair enough.
Let's try another query, and yes, indeed, this time Solr is capable of returning something, but it's not exactly what we are looking for: it returned some TV stands. But we don't have a TV yet; we're actually searching for one.
Let's try another example. If I search for a MacBook and just type it with a space, Solr returns a lot of different results that aren't exactly related to MacBook Pro laptops, whereas the semantic search pipeline understands the query intent and returns very relevant results.
But let me try to refine the query and just ask for Apple laptops.
In this case, Solr returns something more relevant, but it's still not what we are looking for: basically some accessories like laptop sleeves, because, as you can see, there is a strong token overlap for the words "Apple" and "laptop". But the semantic search pipeline still understands the meaning of what the user is looking for and suggests relevant results.
As you can see, these are actually the same results as for the MacBook query, which uses basically different tokens.
Now let's try a bit more complicated query. In this case, I'm searching for a Battlefield game, and as many gamers know, the abbreviation for it would be BF, or BF2 for example. Solr does not return anything because, obviously, there is no abbreviation in the title.
The only way to fix that would be to provide a custom synonym list or something like that. But the semantic search pipeline can understand the meaning of this query and puts Battlefield games on top. Moreover, it can actually recommend very similar products, like Call of Duty games, which is a very similar franchise.
Indeed, if we start looking at a visualization of our product vectors, we can see that a lot of small clusters are formed. If we start inspecting those clusters, like this one in the middle, we can see that it has products that are all about similar things, basically external hard drives, even from different brands like Western Digital, Seagate, and Toshiba.
Let's try another cluster. This one is more interesting because it actually has products from different departments, but the vectorization can still understand that these products are about similar things, basically how to learn to speak Spanish. The cool thing is that we can actually do the same for queries.
This is a visualization of query vectors. For example, this cluster, as you can see, has all possible queries that describe a PlayStation Move device, even with different spellings, different abbreviations, and some misspellings, like "mover" instead of "move".
Let's check another one. This small group, as you can see, describes all the laptops that support the Dr. Dre Beats audio system, including queries like "Dre labtop", which is misspelled, and "best audio labtops", also misspelled. These queries don't have any token overlap except the misspelled word, yet the model can still understand that they are all about the same thing.
But eCommerce isn't the only use case that can benefit from this Milvus integration. Classical question answering or FAQ search can get a lot from it as well.
For example, here we have the United Airlines FAQ, and we are searching for a very simple query, like "hand luggage restrictions". Yet Solr returns only three documents, which aren't related to this query at all, just because there is no token overlap. But because we are no longer relying on Solr to get candidates for selection, we can search directly in the vector space and find very relevant results, even with different wording.
As you can see, the FAQ uses "carry-on" instead of "hand", "baggage" instead of "luggage", and "limits" instead of "restrictions". And things get even more interesting if we search in a different language, for example Japanese. We just searched for a very similar thing here. Obviously, plain Solr would not return anything because there is no Japanese content in our collection, which is true. But the models that we are using here were jointly trained on different languages.
They can handle interactions between languages out of the box, with no translation needed, and still provide very useful and relevant results.
I think that’s all from us. Thank you everyone for attending our talk today.