Milvus: Billion-scale Similarity Search in Milliseconds

Milvus is the world’s most advanced open-source vector database and similarity search engine, used by over 1,000 “who’s-who” organizations around the world, including Lucidworks! Developed as a cloud-native, scalable architecture that supports storage and search of billions of vectors, its core is built on top of state-of-the-art ANN algorithm libraries such as Faiss, HNSW, and Annoy. The session will showcase the power of Milvus and highlight the SDKs available for Python, Java, Go, and a RESTful API.

Intended Audience

Developers, IT Management, Software Developers

Attendee Takeaway

Attendees will walk away with a thorough understanding of how vector databases can accelerate similarity search by 10x and be utilized by organizations in multiple industries to query billion-scale datasets in milliseconds.

Attendees will learn about mainstream use-cases where Milvus is deployed in production, and learn how to take advantage of the software for their applications.

Speaker

Ryan Chan, Zilliz


[Ryan Chan]

Hi, good morning, everyone. Thanks for coming out to Activate. 

Today, we’ll be talking a bit about vector search and vector databases: a mid-level overview of how we implement our distributed vector database system, and how you can use this technology to power the next generation of your search at a truly massive scale.

First, some quick introductions. My name is Ryan Chan, I’m a data engineer on the user success team at Zilliz. My focus is on system deployments, especially in public cloud systems.

So Zilliz, who are we? Well, we’re an open source software company based out of Shanghai. Our mission is to reinvent data science, which we do by publishing and contributing to a number of open source projects. 

The one of relevance today is Milvus, our open-source vector database, which we’ll be going into a little later in the presentation. Before we get into Milvus, a quick introduction to vector search. Traditional search models work best with numbers and text, things that we can think of as structured data. Structured data being data with an inherent structure to it: easily parsable, without much variation in form.

We have applications that work quite well for those use cases. I’m sure many of us have worked with Lucene-based systems or something like Solr before. They do all right for their use cases, namely text and document searching. What if we want to move beyond that? Nowadays, a huge amount of data ingested by companies is what we call unstructured data, data like images, videos, biometric data, pretty much anything that’s not just text or numbers. 

Our traditional text searching strategies aren’t quite as useful here. Instead, what we can do is use machine learning models to generate what we call embedding vectors, vector representations of each piece of data, and conduct similarity searches using these.
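As a tiny, hedged illustration of the idea (this is just the concept of comparing embeddings, not how Milvus computes anything internally), here’s what a similarity check between two embedding vectors might look like with cosine similarity; the vectors below are random stand-ins for real model output.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors (1.0 means same direction)."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-ins for embeddings produced by some model, e.g. for two images.
embedding_a = np.random.rand(512)
embedding_b = np.random.rand(512)
print(cosine_similarity(embedding_a, embedding_b))
```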

So how do we actually conduct these vector searches? Well, there are a number of open-source vector search frameworks out there that we’ve tried, but ultimately we found that for nearly every single application, we need more than just stock algorithm libraries. After all, once you have your vectors you need to store and manipulate them, and if you want to work at a truly massive data scale, you need to build an entire system around scalability.

So, seeing all these very common requirements, we built Milvus, which is what we like to call a vector database: we combined vector searching with embedding storage and retrieval, all designed in a distributed fashion for scalability and robustness.

So here we have a diagram showing an example flow of a very generalized machine learning pipeline. For the most part, it’s very textbook stuff. First you build and train your machine learning model to generate your embedding vectors. Next, you want to build up the vectors to store in your vector database so you can query against them. Now there are a few different ways to go about doing this, but the most straightforward way is to simply run your model on your entire corpus of data that you want to make searchable. Then when you get your input data in production, for example a search query from a user, you can run it through the same model, then use your vector database to efficiently perform an approximate nearest neighbor search and get your results. Just do some post-processing, and then you’re done.
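To make that pipeline concrete, here’s a hedged sketch using the Python SDK (pymilvus) against a running Milvus instance; the collection name, vector dimension, index parameters, and the random vectors standing in for real model output are all illustrative.

```python
# Sketch only: assumes a running Milvus instance and some embedding model of your own
# that produces 128-dimensional float vectors (random data stands in for it here).
import numpy as np
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType

connections.connect(host="localhost", port="19530")

# 1. Define a collection to hold the embedding vectors.
schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=128),
])
collection = Collection("documents", schema)

# 2. Run the model over the corpus and insert the resulting vectors.
corpus_vectors = np.random.rand(10_000, 128).tolist()   # stand-in for embed(corpus)
collection.insert([corpus_vectors])

# 3. Build an index and load the collection so it can be searched.
collection.create_index("embedding", {
    "index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128},
})
collection.load()

# 4. At query time, embed the incoming query and run an ANN search.
query_vector = np.random.rand(1, 128).tolist()           # stand-in for embed(query)
results = collection.search(
    data=query_vector, anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 16}}, limit=5,
)
for hit in results[0]:
    print(hit.id, hit.distance)
```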

Now I’ve talked about the general cases of vector databases, but our own vector database project Milvus has some extra features in it that we like to think are a little above the baseline. Closer to the hardware side of things, we’re proud of our support for heterogeneous computing, including support for many different types of hardware and instruction sets. 

With regards to data management, Milvus manages massive data through partitioning and data sharding, allowing for impressive performance on even the largest datasets. We adopt and improve ANN libraries and algorithms such as Faiss, Annoy, and HNSW to deliver efficient searching.
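As a hedged sketch of what partitioning looks like from the client side (the partition names are invented, and the collection is the illustrative one from the pipeline sketch above):

```python
import numpy as np
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")
collection = Collection("documents")   # the illustrative collection from the earlier sketch

# Split data into named partitions, e.g. by time period (names made up here).
collection.create_partition("2021_q3")
collection.create_partition("2021_q4")

# Route new vectors to a specific partition at insert time...
new_vectors = np.random.rand(1_000, 128).tolist()
collection.insert([new_vectors], partition_name="2021_q4")

# ...and restrict a search to only the partitions that matter for this query.
collection.load()
results = collection.search(
    data=np.random.rand(1, 128).tolist(), anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 16}},
    limit=5, partition_names=["2021_q4"],
)
```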

Now, cloud and scalability in search. I’ve touched on this scalability concept a bit so far, but why is it so important? Well, there are a few obvious answers. When you build a distributed system, you get better elasticity and resilience. You can disaggregate and manage the different parts of your system at will. Properly deployed on cloud platforms, you can save money on operating costs when loads are low, while being able to scale right back up automatically when things pick up.

Additionally, and possibly even more importantly, there’s the ability to scale for new growth. With huge new influxes of data and the advent of so many new machine learning techniques, we predict that search loads will grow by an extreme amount as we move into this new generation of vector-based search. Building systems which can efficiently handle these massive loads is critical.

So what exactly does scalability look like in Milvus? For deployment to Kubernetes through Helm, there are configuration options available to scale to any cluster size. On the storage side of things, we’ve built the system from the ground up for S3 storage. For deployments not attached to S3 buckets, we bundle in a storage solution called MinIO, an object storage server that you can run yourself and that shares the same API boundary as S3. It can be treated as a drop-in replacement if you’d like to host your own storage instead of using S3.

Additionally, we have support for storage using Azure Blob Storage and Google Cloud Storage, if you’re interested in running things on a different public cloud. We disaggregate our storage and computation and allow for separate read, write, and background services. We do this through our distributed system design.
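One nice consequence of this design is that client code doesn’t care how the deployment is laid out: whether it’s a standalone instance with bundled MinIO or a Helm-deployed cluster backed by S3, connecting looks the same. A minimal sketch with pymilvus (the hostname is a placeholder):

```python
from pymilvus import connections, utility

# The same client code works against a local standalone instance or a
# Kubernetes cluster deployed through Helm; only the address changes.
connections.connect(alias="default", host="milvus.example.internal", port="19530")
print(utility.list_collections())
```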

So let’s talk a little bit about how we accomplish things in Milvus, specifically our distributed architecture. I won’t be getting too deep into technical details here, but as a mid-level overview, the main idea is that we disaggregate as many components of our system as possible into individually scalable services. For example, our data insertion, indexing, and search querying all happen in different scalable components.

You can see some of them in the diagram on the screen. We have a total of eight different types of services or nodes: proxy nodes, root coordinator nodes, query coordinator nodes, data coordinator nodes, index coordinator nodes, query worker nodes, data worker nodes, and finally index worker nodes.

So in this way, you can run Milvus on a single machine or on a whole cluster in the cloud using the same exact system. The system backbone, which holds together all the disparate parts of our system, is our log sequence. We use a distributed log on a pub-sub system to handle data movement and node communication in such a way as to leave individual nodes as stateless as possible. By having all of our data as a log, we can guarantee data durability, allow for fast failure recovery, and make the system easily extendable.

All right, let’s get into a few details about the different types of nodes. First off is the access layer, which does what you might be able to guess: users interact with the system through proxy nodes, which manage message ingestion and routing. Schema changes and other database metadata updates are sent to the coordinator nodes, while data manipulation messages are put onto the log and consumed by worker nodes. You can see this circled in the red box on the screen, if you’re interested in looking at that.

Next up are the coordinator services. First we have the root coordinator, which handles the aforementioned database metadata requests. The rest of the coordinator nodes all manage worker nodes that handle the workloads for their respective tasks. We have data coordinator nodes, which track and trigger background data operations such as flushing and data compaction after insertion; they also handle metadata for the actual inserted vector data. We have query coordinator nodes, which manage the load balancing for query nodes. And finally, we have index coordinator nodes, which manage and assign index-building tasks while also handling index metadata.

So for each of the data, query, and index coordinators, there is a respective collection of worker nodes. These worker nodes are all stateless and are the ones that actually perform the tasks as directed by the coordinators. Data nodes retrieve incremental log data from the log, pack and store that data, and process mutation requests. Index nodes build indices on inserted data, pretty straightforward. Query nodes load indices and data from object storage and run searches and queries on that.

And finally, we have the storage layer of Milvus. Milvus uses a number of open-source technologies to handle data movement and storage. Our log broker makes use of Pulsar, which gives the system streaming data persistence, event notification, and reliable asynchronous queries. Metadata storage and service discovery are handled by etcd, a resilient, distributed key-value store.

Finally, object storage is handled with MinIO or S3, where we store snapshot files of logs, index files, and intermediate query results. When you’re deploying through Helm, these components will automatically be set up and deployed. However, if you already have something like an existing Pulsar cluster that you’d like to use, Milvus is capable of connecting to that instead. So now let’s take a look at a few example use cases of vector searching technology.

The first example is a smart housing search and recommendation system. There are a couple of different companies that we’ve run into that are using Milvus for this application already, so we’ll talk about the example of Leju, which is the Chinese equivalent of Zillow. They’re using Milvus to speed up and expand their housing search software. This is accomplished by first extracting four collections of feature vectors from the input information using machine learning models. This input information can be things like floor plans, area, outline, house orientation, and any other house details that would be included as part of a listing.

Next, using these four collections, Milvus performs a similarity search per collection against the inputted house parameters to find the closest results. Finally, the results of the similarity searches are aggregated and the closest houses are returned as the recommended properties.
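A rough sketch of that search-then-aggregate pattern with pymilvus might look like the following; the collection names, the shared listing-ID key, and the scoring scheme are all invented for illustration rather than anything Leju actually runs.

```python
from collections import defaultdict

import numpy as np
from pymilvus import Collection, connections

connections.connect(host="localhost", port="19530")

# Hypothetical: one collection per feature type, each keyed by a shared listing ID.
feature_collections = ["floor_plan", "area_outline", "orientation", "details"]
query_vectors = {name: np.random.rand(1, 128).tolist() for name in feature_collections}

scores = defaultdict(float)
for name in feature_collections:
    coll = Collection(name)
    coll.load()
    hits = coll.search(
        data=query_vectors[name], anns_field="embedding",
        param={"metric_type": "L2", "params": {"nprobe": 16}}, limit=20,
    )[0]
    for hit in hits:
        scores[hit.id] += 1.0 / (1.0 + hit.distance)   # naive per-collection scoring

# Listings that rank well across all four feature collections come out on top.
recommended = sorted(scores, key=scores.get, reverse=True)[:10]
print(recommended)
```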

One of the more common uses for vector search technology is reverse image search, that is, searching for similar images given an input image. This is very much the textbook use of vector search; if you ever see an example of vector searching, this is probably the example you’re going to see. We’ve found that reverse image search, far from just being a test system, is used a lot by many different companies, which often use Milvus to do so.

So in this case, the user used two neural nets to pull out an embedding: the first was a tuned YOLO net for object detection, and the next was a ResNet for image embedding. With these image embeddings in Milvus, a similarity search can be performed on ingested images to find the closest matches.
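Here is a hedged sketch of the embedding half of that pipeline, skipping the YOLO object-detection step for brevity and using a stock ResNet-50 rather than whatever tuned model was actually deployed; the collection name is illustrative.

```python
import torch
from PIL import Image
from pymilvus import Collection, connections
from torchvision import models, transforms

# Stock ResNet-50 with the classifier head removed, leaving 2048-d feature vectors.
model = models.resnet50(pretrained=True)
model.fc = torch.nn.Identity()
model.eval()

preprocess = transforms.Compose([
    transforms.Resize(256), transforms.CenterCrop(224), transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def embed_image(path: str) -> list:
    img = preprocess(Image.open(path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return model(img).squeeze(0).tolist()

connections.connect(host="localhost", port="19530")
images = Collection("product_images")   # assumed to already hold 2048-d ResNet embeddings
images.load()
hits = images.search(
    data=[embed_image("query.jpg")], anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 16}}, limit=10,
)[0]
print([(hit.id, hit.distance) for hit in hits])
```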

So another interesting use case for Milvus is music recommendation. The first step for encoding a song is to separate the background music and the vocals, which is done using techniques such as audio inversion. The next step is to build embeddings from the background music. The background music is used because it carries more information about the song and avoids misclustering due to things like cover versions having different vocals. With these embeddings stored in Milvus, we can search with the current song the user is listening to to find similar songs. In order to avoid returning songs that are too similar, you can sort the results backwards or otherwise do some post-processing to give a similar, but not identical, song. Now, this can be used both as a recommendation system and as a search system; there’s quite a bit of overlap between the two, as you may have figured from these examples so far.

And finally, we have one last use case that some of you might have some experience with, which is retail product searching. However, instead of using a traditional direct text search approach, we’ll talk about how better results can be achieved using vector searching. 

So in this example, we have Tokopedia, which is one of the largest e-commerce services in Indonesia. They needed an updated search engine; they were using Elasticsearch to match raw keywords within product names and descriptions. That worked for finding documents that use the same words repeatedly, but it didn’t always mean the results were actually similar. So the new method is to generate word embedding vectors using the product names and descriptions. Word embeddings replace words with vectors that map to areas of similarity between words. These word embeddings can then be put into Milvus and a similarity search performed to find related products.
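As a small, hedged illustration of why embeddings help here (the model choice is arbitrary, not what Tokopedia actually uses): two product descriptions with almost no keywords in common can still land close together in embedding space, which keyword matching would miss.

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # arbitrary choice of embedding model

model = SentenceTransformer("all-MiniLM-L6-v2")
a, b = model.encode([
    "wireless bluetooth headphones with noise cancelling",
    "cordless over-ear headset that blocks outside sound",
])
similarity = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(similarity)  # relatively high, despite the two descriptions sharing no keywords
```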

All right, that just about wraps up our presentation. Be sure to join our Slack, star us on GitHub, and check out our other social media if you have any questions or are interested in finding out more. We’re open source, you can find us on GitHub, and we always welcome external contributors. If you’re interested in joining the discussion, interested in hearing more about how the technical design is going, or interested in submitting issues (we do like issues, as strange as that sounds), please go ahead: just make the issue, join us on Slack, anything.

Anyways, thank you all for tuning in and have a nice day and enjoy the rest of the presentations.
