Solr Cloud document routing was released in Solr 4.1. This feature expanded upon the simple hash based routing that was available in Solr 4.0 by introducing a new document router called the compositeId router.
The compositeId router allows a shard key to be added to the unique document id. This shard key determines which shard the document will be indexed in. At query time, a shard key can be supplied that will limit the query to a specific shard.
There are two primary use cases for document routing.
Document routing can be used to achieve a more efficient multi-tenant environment. This can be done by making the tenant id the shard key, which would group all documents from the same tenant on the same shard.
At query time, the tenant id can then be supplied as the shard key to limit the query to the shard where the tenant’s data resides. This approach means that not every shard in the collection needs to be searched to search a single tenant’s data.
Certain Solr features such as grouping’s ngroups feature and joins require documents to be co-located in the same core or vm. For example to take advantage of the ngroups feature in grouping, documents need to be co-located by the grouping key. Document routing will do this automatically if the grouping key is used as the shard key.
Setting Up The CompositeId Router
The Solr Cloud compositeId router is used by default when a collection is created with the “numShards” parameter. If the numShards parameter is not supplied at collection creation time then the “implicit” document router is assigned to the collection. The implicit document router indexes documents to the shard that they are sent to. There are many interesting use cases for the implicit router but this blog talks mainly about the compositId router.
You can see which router is assigned to each collection in the clusterstate.json by checking the “router” attribute.
Shards Are Assigned to the 32 Bit Hash Space
When a Solr Cloud collection is created with the numShards parameter, Solr assigns each shard a range of the 32 bit hash space.
In clusterstate.json each shard has a range attribute which shows the range the shard has been assigned. The ranges are shown in hex. Below is the decimal translation of the hex ranges for a 4 shard collection:
Shard1 : 2147483648-3221225471
Shard2 : 3221225472-4294967295
Shard3 : 0-1073741823
Shard4 : 1073741824-2147483647
Simple Document Routing With a Document Id Only
Each document indexed in Solr Cloud must have a unique document id assigned to it. For example:
When presented with a document id the compositeId router calculates a 32 bit murmurhash3 for the id. The compositeId router then routes the document to the shard whose range includes the hash value for the document id.
Composite Id Document Routing
A shard key can be pre-pended to the unique document id to create a composite id. The composite id is formed with the following syntax:
The ! is the separator.
In a multi-tenant setup this might look like this:
When a shard key is provided, the compositeId router calculates the 32 bit hash for both the shard key and the document id.
Then it creates a composite 32 bit hash by taking 16 bits from the shard key’s hash and 16 bits from the document id’s hash.
The upper bits of the hash are taken from the shard key and the lower bits from the document id.
The compositeId router then routes the document to the shard whose range includes the hash value for the composite id.
The upper bits, which come from the shard key, will dictate which shard the document is placed in.
The lower bits of the hash, which come from the unique doc id, place the document within a 65536 slice of the shard.
This scenario allows tenants to be split into multiple shards if needed in the future through Shard Splitting which was introduced in Solr 4.3.
Spreading Tenants Across More Then One Shard
When a tenant is too large to fit on a single shard it can be spread across multiple shards be specifying the number of bits to use from the shard key.
The syntax for this is:
The /num is the number of bits from the shard key to use in the composite hash.
This will take 4 bits from the shard key and 28 bits from the unique doc id, spreading the tenant over 1/16th of the shards in the collection.
3 bits would spread the tenant over 1/8th of the collection.
2 bits would spread the tenant over 1/4th of the collection.
1 bit would spread the tenant over 1/2 the collection.
0 bits would spread the tenant across the entire collection.
At Query Time
At query time the parameter shard.keys can be used to limit the query to a specific shard or range of shards. You specify which shard to query using this syntax:
or multiple keys: