Solr 4.3: Shard Splitting – A Quick Look

by Lucidworks
May 20, 2013

Thanks to Rafał Kuć of Solr.pl for this post.

Solr 4.3 brings a long-awaited feature: in a SolrCloud deployment we can now split the shards of collections that already exist and hold data. In this entry we will try that feature and see how it works.

So let’s do it.

A few words before we try

The number of shards a collection should have is one of those variables that needs to be known before the final deployment. Previously, once a collection was created we couldn’t change the number of shards; we could only add more replicas. That came with consequences: if we chose too few shards, the only way out was to create a new collection with the proper number of shards and re-index our data. With the release of Apache Solr 4.3 we can split the shards of an existing collection instead.

Small cluster

To test the new shard splitting functionality I ran a small, simple cluster: a single Solr instance with embedded ZooKeeper, using the example collection provided with Solr. I started it with the following command:

java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=collection1 -DzkRun -DnumShards=1 -DmaxShardsPerNode=2 -DreplicationFactor=1 -jar start.jar

After launching the mini cluster its view was as follows:

Test data

As usual we need some data for tests, and I decided to use the example data provided with Solr. To index it I ran the following command in the exampledocs directory:

java -jar post.jar *.xml

The number of indexed documents was checked with the following command:

curl 'http://localhost:8983/solr/collection1/select?q=*:*&rows=0'

The response returned by Solr was as follows:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">5</int>
  <lst name="params">
    <str name="q">*:*</str>
    <str name="rows">0</str>
  </lst>
</lst>
<result name="response" numFound="32" start="0">
</result>
</response>

As you can see we’ve got 32 documents in our collection.
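If we want to check the document count from a script instead of reading the XML by eye, a minimal sketch could look like the one below. It assumes the response has the same shape as the one shown above; the function name num_found and the shortened sample string are only illustrative and are not part of Solr.

```python
import xml.etree.ElementTree as ET


def num_found(response_xml: str) -> int:
    """Return the numFound attribute of the <result> element in a Solr XML response."""
    root = ET.fromstring(response_xml)
    result = root.find("result")
    if result is None:
        raise ValueError("no <result> element in the response")
    return int(result.get("numFound"))


# Shortened copy of the response above (hypothetical sample, not fetched from a server).
sample = """<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">5</int>
</lst>
<result name="response" numFound="32" start="0">
</result>
</response>"""

print(num_found(sample))
```

Running the same check before and after the split is a quick way to confirm that no documents were lost.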

Shard split

So now let’s try to divide the single shard that makes up our collection. To do that we will use the Collections API and its new SPLITSHARD action. In its simplest form it takes two parameters: collection, the name of the collection we want to divide, and shard, the name of the shard we want to split. In our case, the command that splits the shard looks like this:

curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=collection1&shard=shard1'
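When the collection and shard names come from a script, it may be easier to build the URL than to type it. The sketch below only assembles the same request as the curl command above; splitshard_url is a hypothetical helper name, and the base URL is the one used throughout this post.

```python
from urllib.parse import urlencode


def splitshard_url(base_url: str, collection: str, shard: str) -> str:
    """Build the Collections API URL for a SPLITSHARD request."""
    params = {"action": "SPLITSHARD", "collection": collection, "shard": shard}
    return f"{base_url}/admin/collections?{urlencode(params)}"


url = splitshard_url("http://localhost:8983/solr", "collection1", "shard1")
print(url)
```

The printed URL can be passed to curl or any HTTP client. The sketch does not send the request itself, so it runs without a Solr instance.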

If everything runs without any problems, after a few seconds we will get a response from Solr that indicates the end of the process. The response will look more or less like this:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">9220</int>
</lst>
<lst name="success">
  <lst>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">6963</int>
    </lst>
    <str name="core">collection1_shard1_1_replica1</str>
    <str name="saved">/home/solr/4.3/solr/solr.xml</str>
  </lst>
  <lst>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">6977</int>
    </lst>
    <str name="core">collection1_shard1_0_replica1</str>
    <str name="saved">/home/solr/4.3/solr/solr.xml</str>
  </lst>
  <lst>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">9005</int>
    </lst>
  </lst>
  <lst>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">9006</int>
    </lst>
  </lst>
  <lst>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">103</int>
    </lst>
  </lst>
  <lst>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
    </lst>
    <str name="core">collection1_shard1_1_replica1</str>
    <str name="status">EMPTY_BUFFER</str>
  </lst>
  <lst>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
    </lst>
    <str name="core">collection1_shard1_0_replica1</str>
    <str name="status">EMPTY_BUFFER</str>
  </lst>
</lst>
</response>
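The response is long, but only two things in it are of real interest: the top-level status and the names of the cores that were created. A minimal sketch that pulls out both is shown below, assuming the response keeps the structure shown above. The function name summarize_split and the abbreviated sample are illustrative only.

```python
import xml.etree.ElementTree as ET


def summarize_split(response_xml: str):
    """Return (top-level status, sorted unique core names) from a SPLITSHARD response."""
    root = ET.fromstring(response_xml)
    status = int(root.find("./lst[@name='responseHeader']/int[@name='status']").text)
    # Each core name appears more than once in the response, so collect them in a set.
    cores = sorted({s.text for s in root.iter("str") if s.get("name") == "core"})
    return status, cores


# Abbreviated copy of the response above (hypothetical sample).
sample = """<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">9220</int></lst>
<lst name="success">
  <lst>
    <lst name="responseHeader"><int name="status">0</int></lst>
    <str name="core">collection1_shard1_1_replica1</str>
  </lst>
  <lst>
    <lst name="responseHeader"><int name="status">0</int></lst>
    <str name="core">collection1_shard1_0_replica1</str>
  </lst>
  <lst>
    <str name="core">collection1_shard1_1_replica1</str>
    <str name="status">EMPTY_BUFFER</str>
  </lst>
</lst>
</response>"""

status, cores = summarize_split(sample)
print(status, cores)
```

A status of 0 together with two core names, one per sub-shard, is what we would expect after a successful split.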

Cluster after the split

After the split our cluster view will look like this:

As we can see, we now have two new shards. In theory each of the new shards should contain a portion of the documents from the original shard1: some of the documents should be placed in shard1_1 and some in shard1_0. Again using the Solr administration panel we can check each of the cores (which are the actual shards):
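To get a feeling for how the documents are divided, here is a small illustration. It assumes the default hash-based routing, where a shard owns a contiguous range of signed 32-bit hash values and a split cuts that range into two halves. The helper names and the printed ranges come from this arithmetic only; they were not read from the cluster state of the test cluster.

```python
def split_range(lo: int, hi: int):
    """Split the inclusive range [lo, hi] into two contiguous halves."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


def to_hex(value: int) -> str:
    """Render a signed 32-bit value as unsigned hexadecimal."""
    return format(value & 0xFFFFFFFF, "x")


# Assumption: a collection with a single shard covers the whole signed 32-bit range.
full_range = (-2**31, 2**31 - 1)
halves = split_range(*full_range)

for name, (lo, hi) in zip(("shard1_0", "shard1_1"), halves):
    print(name, f"{to_hex(lo)}-{to_hex(hi)}")
# prints:
# shard1_0 80000000-ffffffff
# shard1_1 0-7fffffff
```

Under that assumption a document lands in the sub-shard whose range contains the hash of its identifier, which is why the two new shards do not have to hold exactly the same number of documents.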

Shard1_1

Statistics for shard with the name of Shard1_1 are as follows:

Shard1_0

And the statistics for shard with the name of Shard1_0 are as follows:

As you can see we have 32 documents in total, which is the same as in the original collection.

Cleaning up

I’ve left the cleaning up for the end. First of all, in order to see the data in new shards we need to run the commit command against our collection. For example, this can be done by using the following command:

curl 'http://localhost:8983/solr/collection1/update' --data-binary '<commit/>' -H 'Content-type:application/xml'
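The same commit can be prepared from Python with the standard library. The sketch below builds the request but does not send it, so it runs without a Solr instance; build_commit_request is a hypothetical helper name.

```python
import urllib.request


def build_commit_request(base_url: str, collection: str) -> urllib.request.Request:
    """Prepare a POST request that sends <commit/> to the collection's update handler."""
    return urllib.request.Request(
        f"{base_url}/{collection}/update",
        data=b"<commit/>",  # supplying a body makes urllib use POST
        headers={"Content-Type": "application/xml"},
    )


req = build_commit_request("http://localhost:8983/solr", "collection1")
print(req.get_method(), req.full_url)

# To actually send it against a running instance:
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```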

In addition to that we can also remove the original shard, for example by using the Solr administration panel or the CoreAdmin API.
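For the CoreAdmin route, a hedged sketch of the request URL is shown below. It uses the CoreAdmin UNLOAD action and assumes that the core backing the original shard is named collection1, which is the default core of the Solr example. Please check the real core name in the administration panel before unloading anything. unload_core_url is a hypothetical helper name.

```python
from urllib.parse import urlencode


def unload_core_url(base_url: str, core: str) -> str:
    """Build a CoreAdmin URL that unloads the given core."""
    params = {"action": "UNLOAD", "core": core}
    return f"{base_url}/admin/cores?{urlencode(params)}"


# Assumption: "collection1" is the core that held the original shard1.
url = unload_core_url("http://localhost:8983/solr", "collection1")
print(url)
```

The sketch only builds the URL. Sending it is left to curl or another HTTP client, once we are sure the sub-shards contain all the data.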

Final test

Finally, I checked whether the documents are available in the shards created by the SPLITSHARD action. To do that I used the following command:

curl 'http://localhost:8983/solr/collection1/select?q=*:*&rows=100&fl=id,[shard]&indent=true'

And Solr responded in the following way:

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">7</int>
  <lst name="params">
    <str name="fl">id,[shard]</str>
    <str name="q">*:*</str>
    <str name="rows">100</str>
  </lst>
</lst>
<result name="response" numFound="32" start="0" maxScore="1.0">
  <doc>
    <str name="id">GB18030TEST</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_0_replica1/</str></doc>
  <doc>
    <str name="id">IW-02</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_0_replica1/</str></doc>
  <doc>
    <str name="id">MA147LL/A</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_0_replica1/</str></doc>
  <doc>
    <str name="id">adata</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_0_replica1/</str></doc>
  <doc>
    <str name="id">asus</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_0_replica1/</str></doc>
  <doc>
    <str name="id">belkin</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_0_replica1/</str></doc>
  <doc>
    <str name="id">maxtor</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_0_replica1/</str></doc>
  <doc>
    <str name="id">TWINX2048-3200PRO</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_0_replica1/</str></doc>
  <doc>
    <str name="id">VS1GB400C3</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_0_replica1/</str></doc>
  <doc>
    <str name="id">VDBDB1A16</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_0_replica1/</str></doc>
  <doc>
    <str name="id">USD</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_0_replica1/</str></doc>
  <doc>
    <str name="id">GBP</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_0_replica1/</str></doc>
  <doc>
    <str name="id">3007WFP</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_0_replica1/</str></doc>
  <doc>
    <str name="id">EN7800GTX/2DHTV/256M</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_0_replica1/</str></doc>
  <doc>
    <str name="id">SP2514N</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">6H500F0</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">F8V7067-APL-KIT</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">apple</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">ati</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">canon</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">corsair</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">dell</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">samsung</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">viewsonic</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">EUR</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">NOK</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">VA902B</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">0579B002</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">9885A004</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">SOLR1000</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">UTF8TEST</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">100-435805</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
</result>
</response>

As you can see the documents came from both shards, which is again what we expected. Please remember that this is only a sample usage and we will get back to the shard split topic for sure.
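Counting the documents per shard by hand gets tedious quickly, so here is a minimal sketch that does it from the XML. It assumes the query was run with fl=id,[shard] as above, so that every document carries a [shard] field. docs_per_shard is a hypothetical helper name and the sample is a three-document excerpt of the response.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def docs_per_shard(response_xml: str) -> Counter:
    """Count documents per value of the [shard] field in a Solr XML response."""
    root = ET.fromstring(response_xml)
    counts = Counter()
    for doc in root.iter("doc"):
        for field in doc.findall("str"):
            if field.get("name") == "[shard]":
                counts[field.text] += 1
    return counts


# Three-document excerpt of the response above (hypothetical sample).
sample = """<response>
<result name="response" numFound="3" start="0">
  <doc>
    <str name="id">GB18030TEST</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_0_replica1/</str></doc>
  <doc>
    <str name="id">SP2514N</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
  <doc>
    <str name="id">apple</str>
    <str name="[shard]">192.168.56.1:8983/solr/collection1_shard1_1_replica1/</str></doc>
</result>
</response>"""

counts = docs_per_shard(sample)
for shard, count in sorted(counts.items()):
    print(shard, count)
```

Applied to the full response above, the same count gives 14 documents in shard1_0 and 18 in shard1_1, which adds up to the 32 documents we indexed.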
