SolrCloud on Docker
This is a follow-up to my Solr on Docker post. For this one, we’ll use a standalone ZooKeeper node, and three SolrCloud nodes, all in their own Docker containers.
I'm using Docker version 0.7, build 0d078b6, on Ubuntu 13.04.
The current version of ZooKeeper is 3.4.5, and there is a docker-zookeeper project which runs that in a single-node configuration.
If we build and run that in an instance named “zookeeper”:
cd ~
mkdir zookeeper-docker
cd zookeeper-docker
wget https://raw.github.com/jplock/docker-zookeeper/master/Dockerfile
docker build -t makuk66/zookeeper:3.4.5 .
...
Successfully built 26871fd90d0c
docker run -name zookeeper -p 2181 -p 2888 -p 3888 makuk66/zookeeper:3.4.5
We see that ZooKeeper starts running, and after a few seconds we can verify it’s happy:
$ echo ruok | nc -q 2 localhost `docker port zookeeper 2181|sed 's/.*://'`; echo
imok
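If you would rather script that check than eyeball it, a rough sketch could look like the following. The helper names `host_port` and `wait_for_zookeeper` are my own, not part of Docker or ZooKeeper, and the sketch assumes your `nc` supports `-q` as in the command above.

```shell
# Extract the host port from `docker port` output such as "0.0.0.0:49153".
host_port() {
    sed 's/.*://'
}

# Poll ZooKeeper with the "ruok" command until it answers "imok".
# Arguments: host port, optional maximum number of attempts (default 10).
wait_for_zookeeper() {
    port=$1
    attempts=${2:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if [ "$(echo ruok | nc -q 2 localhost "$port" 2>/dev/null)" = "imok" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Usage (needs the running "zookeeper" container):
#   wait_for_zookeeper "$(docker port zookeeper 2181 | host_port)" && echo ready
```

The function returns non-zero if ZooKeeper never answers, so it can gate whatever starts next.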
SolrCloud: Distributed Solr
The current version of Solr is 4.6.0, so we download that:
cd ~
mkdir solr-docker
cd solr-docker
wget http://www.mirrorservice.org/sites/ftp.apache.org/lucene/solr/4.6.0/solr-4.6.0.tgz
This locally cached copy will get added to the Docker image at build time.
Create a Dockerfile:
cat > Dockerfile <<'EOM'
#
# VERSION 0.2
FROM ubuntu
MAINTAINER Martijn Koster "firstname.lastname@example.org"
ENV SOLR solr-4.6.0
RUN mkdir -p /opt
ADD $SOLR.tgz /opt/$SOLR.tgz
RUN tar -C /opt --extract --file /opt/$SOLR.tgz
RUN ln -s /opt/$SOLR /opt/solr
RUN apt-get update
RUN apt-get --yes install openjdk-6-jdk
EXPOSE 8983
CMD ["/bin/bash", "-c", "cd /opt/solr/example; java -jar start.jar"]
EOM
docker build -rm=true -t makuk66/solr4:4.6.0 .
where makuk66 is my username; substitute your own.
If you don’t want to build your own image, you can pull makuk66/docker-solr, and use
makuk66/docker-solr in place of the image built above in the commands that follow.
Now we’ll manually run this with docker in the foreground.
The first node bootstraps the collection (like the SolrCloud Example A):
docker run -link zookeeper:ZK -i -p 8983 -t makuk66/solr4:4.6.0 /bin/bash -c 'cd /opt/solr/example; java -Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DzkHost=$ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT -DnumShards=2 -jar start.jar'
-link zookeeper:ZK makes the network information from the node named “zookeeper”
available as environment variables with the ZK_ prefix.
The other two nodes then start like this:
docker run -link zookeeper:ZK -i -p 8983 -t makuk66/solr4:4.6.0 /bin/bash -c 'cd /opt/solr/example; java -DzkHost=$ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT -jar start.jar'
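Both commands build `-DzkHost` from the variables that the link injects into the container. As a small sketch of how that lookup could be made independent of the alias name, here is a hypothetical helper (`zk_host` is my name, not a Docker or Solr feature) that takes the alias as an argument and fails loudly if the link variables are missing:

```shell
# Build the zkHost value from the variables Docker injects for a link alias.
# $1 is the ALIAS part of "-link zookeeper:ALIAS" (defaults to ZK).
zk_host() {
    alias_name=${1:-ZK}
    # Indirect lookup of <ALIAS>_PORT_2181_TCP_ADDR and <ALIAS>_PORT_2181_TCP_PORT.
    eval "addr=\${${alias_name}_PORT_2181_TCP_ADDR:-}"
    eval "port=\${${alias_name}_PORT_2181_TCP_PORT:-}"
    if [ -z "$addr" ] || [ -z "$port" ]; then
        echo "no link variables found for alias $alias_name" >&2
        return 1
    fi
    echo "$addr:$port"
}

# Usage inside a linked container:
#   java -DzkHost="$(zk_host ZK)" -jar start.jar
```

The alias is passed through `eval`, so it should only ever be a plain identifier you control.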
To show all the running containers:
$ docker ps
CONTAINER ID   IMAGE                     COMMAND                CREATED              STATUS              PORTS                                                                       NAMES
1cac635ec128   makuk66/solr4:4.6.0       /bin/bash -c cd /opt   3 seconds ago        Up 2 seconds        0.0.0.0:49158->8983/tcp                                                     prickly_mccarthy
bd23d3891dd6   makuk66/solr4:4.6.0       /bin/bash -c cd /opt   5 seconds ago        Up 4 seconds        0.0.0.0:49157->8983/tcp                                                     high_albattani
365a17a69176   makuk66/solr4:4.6.0       /bin/bash -c cd /opt   About a minute ago   Up About a minute   0.0.0.0:49156->8983/tcp                                                     elegant_bardeen
13805a493a79   makuk66/zookeeper:3.4.5   /opt/zookeeper-3.4.5   25 minutes ago       Up 25 minutes       0.0.0.0:49153->2181/tcp, 0.0.0.0:49154->2888/tcp, 0.0.0.0:49155->3888/tcp   elegant_bardeen/ZK,high_albattani/ZK,prickly_mccarthy/ZK,zookeeper
We can now use one of the exposed ports to look at Solr, which shows the 3 Solr nodes in the cluster running on their own internal IP addresses. Neat.
Of course we won’t believe it’s real unless we see search in action.
So let’s run another docker instance to load some data, using the docker host port for one of the nodes above:
docker run -link zookeeper:ZK -i -t makuk66/solr4:4.6.0 /bin/bash
cd /opt/solr/example/exampledocs
java -Durl=http://192.168.0.221:49158/solr/update -jar post.jar *.xml
Then install wget and run a query:
apt-get install wget
wget -O - 'http://192.168.0.221:49158/solr/collection1/select?q=solr&wt=xml'
You can do the same directly against the internal address, which you can find using
docker inspect prickly_mccarthy
wget -O - 'http://172.17.0.37:8983/solr/collection1/select?q=solr&wt=xml'
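`docker inspect` prints a fair amount of JSON, and the address sits in the `IPAddress` field under `NetworkSettings`. A quick sketch for pulling it out with sed, assuming the field is printed as `"IPAddress": "..."` on a line of the output (`extract_ip` is a helper name I made up):

```shell
# Read `docker inspect` JSON on stdin and print the first IPAddress value.
extract_ip() {
    sed -n 's/.*"IPAddress": *"\([0-9.]*\)".*/\1/p' | head -n 1
}

# Usage (needs a running container with that name):
#   ip=$(docker inspect prickly_mccarthy | extract_ip)
#   wget -O - "http://$ip:8983/solr/collection1/select?q=solr&wt=xml"
```

This is plain text matching rather than real JSON parsing, so treat it as a convenience for the command line, not something robust.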
You can see the shards in action by comparing:
wget -O - 'http://192.168.0.221:49158/solr/collection1/select?q=*:*&wt=xml' | sed 's/.*numFound="//' | sed 's/".*//'
32
wget -O - 'http://192.168.0.221:49158/solr/collection1/select?q=*:*&wt=xml&shards=shard1' | sed 's/.*numFound="//' | sed 's/".*//'
14
wget -O - 'http://192.168.0.221:49158/solr/collection1/select?q=*:*&wt=xml&shards=shard2' | sed 's/.*numFound="//' | sed 's/".*//'
18
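The point of the comparison is that the two shard counts add up to the total: 14 + 18 = 32. If you want to check that in a script, here is a small sketch; `num_found` and `shards_add_up` are hypothetical helper names, and the sed pattern assumes the XML response carries a `numFound="N"` attribute as in the output above.

```shell
# Read a Solr XML response on stdin and print the numFound value.
num_found() {
    sed -n 's/.*numFound="\([0-9][0-9]*\)".*/\1/p' | head -n 1
}

# Succeed if the per-shard counts ($2, $3, ...) sum to the total ($1).
shards_add_up() {
    total=$1
    shift
    sum=0
    for n in "$@"; do
        sum=$((sum + n))
    done
    [ "$sum" -eq "$total" ]
}

# Usage with the numbers from the run above:
#   shards_add_up 32 14 18 && echo "shards account for every document"
```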
Also interesting to try is:
docker diff prickly_mccarthy
to see what changes were made to the filesystem.
We can do a bunch of further polish here:
- we should be able to create images rather than specify command lines
- to allow multiple clusters to co-exist on a single Docker host, we should use something more dynamic than a ‘ZK’ prefix
- it’d be nice if we had a single script that deployed a whole cluster
- we should probably use Data Volumes for index storage
- we may want supervisord/upstart to monitor Java to recover from crashes
- it might be nice to auto-discover the latest versions of ZooKeeper and Solr and use those
- if we register containers, we could consider pre-expanding the Solr .war, for startup speed and to reduce diffs
but those all depend a bit on use-case, and are for another day.
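To give a flavour of the "single script" idea, here is a dry-run sketch that only prints the commands for one ZooKeeper container and N Solr nodes, reusing the images and options from this post. The function name `cluster_commands` is mine, and running containers detached with `-d` instead of `-i -t` is my assumption about how you would want to script it.

```shell
# Print the docker commands for a ZooKeeper container plus N Solr nodes.
# Arguments: node count (default 3), ZooKeeper image, Solr image.
cluster_commands() {
    nodes=${1:-3}
    zk_image=${2:-makuk66/zookeeper:3.4.5}
    solr_image=${3:-makuk66/solr4:4.6.0}
    # Single quotes keep the link variables literal; they are expanded
    # later, inside each Solr container.
    zk='-DzkHost=$ZK_PORT_2181_TCP_ADDR:$ZK_PORT_2181_TCP_PORT'
    bootstrap='-Dbootstrap_confdir=./solr/collection1/conf -Dcollection.configName=myconf -DnumShards=2'

    printf 'docker run -d -name zookeeper -p 2181 -p 2888 -p 3888 %s\n' "$zk_image"
    i=1
    while [ "$i" -le "$nodes" ]; do
        # Only the first node bootstraps the collection.
        if [ "$i" -eq 1 ]; then opts="$bootstrap $zk"; else opts="$zk"; fi
        printf "docker run -d -link zookeeper:ZK -p 8983 %s /bin/bash -c 'cd /opt/solr/example; java %s -jar start.jar'\n" "$solr_image" "$opts"
        i=$((i + 1))
    done
}

# Usage: review the output first, then pipe it to sh if it looks right.
#   cluster_commands 3
#   cluster_commands 3 | sh
```

Note that this does nothing to wait for ZooKeeper or for the first node to finish bootstrapping before the others start, which a real deployment script would probably need.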
I can really see the value of this approach for certain use-cases.
The resource efficiency, startup speed and cleanliness make it ideal for proof-of-concept deployments, A/B testing,
and for application developers to use as a local sandbox.
I’m intrigued about production use-cases for this kind of setup. It’s obviously suitable for
multi-tenant deployments, and I’d be interested in how you could set up a SolrCloud deployment
across multiple Docker hosts.