Replication has always been one of Solr’s cooler features, but it’s been hampered by the Unix features it employs. Unix scripts mixed in with run-(almost)-anywhere Java is enough to make anyone sigh. Users of Solr on Windows have been somewhat left out in the cold. That’s all changing though, because Solr 1.4 will bring a new, built-in replication feature that works as a Solr RequestHandler.
The authors of the new RequestHandler have posted some valuable info, benchmarking the new replication scheme against the older script-based replication. New users looking to start with large indices are always looking for this type of info. Seeing how long it takes to transfer a 2 gig index on some fairly normal setup will give a gut feeling for the transfer time you can expect to see on your setup (keeping in mind that the whole index will not always be transferred unless you optimize first every time).
Looking at the graph to the left, you can see that the old-style script method with rsync is a tad slower, but not enough to really matter. It’s nice to know the new built-in replication is a small gain rather than a small loss though.
One thing missing from the given info is the specs of the systems/network that were used. We can play with some numbers and make some guesses.
There is another diagram on the SolrReplication page that gives us the exact numbers. Using 2100 MB in 217 seconds, we can see that the index was moved at 9.68 MB per second using the new built-in replication method. That’s a bit over 7 minutes for 4 gigabytes. Is that a normal number, or were they using RAID 0 Super Drives and 100 Gigabit networks?
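For anyone who wants to plug in their own numbers, here is a minimal sketch of that arithmetic. The function names are made up for illustration, and it assumes 4 gigabytes means 4 × 1024 MB, which is what makes the result land a bit over 7 minutes.

```python
def throughput_mb_per_s(size_mb, seconds):
    """Average transfer rate in MB/s for a copy of size_mb that took seconds."""
    return size_mb / seconds


def transfer_seconds(size_mb, rate_mb_per_s):
    """Time in seconds to move size_mb at a steady rate_mb_per_s."""
    return size_mb / rate_mb_per_s


# Figures quoted from the SolrReplication benchmark: 2100 MB in 217 seconds.
rate = throughput_mb_per_s(2100, 217)

# Assumption: 4 gigabytes is taken as 4 * 1024 MB.
four_gb_seconds = transfer_seconds(4 * 1024, rate)

print(f"{rate:.2f} MB/s")
print(f"{four_gb_seconds / 60:.1f} minutes for 4 GB")
```

Swap in your own index size and a measured rate to get a rough estimate for your setup.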
Well, we know that the two main bottlenecks are going to be the hard drive speed and the network speed. Here, it looks like one of the two topped out at almost 10 MB per second. Normal?
Let’s start with the drive. We can look up what a first-gen Serial ATA drive can do on Wikipedia: 150 MB/s. Unfortunately, that’s the theoretical maximum speed of the bus; the maximum sustained transfer speed of the drive itself falls far short. Too bad, because a 6 Gbps interface was just demo’d by Seagate (doubling SATA2). We are limited by the drives though, and after reading lots of random hearsay, it looks like you can expect about 25-30 MB/s on a 5400rpm laptop drive, and about 50-60 MB/s on a standard 7200rpm drive (sustained transfer rates). Or you can jump to the high end and get something like these Raptor drives that claim sustained transfer rates over 100 MB/s. That doesn’t look like our bottleneck. 60 MB/s is 4 gigabytes in about 1 minute 8 seconds, or a gig in 17 seconds.
So on to the network. The first thing to look at is probably the speed of your standard 100BASE-X (Fast Ethernet). That’s a theoretical maximum of 12.5 MB/s (according to Wikipedia), which translates to about 9-10 MB/s real world based on a few Google searches. That looks like our bottleneck. Moving up the line we have 1000BASE-X (Gigabit Ethernet) with a theoretical max of 125 MB/s and an apparent real world of anywhere from 30 to 60 MB/s. In the wireless world, 802.11b appears to have a real-world max of about 0.5 MB/s, 802.11g about 2.5 MB/s, and 802.11n about 9.3 MB/s.
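The guessing game above boils down to taking the slower of the disk and the network. Here is a rough sketch of that comparison, assuming the low end of each real-world range quoted in this post; the dictionary keys and function names are purely illustrative, and none of these rates are measurements.

```python
# Rough sustained rates in MB/s, using the low end of the ranges quoted above.
DISK_MB_S = {
    "laptop drive": 25,
    "7200rpm drive": 50,
    "Raptor": 100,
}
NETWORK_MB_S = {
    "Fast Ethernet": 9,
    "Gigabit Ethernet": 30,
    "802.11b": 0.5,
    "802.11g": 2.5,
    "802.11n": 9.3,
}


def effective_rate(disk, network):
    """The transfer can go no faster than the slower of the two components."""
    return min(DISK_MB_S[disk], NETWORK_MB_S[network])


def bottleneck(disk, network):
    """Name whichever component limits the transfer."""
    return "network" if NETWORK_MB_S[network] < DISK_MB_S[disk] else "disk"


# Estimate the 2100 MB benchmark transfer for every disk/network pairing.
for disk in DISK_MB_S:
    for network in NETWORK_MB_S:
        rate = effective_rate(disk, network)
        minutes = 2100 / rate / 60
        print(f"{disk} + {network}: {rate} MB/s, "
              f"{minutes:.1f} min, limited by {bottleneck(disk, network)}")
```

With these assumed figures, a standard 7200rpm drive on Fast Ethernet is limited by the network at about 9 MB/s, which is in the same ballpark as the 9.68 MB/s from the benchmark.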