Solr 4.x has made amazing strides toward enabling a distributed, highly scalable search architecture, and SolrCloud takes center stage in these systems.  But the word “cloud” can give new users the wrong impression.

I’m not suggesting we change the name, just that we be aware of these misconceptions and occasionally dispel them when speaking or writing.

Two specific points of confusion I’ve seen:

1: (wrong!) SolrCloud is a version of Solr built specifically for cloud computing environments such as Amazon EC2, Google’s cloud platform, or Microsoft Windows Azure.  This false impression may also be reinforced by all the Big Data companies that have started to bundle Solr.

Although you can certainly run Solr in those environments, it’s not required.  Solr can run on just one local machine (common for developers), or on a pair of local machines if you’d like to have failover.

Solr can also run inside of virtual machines, with some small performance penalty.
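
To make this concrete, here is a minimal Python sketch (not part of any Solr distribution) showing that a search request looks the same whether Solr runs on a developer laptop or on a remote server.  The helper name `build_select_url`, both host names, and the `collection1` core name are illustrative assumptions; port 8983 and the `/select` handler are what Solr's bundled example setup uses.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, rows=10):
    """Build a URL for Solr's standard /select search handler.

    base_url and collection are assumptions supplied by the caller;
    nothing here depends on where Solr is actually hosted.
    """
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/select?{params}"


# Hypothetical hosts: a local developer machine and a remote server.
local_url = build_select_url("http://localhost:8983/solr", "collection1", "*:*")
remote_url = build_select_url("http://solr.example.com:8983/solr", "collection1", "*:*")

# Only the host part differs; the request path and parameters are identical.
print(local_url)
print(remote_url)
```

Pointing a client at a different machine is only a matter of changing the base URL, which is why nothing about Solr requires a cloud provider.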

2: (wrong!) Solr is only useful if you want to process “distributed data”; otherwise you should use “Lucene Core”.

Although Solr can be used with distributed data, for example when spidering web sites or integrating with Hadoop, it can just as well serve up data from local file systems, traditional databases, corporate Content Management Systems (CMS), etc.

It’s been exciting to see the steady march of Solr 4.x, accumulating new features in every dot release.