2015 was a banner year for our blog, with fresh posts popping up constantly across a broad selection of topics from the brainiacs here at Lucidworks. Here are our ten most popular blog posts from the past year:
Ted Sullivan’s recap of Revolution 2015 starts off our countdown:
I just got back from Lucene/Solr Revolution 2015 in Austin on a big high. There were a lot of exciting talks at the conference this year, but one thing that was particularly exciting to me was the focus that I saw on search quality (accuracy and relevance), on the problem of inferring user intent from the queries, and of tracking user behavior and using that to improve relevancy and so on. … What was really cool to me was the different ways people are using to solve the same basic problem – what does the user want to find?
Anshum Gupta’s post outlining the best bits of the new Solr came in ninth:
The much anticipated Apache Lucene and Solr 5.0 was just released. It comes packed with tons of new features, stability improvements and bug fixes. A lot of effort has gone into making Solr more usable, mostly along the lines of introducing APIs and hiding implementation details for users who don’t need to know. Solr 4.10 was released with scripts to start, stop and restart a Solr instance; 5.0 takes it further in terms of what can be done with those. The scripts now, for instance, copy a configset on collection creation so that the original isn’t changed. There’s also a script to index documents as well as the ability to delete collections in Solr. As an example, this is all you need to do to start SolrCloud, index lucidworks.com, browse through what’s been indexed, and clean up the collection.
Martijn Koster’s walkthrough of running Solr in a Docker container:
It is now even easier to get started with Solr: you can run Solr on Docker with a single command: $ docker run --name my_solr -d -p 8983:8983 -t solr
Cassandra Targett’s post announcing our open source release of Hadoop connectors:
Lucidworks is happy to announce that several of our connectors for indexing content from Hadoop to Solr are now open source. We have six of them, with support for Spark, Hive, Pig, HBase, Storm and HDFS, all available on GitHub. All of them work with Solr 5.x, and include options for Kerberos-secured environments if required. Repo: https://github.com/LucidWorks/
Tim Potter’s primer on using Solr as an Apache Spark SQL DataSource:
The DataSource API provides a clean abstraction layer for Spark developers to read and write structured data from/to an external data source. In this first post, I cover how to read data from Solr into Spark. In the next post, I’ll cover how to write structured data from Spark into Solr.
Hoss’s walkthrough on facets and stats in Solr:
Solr has supported basic “Field Facets” for a very long time. Solr has also supported “Field Stats” over numeric fields for (almost) as long. But starting with Solr 5.0 (building off of the great work done to support Distributed Pivot Faceting in Solr) it will now be possible to compute Field Stats for each Constraint of a Pivot Facet. Today I’d like to explain what the heck that means, and how it might be useful to you.
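To make "Field Stats for each Constraint of a Pivot Facet" a little more concrete, here is a minimal Python sketch that builds such a request URL. It assumes the tag-based local-params syntax (`{!tag=...}` on `stats.field`, `{!stats=...}` on `facet.pivot`); the host, the `techproducts` collection, and the `cat`, `inStock` and `price` fields are hypothetical placeholders rather than anything from the original post.

```python
from urllib.parse import urlencode


def pivot_stats_url(base_url, pivot_fields, stats_field, tag="piv1"):
    """Build a Solr /select URL that asks for stats on each pivot constraint.

    The tag links the stats.field definition to the facet.pivot that uses it.
    """
    params = [
        ("q", "*:*"),      # match all documents
        ("rows", "0"),     # only facet and stats output is of interest
        ("wt", "json"),
        ("stats", "true"),
        ("stats.field", f"{{!tag={tag}}}{stats_field}"),
        ("facet", "true"),
        ("facet.pivot", f"{{!stats={tag}}}{','.join(pivot_fields)}"),
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host, collection and field names, for illustration only.
url = pivot_stats_url(
    "http://localhost:8983/solr/techproducts", ["cat", "inStock"], "price"
)
print(url)
```

With a request shaped like this, each `cat` value, and each `inStock` value nested beneath it, would be expected to carry its own statistics for `price`.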
Erik Hatcher’s guided tour of Solr 5’s new ‘bin/post’ utility:
This is the first in a three-part series demonstrating how it’s possible to build a real application using just a few simple commands. The three parts are: (1) getting data into Solr using bin/post; (2) visualizing search results: /browse and beyond; and (3) putting it together realistically: example/files, a concrete, useful, domain-specific example of bin/post and /browse.
Erick Erickson’s explainer post on using Solr’s Suggester:
How would you like to have your user type “energy”, and see suggestions like: Energa Gedania Gdansk, Energies of God, United States Secretary of Energy, Kinetic energy? The Solr/Lucene suggester component can make this happen quickly enough to satisfy very demanding situations. … There’s been a new suggester in town for a while, thanks to some incredible work by some of the Lucene committers. Along about Solr 4.7 or so, support made its way into Solr so you could configure these in solrconfig.xml.
Tim Potter runs Apache Solr 5.2 through the Solr Scale Toolkit – comparing it to Solr 4.8.1 with astounding results – and it’s our runner-up post:
Using Solr 4.8.1 running in EC2, I was able to index 130M documents into a collection with 10 shards and replication factor of 2 in 3,727 seconds (~62 minutes) using ten r3.2xlarge instances; please refer to my previous blog post for specifics about the dataset. This equates to an average throughput of 34,881 docs/sec. Today, using the same dataset and configuration, with Solr 5.2.0, the job finished in 1,704 seconds (~28 minutes), which is an average 76,291 docs/sec. To rule out any anomalies, I reproduced these results several times while testing release candidates for 5.2. To be clear, the only notable difference between the two tests is a year of improvements to Lucene and Solr!
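The throughput figures quoted above follow directly from the document count and the elapsed times. A quick sketch of that arithmetic in Python, assuming "130M documents" means exactly 130,000,000 (the excerpt does not give the precise count):

```python
def throughput(docs: int, seconds: float) -> float:
    """Average indexing throughput in documents per second."""
    return docs / seconds


# Assumption: "130M" is taken as exactly 130,000,000 documents.
DOCS = 130_000_000

solr_481 = throughput(DOCS, 3727)  # Solr 4.8.1 run, 3,727 seconds
solr_520 = throughput(DOCS, 1704)  # Solr 5.2.0 run, 1,704 seconds
speedup = solr_520 / solr_481      # same dataset, so this equals 3727 / 1704

print(f"Solr 4.8.1: {solr_481:,.0f} docs/sec")
print(f"Solr 5.2.0: {solr_520:,.0f} docs/sec")
print(f"Speedup:    {speedup:.2f}x")
```

Under that assumption the rounded results match the 34,881 and 76,291 docs/sec quoted in the excerpt, and the ratio works out to roughly 2.2x.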
And at number one, the most popular post of 2015 was Noble Paul’s tutorial on securing Solr with 5.2’s new security API:
Until version 5.2, Solr did not include any specific security features. If you wanted to secure your Solr installation, you needed to use external tools and solutions which were proprietary and maybe not so well known by your organization. A security API was introduced in Solr 5.2 and Solr 5.3 will have full-featured authentication and authorization plugins that use Basic authentication and “permission rules” which are completely driven from ZooKeeper.