“Merrily, merrily, merrily, merrily, Search is but a Stream…”

Streaming is a relatively new feature in Fusion/Solr and, as the term implies, is something of a different beast from the more traditional synchronous request->response cycle. For starters, a streaming query does not return a single response object. Instead, it returns a ‘stream’ of Tuple objects, each of which effectively represents one document result and is read individually. Streaming may seem like a bit of an enigma on the surface, but once you understand the basics, I believe you’ll discover the extensive power of this API.

So, moving on: we’ll be using the CloudSolrStream class, which lives in the org.apache.solr.client.solrj.io.stream package included in SolrJ. Before we get to the actual code, however, there are some caveats to consider.

  1. You must enable ‘docValues’ on the field you sort on in order for the sorting process to run without exception.
  2. You cannot stream fields that are ‘multiValued’.
  3. Because of the way JavaScript behaves at runtime, you’ll need to run your process in a separate thread in order to truly “stream” a response. Here is a tutorial on multi-threaded JavaScripting using the Nashorn engine.

In the scope of this tutorial, we’ll stick to streaming specifically.  So, again, moving on:

The Code:

function (doc) {
    var HashMap = java.util.HashMap;
    var Map = java.util.Map;
    var Tuple = org.apache.solr.client.solrj.io.Tuple;
    var CloudSolrStream = org.apache.solr.client.solrj.io.stream.CloudSolrStream;

    var e = java.lang.Exception;
    var cstream = org.apache.solr.client.solrj.io.stream.CloudSolrStream;
    var props = java.util.Map;
    var zkHost = "localhost:9983"; 
    var collection = "fbo_test";  
    var cstream = null;

    var props = new HashMap();
    try {
      props.put("q", "*:*");
      props.put("qt", "/export");
      props.put("sort", "id asc");
      props.put("fl", "id");
      props.put("rows", "20");
      
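      cstream = new CloudSolrStream(zkHost, collection, props);
      cstream.open();
      while (true) {
        // read() hands back one Tuple per call; the final Tuple is an EOF marker
        var tuple = cstream.read();
        if (tuple.EOF) {
          logger.info("BREAK");
          break;
        }

        var fieldA = tuple.getString("id");
        logger.info(fieldA);
      }
    } catch (e) {
      logger.error(e);
    } finally {
      // Release the stream's connections whether or not the read loop completed
      if (cstream !== null) {
        try { cstream.close(); } catch (closeError) { logger.error(closeError); }
      }
    }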
    return doc;
}

Breaking it down

You’ll recognize some familiar declarations at the start:

 
    var HashMap = java.util.HashMap;
    var Map = java.util.Map;
    var Tuple = org.apache.solr.client.solrj.io.Tuple;
    var CloudSolrStream = org.apache.solr.client.solrj.io.stream.CloudSolrStream;

These are the Java classes that I’ll be using (i.e. the ‘imports’). From there, I declare my local variables:

    var e = java.lang.Exception;
    var cstream = org.apache.solr.client.solrj.io.stream.CloudSolrStream;
    var props = java.util.Map;
    var zkHost = "localhost:9983"; 
    var collection = "fbo_test";  
    var cstream = null;

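Technically, you can declare these anywhere so long as they remain accessible in the scope. As I’ve progressed with using Nashorn JavaScript, I’ve made it a practice to make my declarations in this manner for the sake of transparency. It also makes for cleaner code, in my opinion. Note that the first three assignments only document the type each variable is meant to hold: ‘cstream’ and ‘props’ are reassigned before they are used, and the ‘e’ in the catch block is its own binding.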

So now, with variables declared, we can get on with the heavy lifting. One note here: the values of ‘zkHost’ and ‘collection’ are specific to my setup. You will have to adapt these settings to your environment.

Next we’ll add the properties of our query to a HashMap:

    var props = new HashMap();
    
      props.put("q", "*:*");
      props.put("qt", "/export");
      props.put("sort", "id asc");
      props.put("fl", "id");
      props.put("rows", "20");

Note that we’re using the ‘export’ handler rather than the ‘select’ handler. Two more things to note here:

  1. The ‘sort’ parameter is required for the streaming API, and the field being sorted on must have ‘docValues’ enabled.
  2. The fields listed in the ‘fl’ parameter must NOT be ‘multiValued’ fields.
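The two rules above can be checked before a stream is ever opened. What follows is only a minimal sketch in plain JavaScript: ‘schemaFields’ and ‘checkExportParams’ are hypothetical names of my own, and the field descriptions are made up for illustration. In a real setup that information comes from your collection’s schema, not from script code.

```javascript
// Hypothetical description of a collection's fields (assumption for this sketch;
// the real source of truth is the Solr schema).
var schemaFields = {
  id:    { docValues: true,  multiValued: false },
  title: { docValues: true,  multiValued: false },
  tags:  { docValues: true,  multiValued: true  },
  body:  { docValues: false, multiValued: false }
};

// Returns a list of human-readable problems. An empty list means the
// parameters satisfy the two rules described above.
function checkExportParams(params, fields) {
  var problems = [];

  if (!params.sort) {
    problems.push("'sort' is required");
  } else {
    // "id asc, title desc" -> check "id" and "title"
    params.sort.split(",").forEach(function (clause) {
      var name = clause.trim().split(/\s+/)[0];
      var field = fields[name];
      if (!field) {
        problems.push("sort field '" + name + "' is not in the schema");
      } else if (!field.docValues) {
        problems.push("sort field '" + name + "' needs docValues");
      }
    });
  }

  (params.fl || "").split(",").forEach(function (raw) {
    var name = raw.trim();
    if (name === "") { return; }
    var field = fields[name];
    if (!field) {
      problems.push("fl field '" + name + "' is not in the schema");
    } else if (field.multiValued) {
      problems.push("fl field '" + name + "' is multiValued");
    }
  });

  return problems;
}
```

Calling checkExportParams({ sort: "id asc", fl: "id" }, schemaFields) yields an empty list, while asking for ‘tags’ in ‘fl’ or sorting on ‘body’ produces a message naming the offending field.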

Now let’s go ahead and instantiate our CloudSolrStream object, open the stream, and parse out the result:

    cstream = new CloudSolrStream(zkHost, collection, props);
    cstream.open();

    while (true) {
      var tuple = cstream.read();
      if (tuple.EOF) {
        logger.info("BREAK");
        break;
      }
      // process Tuple result here
      var fieldA = tuple.getString("id");
      logger.info(fieldA);
    }

 

You’ll know right off if there’s an issue with a parameter or field when you attempt to call ‘open()’ on the stream. If there is a problem, you’ll see an IOException in your application. If you look in the solr.log output, you’ll see a more specific exception describing the issue.

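As mentioned previously, the streaming client does not hand back one response containing everything. It returns Tuple objects one at a time, each of which represents a single result item produced by the query, followed by a final Tuple whose EOF flag signals the end of the stream.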
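The open/read/close cycle can be pulled into a small reusable helper. This is a sketch under stated assumptions, not part of SolrJ: ‘drainStream’ and ‘makeFakeStream’ are hypothetical names, and the fake stream is an in-memory stand-in that mimics only the parts of CloudSolrStream used in this tutorial (open(), read(), close(), the EOF flag and getString()), so the loop can be tried without a running cluster.

```javascript
// Reads every tuple from a stream and passes it to onTuple.
// 'stream' is anything with open(), read() and close(), where read()
// returns an object whose EOF property is true once results are exhausted.
// Returns the number of tuples processed.
function drainStream(stream, onTuple) {
  var count = 0;
  stream.open(); // if this throws, there is nothing to close yet
  try {
    while (true) {
      var tuple = stream.read();
      if (tuple.EOF) {
        break;
      }
      onTuple(tuple);
      count++;
    }
  } finally {
    // Runs whether the loop ended normally or an exception was thrown.
    stream.close();
  }
  return count;
}

// Hypothetical in-memory stand-in for CloudSolrStream (assumption for this sketch).
function makeFakeStream(ids) {
  var position = 0;
  var state = { opened: false, closed: false };
  return {
    state: state,
    open: function () { state.opened = true; },
    read: function () {
      if (position >= ids.length) {
        return { EOF: true };
      }
      var id = ids[position++];
      return {
        EOF: false,
        getString: function (name) { return name === "id" ? id : null; }
      };
    },
    close: function () { state.closed = true; }
  };
}

var seen = [];
var fake = makeFakeStream(["doc1", "doc2", "doc3"]);
var total = drainStream(fake, function (tuple) {
  seen.push(tuple.getString("id"));
});
```

Keeping the close() call in a ‘finally’ block means the stream is released even when the per-tuple callback throws, which is the main reason to wrap the loop in a helper at all.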

Final note:  It is important to decide prior to creating your collection whether you’ll want to use the streaming API against it.  Think about the fields you’ll want to be sorting on, and which fields will be ultimately returned.   You’ll want the schema of your collection to reflect the requirements of the Streaming API.

That’s all you need to get you started.   Enjoy search in the stream!