
Fusion Working for You: Streaming in a JavaScript Stage

by Kevin Cowan
February 3, 2017

“Merrily, merrily, merrily, merrily, Search is but a Stream…”

Streaming is a relatively new feature in Fusion/Solr and, as the term implies, it is something of a different beast from the more traditional synchronous request->response model. For starters, a streaming query does not return a single response object. Instead, it returns a ‘stream’ of Tuple objects, each of which effectively represents one document result, and these are read one at a time. Streaming may seem like a bit of an enigma on the surface, but once you understand the basics, I believe you’ll discover the extensive power of this API.

So moving on: We’ll be using the CloudSolrStream class, which is part of the org.apache.solr.client.solrj.io.stream package included in SolrJ. Before we get to the actual code, however, there are some caveats to consider.

You must enable “docValues” on the sort field in order for the sorting process to run without exception.
You cannot use streaming for fields where “multiValued” is enabled.
Given the way JavaScript behaves at runtime, you’ll need to run your process in a separate thread in order to truly “stream” a response. Multi-threaded JavaScript on the Nashorn engine is covered in a separate tutorial.

In the scope of this tutorial, we’ll stick to streaming specifically. So, again, moving on:

The Code:

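function (doc) {
    // Java classes used by this stage (the equivalent of 'imports').
    var HashMap = java.util.HashMap;
    var Map = java.util.Map;
    var Tuple = org.apache.solr.client.solrj.io.Tuple;
    var CloudSolrStream = org.apache.solr.client.solrj.io.stream.CloudSolrStream;

    // Local variables. The first three assignments only document the intended
    // Java type; cstream and props are reassigned before they are used.
    var e = java.lang.Exception;
    var cstream = org.apache.solr.client.solrj.io.stream.CloudSolrStream;
    var props = java.util.Map;
    var zkHost = "localhost:9983";   // adapt to your environment
    var collection = "fbo_test";     // adapt to your environment
    var cstream = null;

    var props = new HashMap();
    try {
        props.put("q", "*:*");
        props.put("qt", "/export");
        props.put("sort", "id asc");
        props.put("fl", "id");
        props.put("rows", "20");

        cstream = new CloudSolrStream(zkHost, collection, props);
        cstream.open();
        while (true) {
            var tuple = cstream.read();
            if (tuple.EOF) {
                // The EOF tuple marks the end of the stream; it is not a document.
                logger.info("BREAK");
                break;
            }

            var fieldA = tuple.getString("id");
            logger.info(fieldA);
        }
    } catch (e) {
        logger.error("Streaming query failed: " + e);
    } finally {
        // Close the stream so that its connections are released.
        if (cstream !== null) {
            try {
                cstream.close();
            } catch (closeError) {
                logger.error("Could not close stream: " + closeError);
            }
        }
    }
    return doc;
}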

Breaking it down

You’ll recognize some familiar declarations at the start:

 
    var HashMap = java.util.HashMap;
    var Map = java.util.Map;
    var Tuple = org.apache.solr.client.solrj.io.Tuple;
    var CloudSolrStream = org.apache.solr.client.solrj.io.stream.CloudSolrStream;

These are the Java classes that I’ll be using (i.e., ‘imports’). From there, I declare my local variables:

    var e = java.lang.Exception;
    var cstream = org.apache.solr.client.solrj.io.stream.CloudSolrStream;
    var props = java.util.Map;
    var zkHost = "localhost:9983"; 
    var collection = "fbo_test";  
    var cstream = null;

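Technically, you can declare these anywhere so long as they remain accessible in scope. Note that the first three assignments only document the Java type each variable is meant to hold: ‘cstream’ and ‘props’ are reassigned before they are used, and ‘e’ is shadowed by the catch parameter. As I’ve progressed with using Nashorn JavaScript, I’ve made it a practice to make my declarations in this manner for the sake of transparency. It also makes for cleaner code, in my opinion.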

So now with variables declared, we can get on with the heavy lifting. One note here: the values of ‘zkHost’ and ‘collection’ are specific to my setup. You will have to adapt these settings to your environment.

Next we’ll add the properties of our query to a HashMap:

    var props = new HashMap();

    props.put("q", "*:*");
    props.put("qt", "/export");
    props.put("sort", "id asc");
    props.put("fl", "id");
    props.put("rows", "20");

Note that we’re using the ‘export‘ handler rather than the ‘select‘ handler. Two more things to note here:

The ‘sort’ parameter is required for the streaming API, and the field being sorted on must have ‘docValues’ enabled.
The fields listed in the ‘fl’ parameter must NOT be ‘multiValued’ fields.
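Since a missing ‘sort’ or an empty ‘fl’ only shows up once the stream is opened, it can help to check the parameters before they go anywhere near Solr. The following is a minimal sketch, not part of SolrJ or Fusion: ‘buildExportParams’ and ‘copyInto’ are hypothetical helper names, and the code sticks to ES5 syntax on the assumption that it should also run under Nashorn.

```javascript
// Hypothetical helper (not part of SolrJ or Fusion): builds the parameters
// for an /export request as a plain object and fails fast when the values
// this article describes as required are missing.
function buildExportParams(query, sortSpec, fields) {
    if (!sortSpec) {
        throw new Error("a sort clause is required for /export");
    }
    if (!fields || fields.length === 0) {
        throw new Error("at least one field must be listed in fl");
    }
    return {
        q: query || "*:*",
        qt: "/export",
        sort: sortSpec,
        fl: fields.join(",")
    };
}

// Copies a plain object into any Map-like target that exposes
// put(key, value), such as the java.util.HashMap used in the stage.
function copyInto(target, params) {
    Object.keys(params).forEach(function (key) {
        target.put(key, params[key]);
    });
    return target;
}

var params = buildExportParams("*:*", "id asc", ["id"]);
```

Inside the stage, something like copyInto(new HashMap(), params) would then produce the same kind of map as the hand-written put() calls above.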

Now let’s go ahead and instantiate our CloudSolrStream object, open the stream, and parse out the result:

    cstream = new CloudSolrStream(zkHost, collection, props);
    cstream.open();

    while (true) {
        var tuple = cstream.read();
        if (tuple.EOF) {
            logger.info("BREAK");
            break;
        }
        // process Tuple result here
        var fieldA = tuple.getString("id");
        logger.info(fieldA);
    }

You’ll know right off if there’s an issue with a parameter or field when you attempt to call ‘open()’ on the stream. If there is a problem, you’ll see an IOException in your application. If you look in the solr.log output, you’ll see a more specific exception describing the issue.
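Because open() can fail, it is worth making sure the stream is closed whether or not the read loop completes. Here is one way that might look, as a sketch only: ‘drainStream’ and ‘fakeStream’ are hypothetical names, and the helper assumes nothing more than an object with open(), read() and close() whose final tuple has a true EOF property, which is how the loop above treats CloudSolrStream.

```javascript
// Hypothetical helper: reads every tuple from a stream-like object and hands
// each one to a callback, closing the stream even if something goes wrong.
function drainStream(stream, onTuple) {
    var count = 0;
    try {
        stream.open();              // parameter or field problems surface here
        while (true) {
            var tuple = stream.read();
            if (tuple.EOF) {
                break;              // end-of-stream marker, not a document
            }
            onTuple(tuple);
            count++;
        }
    } finally {
        stream.close();             // always release the stream
    }
    return count;
}

// Stand-in stream so the sketch can run without a Solr cluster.
function fakeStream(ids) {
    var position = 0;
    return {
        closed: false,
        open: function () {},
        read: function () {
            if (position >= ids.length) {
                return { EOF: true };
            }
            var id = ids[position++];
            return {
                EOF: false,
                getString: function (name) {
                    return name === "id" ? id : null;
                }
            };
        },
        close: function () { this.closed = true; }
    };
}

var seen = [];
var stream = fakeStream(["a", "b", "c"]);
var total = drainStream(stream, function (tuple) {
    seen.push(tuple.getString("id"));
});
```

In the stage itself, the CloudSolrStream instance would take the place of the stand-in, and the callback is where each Tuple gets processed. One thing to keep in mind: if both open() and close() throw, the exception from close() is the one that propagates, so you may prefer to log inside the finally block instead.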

As mentioned previously, the streaming client returns a stream of Tuple objects, each of which represents a single result item produced by the query.

Final note: It is important to decide prior to creating your collection whether you’ll want to use the streaming API against it. Think about the fields you’ll want to be sorting on, and which fields will be ultimately returned. You’ll want the schema of your collection to reflect the requirements of the Streaming API.
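To make that planning step a little more concrete, here is a rough pre-flight check you could run against a list of field definitions before committing to a schema. It is only a sketch: ‘findStreamingProblems’ is a hypothetical function, and the { name, docValues, multiValued } shape of each definition is an assumption chosen to mirror the schema attributes of the same names. It checks just the two requirements described in this article.

```javascript
// Hypothetical pre-flight check: reports which fields would violate the
// streaming requirements described in this article.
function findStreamingProblems(fieldDefs, sortField, returnFields) {
    // Index the definitions by name; Object.create(null) avoids clashes
    // with inherited property names.
    var byName = Object.create(null);
    fieldDefs.forEach(function (def) {
        byName[def.name] = def;
    });

    var problems = [];

    // The sort field must exist and have docValues enabled.
    var sortDef = byName[sortField];
    if (!sortDef) {
        problems.push("sort field '" + sortField + "' is not defined");
    } else if (!sortDef.docValues) {
        problems.push("sort field '" + sortField + "' needs docValues");
    }

    // Fields listed in fl must exist and must not be multiValued.
    returnFields.forEach(function (name) {
        var def = byName[name];
        if (!def) {
            problems.push("field '" + name + "' is not defined");
        } else if (def.multiValued) {
            problems.push("field '" + name + "' is multiValued");
        }
    });

    return problems;
}

// Made-up field definitions for illustration only.
var fields = [
    { name: "id", docValues: true, multiValued: false },
    { name: "tags", docValues: true, multiValued: true },
    { name: "title", docValues: false, multiValued: false }
];

var ok = findStreamingProblems(fields, "id", ["id"]);
var bad = findStreamingProblems(fields, "title", ["id", "tags"]);
```

With the made-up definitions above, the first call finds nothing to report, while the second flags ‘title’ as a sort field without docValues and ‘tags’ as a multiValued return field.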

That’s all you need to get you started. Enjoy search in the stream!

