Solr has long supported update requests in JSON, but only in Solr's own JSON format, not in your own custom JSON. Starting with version 4.10 (SOLR-6304), Solr accepts any JSON document, and the document can be indexed in whatever form you require.

Transforming and Indexing custom JSON data

The objective of this feature is to help users index any JSON as valid Solr documents according to their preference. It lets the user split a single JSON file into one or more Solr documents. The final indexed documents are controlled by the mapping parameters passed along with the request. One or more valid JSON documents can be sent to the /update/json/docs path with the configuration params.

Mapping params

 

  • split : Required if you wish to transform the input JSON. This is the path at which the JSON must be split. If the entire JSON makes a single Solr document, the path must be “/”.
  • f : A multivalued mapping parameter. At least one field mapping must be provided. The format is {target-field-name}:{json-path}. The json-path part is required. The target-field-name is the name of the field in the Solr document; it is optional, and when omitted it is derived automatically from the input JSON.
  • echo : For debugging. Set it to true if you want the docs to be returned in the response. Nothing will be indexed.
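Since f is repeated once per mapping, it can help to see how the query string is put together. Here is a minimal Python sketch of that; the helper name build_update_url and the base URL are assumptions made for illustration and are not part of Solr.

```python
from urllib.parse import urlencode


def build_update_url(base, split, mappings, echo=False):
    """Assemble an /update/json/docs URL (hypothetical helper, not a Solr API)."""
    params = [("split", split)]
    # 'f' is multivalued, so it is repeated once per mapping.
    params += [("f", m) for m in mappings]
    if echo:
        # echo=true asks Solr to return the generated docs without indexing them.
        params.append(("echo", "true"))
    # Leave '/', ':' and '*' unescaped so the URL stays readable.
    return base + "/update/json/docs?" + urlencode(params, safe="/:*")


url = build_update_url(
    "http://localhost:8983/solr/collection1",
    "/exams",
    ["first:/first", "subject:/exams/subject"],
    echo=True,
)
print(url)
```

Passing the parameters as a list of pairs, rather than a dict, is what allows the same key to appear more than once.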

example 1:

curl 'http://localhost:8983/solr/collection1/update/json/docs'\
'?split=/exams'\
'&f=first:/first'\
'&f=last:/last'\
'&f=grade:/grade'\
'&f=subject:/exams/subject'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
      {
        "subject": "Maths",
        "test"   : "term1",
        "marks":90},
        {
         "subject": "Biology",
         "test"   : "term1",
         "marks":86}
      ]
}'

This indexes the following two docs:

   {
      "first":"John",
      "last":"Doe",
      "marks":90,
      "test":"term1",
      "subject":"Maths",
      "grade":8
      }
    {
      "first":"John",
      "last":"Doe",
      "marks":86,
      "test":"term1",
      "subject":"Biology",
      "grade":8
      }
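
To make the effect of split and f concrete, here is a simplified Python model of the transformation in example 1. It is only an illustration under stated assumptions, not Solr's implementation: the function name split_and_map is hypothetical, and it handles only a one-level split path that points at an array.

```python
def split_and_map(doc, split, mappings):
    """Simplified model of split + f for a one-level split path such as /exams.

    mappings maps a target field name to a JSON path.
    Illustration only; this is not how Solr implements the feature.
    """
    key = split.strip("/")
    prefix = split.rstrip("/") + "/"
    docs = []
    for child in doc[key]:
        out = {}
        for target, path in mappings.items():
            if path.startswith(prefix):
                # Path below the split point: read from the current child.
                out[target] = child[path[len(prefix):]]
            else:
                # Path above the split point: the value is copied into every doc.
                out[target] = doc[path.lstrip("/")]
        docs.append(out)
    return docs


student = {
    "first": "John", "last": "Doe", "grade": 8,
    "exams": [
        {"subject": "Maths", "test": "term1", "marks": 90},
        {"subject": "Biology", "test": "term1", "marks": 86},
    ],
}
mappings = {
    "first": "/first", "last": "/last", "grade": "/grade",
    "subject": "/exams/subject", "test": "/exams/test", "marks": "/exams/marks",
}
docs = split_and_map(student, "/exams", mappings)
for d in docs:
    print(d)
```

Each element of the exams array becomes its own document, and the fields outside the array are repeated in both.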

As the final field names are the same as the input document fields, the request can be simplified as follows:

example 2:

curl 'http://localhost:8983/solr/collection1/update/json/docs'\
'?split=/exams'\
'&f=/first'\
'&f=/last'\
'&f=/grade'\
'&f=/exams/subject'\
'&f=/exams/test'\
'&f=/exams/marks'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
      {
        "subject": "Maths",
        "test"   : "term1",
        "marks":90},
        {
         "subject": "Biology",
         "test"   : "term1",
         "marks":86}
      ]
}'

Wildcards

Instead of specifying all the field names explicitly, it is possible to specify a wildcard “*” or a double wildcard “**” to map fields automatically. The constraint is that wildcards can only be used at the end of the json-path. The split path cannot use wildcards. The following are example wildcard path mappings:

  • f=/docs/* : maps all the fields under docs, using the names given in the JSON
  • f=/docs/** : maps all the fields under docs and its children, using the names given in the JSON
  • f=searchField:/docs/* : maps all fields under /docs to a single field called ‘searchField’
  • f=searchField:/docs/** : maps all fields under /docs and its children to searchField
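
The difference between “*” and “**” can be sketched in Python as follows. This is a rough model, not Solr's implementation: the function names are hypothetical, sample stands in for the object found at /docs, and the sketch assumes that only scalar values are mapped and that nested objects are visited only by “**”.

```python
def expand(node, recursive):
    """Yield (name, value) for the scalar fields under node.

    recursive=False models a trailing '*', recursive=True models '**'.
    Illustration only; nested arrays are not handled.
    """
    for name, value in node.items():
        if isinstance(value, dict):
            if recursive:
                # '**' also descends into child objects.
                yield from expand(value, True)
        else:
            yield name, value


def to_single_field(node, target, recursive):
    """Model f=target:/docs/* or f=target:/docs/**: every value goes to one field."""
    return {target: [value for _, value in expand(node, recursive)]}


# Stand-in for the JSON object located at /docs (made-up data).
sample = {"id": 1, "title": "a", "author": {"name": "b"}}

one_level = list(expand(sample, recursive=False))    # like f=/docs/*
all_levels = list(expand(sample, recursive=True))    # like f=/docs/**
combined = to_single_field(sample, "searchField", recursive=True)
print(one_level, all_levels, combined)
```

Because several values land in the same target when a field name is given, that target field would need to be multivalued.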

With wildcards we can simplify our previous example as follows:

example 3:

curl 'http://localhost:8983/solr/collection1/update/json/docs'\
'?split=/exams'\
'&f=/**'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
      {
        "subject": "Maths",
        "test"   : "term1",
        "marks":90},
        {
         "subject": "Biology",
         "test"   : "term1",
         "marks":86}
      ]
}'

It is also possible to send all the values to a single field and do a full-text search on that. This is a good option for blindly indexing and querying JSON documents without worrying about fields and schema.

example 4:

curl 'http://localhost:8983/solr/collection1/update/json/docs'\
'?split=/'\
'&f=txt:/**'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
      {
        "subject": "Maths",
        "test"   : "term1",
        "marks":90},
        {
         "subject": "Biology",
         "test"   : "term1",
         "marks":86}
      ]
}'

About Noble Paul
