Indexing Custom JSON Data

Solr already supports update requests in JSON format, but only in Solr's own JSON format, not in your own custom JSON. From version 4.10 onwards (SOLR-6304), Solr accepts any JSON document and can index it in the required format.

Transforming and Indexing Custom JSON Data

The objective of this feature is to help users index any JSON into valid Solr documents according to their preference. It lets the user split a single JSON file into one or more Solr documents. The final indexed documents are controlled by the mapping passed along with the request. One or more valid JSON documents can be sent to the /update/json/docs path with the configuration params.

Mapping params

 

  • split: Required if you wish to transform the input JSON. This is the path at which the JSON must be split. If the entire JSON makes a single Solr document, the path must be "/".
  • f: A multivalued mapping parameter. At least one field mapping must be provided. The format of the parameter is {target-field-name}:{json-path}. The json-path is required. The target-field-name is the name of the field in the Solr document; it is optional and, if omitted, it is derived automatically from the input JSON.
  • echo: For debugging. Set it to true if you want the docs to be returned in the response. Nothing will be indexed.
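
Because these are ordinary query-string parameters, a client can also assemble the request URL programmatically. The following is a minimal Python sketch, assuming the host, port, and collection name (collection1) used in the examples in this post. The helper name build_update_url is hypothetical and not part of Solr, and the URL is only built, not sent.

```python
from urllib.parse import urlencode

# Assumed endpoint, taken from the examples in this post.
BASE_URL = "http://localhost:8983/solr/collection1/update/json/docs"


def build_update_url(split, mappings, echo=False):
    """Build the /update/json/docs URL (hypothetical helper, not part of Solr).

    split    -- path at which the JSON is split, e.g. "/exams"
    mappings -- list of f values, e.g. ["first:/first", "/exams/marks"]
    echo     -- if True, ask Solr to return the docs instead of indexing them
    """
    params = [("split", split)]
    # f is multivalued: repeat the parameter once per mapping.
    params += [("f", m) for m in mappings]
    if echo:
        params.append(("echo", "true"))
    # Keep "/", ":" and "*" readable; other characters are percent-encoded.
    return BASE_URL + "?" + urlencode(params, safe="/:*")


url = build_update_url("/exams", ["first:/first", "marks:/exams/marks"], echo=True)
print(url)
```

Passing echo=True while experimenting with mappings makes it possible to inspect the resulting documents without indexing anything.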

Example 1:

# Split at /exams and map every field explicitly.
curl 'http://localhost:8983/solr/collection1/update/json/docs'\
'?split=/exams'\
'&f=first:/first'\
'&f=last:/last'\
'&f=grade:/grade'\
'&f=subject:/exams/subject'\
'&f=test:/exams/test'\
'&f=marks:/exams/marks'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test": "term1",
      "marks": 90
    },
    {
      "subject": "Biology",
      "test": "term1",
      "marks": 86
    }
  ]
}'

This indexes the following two docs:

{
  "first": "John",
  "last": "Doe",
  "marks": 90,
  "test": "term1",
  "subject": "Maths",
  "grade": 8
}
{
  "first": "John",
  "last": "Doe",
  "marks": 86,
  "test": "term1",
  "subject": "Biology",
  "grade": 8
}
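
To make the split behaviour concrete, here is a small Python sketch that reproduces the two documents above from the input JSON. It only illustrates the behaviour described in this post for a one-level split path. It is not Solr's implementation, and the function name split_and_map is hypothetical.

```python
def split_and_map(doc, split, mappings):
    """Illustrative only: mimic the split/f behaviour for a one-level split path.

    doc      -- parsed JSON object
    split    -- split path of the form "/key", where doc[key] is a list of objects
    mappings -- dict of target field name -> json-path ("/a" or "/key/b")
    """
    key = split.strip("/")
    results = []
    for child in doc[key]:
        out = {}
        for field, path in mappings.items():
            parts = path.strip("/").split("/")
            if parts[0] == key and len(parts) == 2:
                # Path below the split point: read from the current child.
                out[field] = child[parts[1]]
            else:
                # Path above the split point: copied into every document.
                out[field] = doc[parts[0]]
        results.append(out)
    return results


doc = {
    "first": "John",
    "last": "Doe",
    "grade": 8,
    "exams": [
        {"subject": "Maths", "test": "term1", "marks": 90},
        {"subject": "Biology", "test": "term1", "marks": 86},
    ],
}
mappings = {
    "first": "/first",
    "last": "/last",
    "grade": "/grade",
    "subject": "/exams/subject",
    "test": "/exams/test",
    "marks": "/exams/marks",
}
docs = split_and_map(doc, "/exams", mappings)
for d in docs:
    print(d)
```

Each entry under /exams becomes its own document, and the fields above the split point (first, last, grade) are repeated in every one of them.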

As the final field names are the same as the input document fields, the target field names can be omitted and the request can be simplified as follows.

Example 2:

# Same request, with target field names derived from the JSON paths.
curl 'http://localhost:8983/solr/collection1/update/json/docs'\
'?split=/exams'\
'&f=/first'\
'&f=/last'\
'&f=/grade'\
'&f=/exams/subject'\
'&f=/exams/test'\
'&f=/exams/marks'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test": "term1",
      "marks": 90
    },
    {
      "subject": "Biology",
      "test": "term1",
      "marks": 86
    }
  ]
}'

Wildcards

Instead of specifying all the field names explicitly, it is possible to specify a single wildcard "*" or a double wildcard "**" to map fields automatically. There are two constraints: wildcards can only be used at the end of the json-path, and the split path cannot use wildcards. The following are example wildcard path mappings:

  • f=/docs/* : maps all the fields directly under /docs, using the names given in the JSON
  • f=/docs/** : maps all the fields under /docs and its children, using the names given in the JSON
  • f=searchField:/docs/* : maps all the fields directly under /docs to a single field called "searchField"
  • f=searchField:/docs/** : maps all the fields under /docs and its children to "searchField"
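
The difference between the two wildcards can be sketched in a few lines of Python. This is an illustration of the matching rule as described above, not Solr's own code. The function name matches and the sample field paths are hypothetical.

```python
def matches(pattern, path):
    """Return True if a field path matches a json-path pattern (illustrative)."""
    if pattern.endswith("/**"):
        # "**" matches any descendant, at any depth.
        prefix = pattern[:-2]  # keep the trailing "/"
        return path.startswith(prefix) and len(path) > len(prefix)
    if pattern.endswith("/*"):
        # "*" matches direct children only.
        prefix = pattern[:-1]  # keep the trailing "/"
        rest = path[len(prefix):]
        return path.startswith(prefix) and rest != "" and "/" not in rest
    # No wildcard: the path must match exactly.
    return pattern == path


for pattern in ("/docs/*", "/docs/**"):
    for path in ("/docs/title", "/docs/author/name", "/other/title"):
        print(pattern, path, matches(pattern, path))
```

Here /docs/author/name matches only the double wildcard, because it is a grandchild of /docs rather than a direct child.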

With wildcards, we can simplify our previous example as follows.

Example 3:

# Map every field at any depth, keeping the names from the JSON.
curl 'http://localhost:8983/solr/collection1/update/json/docs'\
'?split=/exams'\
'&f=/**'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test": "term1",
      "marks": 90
    },
    {
      "subject": "Biology",
      "test": "term1",
      "marks": 86
    }
  ]
}'

It is also possible to send all the values to a single field and do a full-text search on that field. This is a good option for blindly indexing and querying JSON documents without worrying about fields and schema.

Example 4:

# Index the whole JSON as one document, with every value in the txt field.
curl 'http://localhost:8983/solr/collection1/update/json/docs'\
'?split=/'\
'&f=txt:/**'\
 -H 'Content-type:application/json' -d '
{
  "first": "John",
  "last": "Doe",
  "grade": 8,
  "exams": [
    {
      "subject": "Maths",
      "test": "term1",
      "marks": 90
    },
    {
      "subject": "Biology",
      "test": "term1",
      "marks": 86
    }
  ]
}'