Nested Queries in Solr

The ability to nest an arbitrary query type inside another query type is a useful feature that was quietly added to Solr some time ago, along with support for query parser plugins that handle different query types. I finally got around to fixing nested queries for the function query parser, and figured it was high time I documented nested queries, along with the LocalParams syntax that allows one to add metadata to a query parameter, or even change the type of a query (i.e. which query parser is used to parse the query string).

Nested Queries in Lucene Syntax

To embed a query of another type in a Lucene/Solr query string, simply use the magic field name _query_. The following example embeds a lucene query type:poems into another lucene query:

text:"roses are red" AND _query_:"type:poems"
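As a rough sketch of how a client might send a nested query such as text:"roses are red" AND _query_:"type:poems", the Python snippet below URL-encodes it for a select handler. The base URL and the helper name build_select_url are assumptions made for illustration; they are not part of Solr.

```python
from urllib.parse import urlencode

# Assumed location of a local Solr select handler; adjust for your setup.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_select_url(params):
    """Return a URL-encoded select request for the given parameters."""
    return SOLR_SELECT + "?" + urlencode(params)


# The nested query is passed as the quoted "value" of the magic field _query_.
q = 'text:"roses are red" AND _query_:"type:poems"'
url = build_select_url({"q": q})
print(url)
```

Letting urlencode handle the quotes, colons and spaces avoids hand-escaping the query string.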

Now of course this isn’t too useful on its own, but it becomes very powerful in conjunction with the query parser framework and local params, which allow us to change the types of queries. The following example embeds a DisMax query in a normal lucene query:

text:hi AND _query_:"{!dismax qf=title pf=title}how now brown cow"
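Because the nested query has to be wrapped in double quotes, any quote or backslash inside it needs escaping first. The helper below is a minimal sketch of that step, assuming the usual Lucene convention of backslash-escaping inside a quoted string; the function name nest is made up for this example.

```python
def nest(query):
    """Wrap a query string as a quoted _query_ clause.

    Backslashes and double quotes inside the nested query are
    backslash-escaped so they do not end the quoted value early.
    """
    escaped = query.replace("\\", "\\\\").replace('"', '\\"')
    return '_query_:"' + escaped + '"'


# Embed a DisMax query in a normal lucene query.
q = "text:hi AND " + nest("{!dismax qf=title pf=title}how now brown cow")
print(q)

# A nested query that itself contains a quoted phrase.
print(nest('title:"brown cow"'))
```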

And we can further use parameter dereferencing in the local params syntax to make it easier for the front-end to compose the request:

q=text:hi AND _query_:"{!dismax qf=title pf=title v=$qq}"&qq=how now brown cow
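One way a front-end might take advantage of this is to keep the q template fixed and only fill in the dereferenced parameter. The sketch below assumes the parameter is called qq and that the fields are title; both are placeholders, as is the function name build_params.

```python
from urllib.parse import urlencode

# Fixed query template; the user's text is pulled in through v=$qq.
Q_TEMPLATE = 'text:hi AND _query_:"{!dismax qf=title pf=title v=$qq}"'


def build_params(user_input):
    """Return request parameters with the user's text in its own parameter."""
    return {"q": Q_TEMPLATE, "qq": user_input}


params = build_params("how now brown cow")
query_string = urlencode(params)
print(query_string)
```

Since the user's text travels in its own parameter, it never has to be escaped for the quoted _query_ value.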

Nested Queries in Function Query Syntax

This is the part that was previously broken, and is only fixed/available in Solr 1.4. You can use the query() function to embed any other type of query in a function query, and do computations on the relevancy scores returned by that query. The Solr wiki has some examples.
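To give a feel for the shape of such a request, here is a small sketch that multiplies a numeric field by the score of a nested DisMax query. The field name popularity, the parameter name qq and the helper query_func are assumptions for illustration; the optional second argument to query() is the value used for documents that do not match the nested query.

```python
def query_func(param_name, default=None):
    """Build a query() function call that dereferences a request parameter."""
    if default is None:
        return "query($%s)" % param_name
    return "query($%s,%s)" % (param_name, default)


params = {
    # {!func} selects the function query parser for the main query.
    "q": "{!func}product(popularity,%s)" % query_func("qq"),
    # The nested query chooses its own parser via local params.
    "qq": "{!dismax qf=title}how now brown cow",
}
print(params["q"])
```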

Pure Nested Query

There is also a nested query parser plugin that allows one to create pure nested queries. Is a nested query without any containing query even useful? Surprisingly, yes, as it allows further decomposition of query requests.

For example, the following allows an easy way for the client to specify that they want some sort of recency date boost added into the relevancy score, while leaving the exact query type up to the Solr server config (via search handler defaults in solrconfig.xml).

The client query would specify the boost query as $datefunc, and the defaults for the handler in solrconfig.xml would contain the actual definition of datefunc as a function query.
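The sketch below simulates that split in Python. Everything specific in it is an assumption: the handler is assumed to use the DisMax parser (bq is a DisMax parameter), the fields text and date are hypothetical, and the merge only imitates how request parameters are overlaid on handler defaults. The nested query parser is selected with {!query}, and the value it dereferences carries its own type ({!func}).

```python
import xml.etree.ElementTree as ET

# Hypothetical handler defaults, as they might appear in solrconfig.xml.
HANDLER_DEFAULTS_XML = """
<lst name="defaults">
  <str name="defType">dismax</str>
  <str name="qf">text</str>
  <str name="datefunc">{!func}recip(rord(date),1,1000,1000)</str>
</lst>
"""


def read_defaults(xml_text):
    """Read the <str> entries of a defaults list into a dict."""
    root = ET.fromstring(xml_text)
    return {el.get("name"): el.text for el in root.findall("str")}


# The client only names the boost; it does not know how it is defined.
client_params = {"q": "solr", "bq": "{!query v=$datefunc}"}

# Request parameters take precedence over handler defaults.
effective = {**read_defaults(HANDLER_DEFAULTS_XML), **client_params}
print(effective["bq"], "->", effective["datefunc"])
```

Changing the boost later means editing only the datefunc default on the server; the client request stays the same.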

The same idea could be used to allow a client to switch between complex filters, without having to specify what those filters are.

Without the nested query parser, a separate parameter could supply only the query value (via the local param v=$param), not the query type as well.

The Future

An XML Query Parser is on the way via SOLR-839 that will allow expressing arbitrarily complex Lucene queries in XML. As the number of query parsers grows, being able to mix, match, and nest them will become increasingly important. One of the first extensions to the XML query parser should be to hook in nested queries, of course!

The subclasses of QParserPlugin show all of the query parsers currently available to Solr.  If you can’t find the query parser you’re looking for, you can create your own and register it via solrconfig.xml!

 

 


30 Comments

Andrew Kutz

What I would really like to see is the ability to take the results of the first query and use one or more of its fields as an argument in the second query. For example:

type:foo AND (_query_:type:bar AND id:{field1})

This should search for all types of foo and then iterate over the result set and perform a query for where type is bar and id is equal to the value of field1 from each item of the first result set.

I can do this in a linear fashion on the client side, but I am hoping to be able to make it more efficient by it being implemented on the server side.

Is there any hope for this?

Edoardo Marcora

I too need the functionality described by Andrew in ‘One Response to “Nested Queries in Solr”’. I was actually fooled by the title of this article in thinking that Solr provides functionality similar to nested queries/subqueries in SQL.

Ravi Gidwani

Hi Andrew, Edoardo:
Doesn’t your nested query requirement translate into

type:foo AND (type:bar AND id:{field1})

as a single query?

BTW, won’t the above query always return an empty set, since you first filter by type:foo (so only documents with type=foo are returned) and then expect to filter for type:bar?

cena

T-SQL: where lat != 0 and lng != 0
What’s the Solr query?
I think it is: lat: (-0) AND lng: (-0), but that gives an error!

Naomi Dushay

Note: quotes are important!!!!

text:hi AND _query_:"{!dismax qf=title pf=title}how now brown cow"

works

text:hi AND _query_:{!dismax qf=title pf=title}how now brown cow

does NOT work

Naomi Dushay

Yonik – what do you think about putting this info into a Solr wiki page? If you’d like, I can take a first stab at it, referencing this post.

Audrey Foo

Ravi

I believe Andrew and Edoardo want addition; show all results of type foo, as well as results of type bar where id:field1

Although, couldn’t that be done with an OR instead?
type:foo OR (type:bar AND id:{field1})

For me, I would like to make a query such as:
show me all with keyword abc, except if type is contact and uid is 123

Piet Seiden

Nested queries look like something useful to us as we would like to combine boolean queries with the ability to boost score on individual fields. However, it seems like a nested dismax query will only accept one qf parameter, invalidating this approach. Could this be a bug or have I messed up the syntax in this query?
http://…/select?indent=on&version=2.2&q=bridge%20AND%20_query_:%22{!dismax%20qf=cql.anyIndexes%20qf=dc.title^4%20v=$qq}%22&qq=broer&fq=&start=0&rows=3&fl=dc.title%2Cdc.creator%2Cscore&wt=standard&debugQuery=on&explainOther=&hl.fl=

rob casson

Piet,

looks like you need to enclose the multiple qf parameters in single quotes… something like this (pulled from the wiki):

q={!type=dismax qf='myfield yourfield'}solr rocks

hope that helps.

adam everett

is there any way the params inside the subquery can default from the dismax handler I have setup?

I try,
/select/?q=potter and _query_:"{!type=dismax}rowling"

but the dismax query does not pick up the defaults for the dismax handler I have configured. I end up with no results.

so I have to end up coding,

/?q=potter and _query_:"{!dismax qf='Title^3 Author^1' bf='ord(BoostFactor)^5'}rowling"

What am I missing?

yonik

> but the dismax query does not pick up the defaults for the dismax handler I have configured.

You need to configure defaults on the request handler you are using, not on the dismax handler (which is just a normal request handler that defaults to using the dismax query parser for back compatibility).

ohad serfaty

Hey Yonik
This is truly a great feature. I’ve noticed, though, that the results of the intermediate sub filter queries are not being cached. So, for example, if I run these two queries:

qf=text:hi OR _query_:"{!dismax qf=title pf=title}how now brown cow"

qf=text:hello OR _query_:"{!dismax qf=title pf=title}how now brown cow"

The result of the second dismax query is not inserted into the filter cache and needs to be recalculated again.

Is there any way to control that and have Solr cache the nested queries’ results?

ys

I too need this – or in other words the intersection of fieldids that match
both type foo and type bar. So if my indices had these docs:

doc1 – fielda:foo
doc2 – fielda:yyy
doc3 – fieldb:bar
doc4 – fieldb:xxx
doc5 – fielda:bar

I want to return doc1 and doc5 because field a has both values that I am looking for – foo and bar.

I do not want to multivalue my fields.

What I would really like to see is the ability to take the results of the first query and use one or more of its fields as an argument in the second query. For example:

type:foo AND (_query_:type:bar AND id:{field1})

This should search for all types of foo and then iterate over the result set and perform a query for where type is bar and id is equal to the value of field1 from each item of the first result set.

I can do this in a linear fashion on the client side, but I am hoping to be able to make it more efficient by it being implemented on the server side.

Is there any hope for this?

Aaron

I am attempting to use nested queries with frange. If I set fq=zip:38400 OR _query_:"zip:38200" it works perfectly.

However, as soon as I add a frange, the query errors out. Currently I am trying:

fq={!frange l=0 u=1}dist(2, 44.1457, -73.8152, latitude, longitude) OR _query_:"{!frange l=0 u=1 units=mi}dist(2, 32.6126, -86.3950, latitude, longitude)"

Each of the frange queries work perfectly by themselves, however, I am attempting to find a method to filter based on multiple locations. Short of writing my own plugin (which I have no idea how to do) I’m coming to the end of my rope on this, and hoping that someone could shed a little light how I might get the frange queries working with nested queries.

danw

Hello,

I am taking advantage of the nested query syntax and find it very useful for the type of query I need. I am having issues with this syntax and the localsolr plugin / request handler. They don’t appear to play well together.

This works and brings back 14 results

http://website/select/?q=FullText:sales&qt=geo&lat=33.1580933&long=-117.3505939&radius=75

This query looks like it should return the same results but none get returned.

http://website/select/?q=_query_:"{!dismax+mm='100%25'+qf='FullText'+v=$ft}"&ft=sales&qt=geo&lat=33.1580933&long=-117.3505939&radius=75

I am using solr 1.4

Has anyone used the _query_ syntax and localsolr at the same time?

The only post I found that looks like a similar issue on the internet was this and it went unanswered.

http://www.mail-archive.com/solr-user@lucene.apache.org/msg30894.html

thanks in advance

dan

Scott

I’m trying to do a SQL-like JOIN in SOLR. One way to get more or less the same results as a join in SQL is to use a correlated sub-query where the sub-query matches a field value from the main query results dynamically instead of using a static value.

What I’m really looking for is comparing two sets of results with something like WHERE result1.id = result2.id AND result1.digest != result2.digest (the first part of this where is just a JOIN moved into the WHERE clause).

I believe this is what Andrew and others are asking–not to do a static string match, but to match result values from subqueries against each other.

My specific scenario is that we have historical archives of documents. Say there are a bunch of items that are saved and indexed on Jan 1, 2010 and another snapshot on Jan 1, 2011. The documents are stored in SOLR with unique ids made of a concatenated uri and digest. Any documents with the same uri and digest (in other words an unchanged item) share the same record and have multiple values for the snapshot date.

Given two snapshot dates, we want to find the following four things. 1) unchanged documents, 2) changed documents, 3) documents added to the second snapshot that don’t exist in the first and, 4) documents removed from the second snapshot that used to exist in the first. It would be nice to stay in solr to get these items instead of having to dump and process outside of solr in code or a database.

I can easily get identical results between the two since they share records and have both snapshot ids listed in one record. (job is snapshot id for what happened on a certain date).

This query gives everything that stayed the same for both snapshots since items with identical contents and uris share records that have multiple values for the jobs:

job:(00019036 AND 00019448)

By using faceting and counts, I can get lists of items that exist in both snapshots but have different contents. This seems to do that and the results end up in the facet area (kind of ugly, but works):

job:(00019036 OR 00019448) facet=true facet.field=uri facet.limit=-1 facet.mincount=2

But getting lists of things that existed in first snapshot but not the second (or opposite) seems pretty hard to get. Database JOINs can do this, but I’m not sure if this is possible to get from a subquery with SOLR. Any insights?

Solr by dasnom - Pearltrees

[…] Lucid Imagination » Nested Queries in Solr To embed a query of another type in a Lucene/Solr query string, simply use the magic field name _query_ . The following example embeds a lucene query type:poems into another lucene query: text:"roses are red" AND _query_:"type:poems" Nested Queries in Lucene Syntax Now of course this isn’t too useful on it’s own, but it becomes very powerful in conjunction with the query parser framework and local params which allows us to change the types of queries. […]

Hatalhammo

One point that was completely left out is that Solr is not a relational database, therefore the idea of a JOIN wouldn’t even exist.
To achieve similar results requires denormalization of the data as well as storing the data.

One way to do this would be to declare a multivalued stored and indexed field. Then you can use its values in a search query just like a SQL JOIN.

Blanca

Hi!
I am having the same problem as Aaron.

description:(bob dylan) AND _query_:"{!lucene df=description}description:(Black Crow Blues)"

Gives me 6 results, but

"{!optoken df=assetDescription_anno}assetDescription_anno:(Black Crow Blues)" AND _query_:"{!lucene df=assetDescription_anno}assetDescription_anno:(Black Crow Blues)"

Without the " it gives an error: undefined field _query_

How can I combine two (or more) query types?
Thanks

Christian

Hi!

I have tried to find out how to query an array structure like {"Email.."}

I’m working with a system that posts data like the above to a field in Solr.

So, any help is much appreciated!

BR/Christian

ENKI-2

Cena, what you want is probably the query -lat:0 -lng:0

I’m confused as to why people expect a search engine to act like an RDBMS. Meh.

panpic

Good tutorial.
We can also add more search conditions after the nested query, such as:
_query_:"{!dismax qf=title pf=title}how now brown cow" AND text:hi
