Recently, someone on the #solr IRC channel asked me about using multiple filter queries in a request while specifying a default for one of those queries in solrconfig.xml. In this post, I’d like to talk about a simple pattern I’ve used in the past that solved the user’s specific goal, as well as a new Solr plugin I’ve written to try to solve a broader range of similar use cases.

Some Background

While most Solr users are aware that Solr’s SearchHandler lets you specify default request params in solrconfig.xml, many novice users don’t realize that SearchHandler actually supports 3 sections of “init” params: “defaults”, “appends”, and “invariants”.

A “defaults” init param will be ignored if the same param name is used in a Solr request, but “appends” params will be used in addition to any params with the same name in the request. “invariants” params take things to the extreme — request params with the same name are completely ignored.

For example, using the request handler configuration below, a request for /select?facet.field=author&fq=cat:books&rows=10000 would only return 10 documents per page (not 10000), would only facet on the author field (not category), and would only return documents that are in the books category AND in stock:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="facet">true</bool>
      <str name="facet.field">category</str>
      <str name="q">*:*</str>
    </lst>
    <lst name="appends">
      <str name="fq">inStock:true</str>
    </lst>
    <lst name="invariants">
      <int name="rows">10</int>
    </lst>
  </requestHandler>
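
To make the three behaviors concrete, here is a minimal sketch that models request params as a map from param name to a list of values and merges them as described above: defaults only fill in missing names, appends add values, and invariants replace whatever was there. The class and method names (ParamMergeSketch, merge) are hypothetical, and this is a simplified model of the behavior rather than Solr’s actual implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical, simplified model of how "defaults", "appends" and
 * "invariants" combine with request params. Not Solr's own code.
 */
public class ParamMergeSketch {

  public static Map<String, List<String>> merge(Map<String, List<String>> request,
                                                Map<String, List<String>> defaults,
                                                Map<String, List<String>> appends,
                                                Map<String, List<String>> invariants) {
    // Copy the request so the caller's maps are never mutated.
    Map<String, List<String>> result = new LinkedHashMap<>();
    request.forEach((k, v) -> result.put(k, new ArrayList<>(v)));

    // defaults: only used when the request did not supply that param name.
    defaults.forEach((k, v) -> result.putIfAbsent(k, new ArrayList<>(v)));

    // appends: added alongside any values already present.
    appends.forEach((k, v) -> result.computeIfAbsent(k, x -> new ArrayList<>()).addAll(v));

    // invariants: replace whatever the request (or defaults) supplied.
    invariants.forEach((k, v) -> result.put(k, new ArrayList<>(v)));
    return result;
  }
}
```

Fed the example request above (facet.field=author, fq=cat:books, rows=10000) together with that configuration, this model yields facet.field=[author], fq=[cat:books, inStock:true] and rows=[10], matching the behavior described in the text.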

The Question

The use case brought up on IRC can be summarized as follows:

  • By default, a special query should be used to constrain the set of documents available for all queries
  • Clients need to specify multiple “fq” params to drill down into the results of their queries
  • Clients need to be able to override the constrained subset of documents available on a per-request basis.

Or, to express it more concretely using the example Solr schema:

  • By default, only documents matching inStock:true should be returned from any query
  • Clients will send ‘fq’ params to filter on fields like ‘price’ and ‘cat’
  • Clients need to have the ability to override the inStock:true default behavior to search all docs (*:*) or documents that are not in stock (inStock:false)

The crux of the problem is that we need a way to specify an “fq” in our configuration that can be explicitly overridden at request time, but won’t automatically be overridden by an fq=price:[* TO 100] or fq=cat:electronics.

We can’t just specify <str name="fq">inStock:true</str> in our request handler “defaults”, or it will be ignored when another “fq” param is specified by the client. Likewise we can’t include it in our “appends” (or “invariants”) because then the client won’t ever be able to override it.

The Solution

The solution to this sort of problem is surprisingly simple, but not immediately obvious.

By taking advantage of variable dereferencing in Local Params, we can specify an “appends” fq filter that delegates to a custom parameter name of our choosing. We can then specify that custom param name in our “defaults”, and still allow clients to override it as needed.

In the example configuration below, I’ve defined a custom param named “base_set” that is used as a variable in an appended fq. Requests like /select?q=video and /select?q=ipod&fq=cat:music will be automatically constrained by the default “base_set” of inStock:true, but a request like /select?q=ipod&fq=cat:connector&base_set=*:* will override the default “base_set” with the custom value specified by the client.

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="base_set">inStock:true</str>
    </lst>
    <lst name="appends">
      <str name="fq">{!v=$base_set}</str>
    </lst>
  </requestHandler>
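
The dereferencing trick can be modeled in a few lines. The sketch below assumes a single-valued param map and only understands the exact {!v=$name} form used in this configuration; BaseSetSketch and effectiveFilter are hypothetical names, and Solr’s real local params parsing handles far more syntax than this.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical, heavily simplified model of local param dereferencing.
 * Only the single form {!v=$name} is understood.
 */
public class BaseSetSketch {

  // Matches {!v=$someParam} and captures the referenced param name.
  private static final Pattern V_REF = Pattern.compile("\\{!v=\\$(\\w+)\\}");

  /** Applies the "defaults" entry for base_set, then dereferences the appended fq. */
  public static String effectiveFilter(Map<String, String> requestParams) {
    Map<String, String> params = new HashMap<>(requestParams);
    params.putIfAbsent("base_set", "inStock:true"); // the "defaults" entry
    String appendedFq = "{!v=$base_set}";          // the "appends" entry

    Matcher m = V_REF.matcher(appendedFq);
    if (!m.matches()) {
      throw new IllegalArgumentException("unsupported local params: " + appendedFq);
    }
    // Look up the referenced param; a client value wins over the default.
    String value = params.get(m.group(1));
    if (value == null) {
      throw new IllegalArgumentException("missing param: " + m.group(1));
    }
    return value;
  }
}
```

In this model, a request with only q=video gets the filter inStock:true, while a request that also sends base_set=*:* gets *:*. Client fq params such as cat:music are separate values, so they sit alongside the appended filter instead of replacing it.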

Idea For A Better Solution

While the solution above works, it requires you to trust your clients to specify an arbitrary query for the “base_set” param. It occurred to me while explaining this solution on IRC that it would be really nice if there were an easy way to create custom parameters like this that were constrained to a fixed set of possible values, which could also be referred to using custom names.

So with this in mind, I started imagining how a switch QParser might be useful.

The basic idea is that the switch parser should support any number of arbitrarily named params specifying “switch cases”. Each case can identify a different Query that the switch parser will return depending on the (trimmed) value of the query string passed to the parser. An optional “default value” param can be used to specify a Query in the event that the query string passed to the parser doesn’t match any of the configured cases. In the examples below, the result of each query would be XXX:

q = {!switch s.foo=XXX s.yak=qqq}foo
q = {!switch s.bar=XXX s.yak=zzz} bar     // extra whitespace
q = {!switch defSwitch=XXX}asdf           // fallback on default
q = {!switch s=XXX s.yak=qqq}             // blank input

(The use of the “s.” prefix would not only help ensure that we don’t accidentally have conflicts between switch values and other special param names like defSwitch, but it would also allow us to have a switch case of s for a completely empty query string.)
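
The lookup rules just described can be sketched as a small pure function, with a plain Map standing in for the local params. SwitchSketch and resolve are hypothetical names, and the “s.” prefix and defSwitch name follow this proposal rather than any released API.

```java
import java.util.Map;

/**
 * Hypothetical model of the proposed switch lookup. A plain Map stands in
 * for the local params; "s." and "defSwitch" follow the proposal text.
 */
public class SwitchSketch {

  public static String resolve(Map<String, String> localParams, String input) {
    // Optional fallback used when no case matches.
    String fallback = localParams.get("defSwitch");
    // Null or all-whitespace input selects the bare "s" case;
    // anything else selects "s." plus the trimmed input.
    String key = (input == null || input.trim().isEmpty())
        ? "s"
        : "s." + input.trim();
    String chosen = localParams.getOrDefault(key, fallback);
    if (chosen == null) {
      throw new IllegalArgumentException("didn't match a switch case: " + input);
    }
    return chosen;
  }
}
```

For instance, resolve(Map.of("defSwitch", "XXX"), "asdf") returns "XXX", while an input with no matching case and no defSwitch throws an exception instead of silently matching everything.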

Assuming we had a switch parser like this, we could then declare some defaults & appends params on our request handler that would still allow our clients to select the “base_set” used in their queries, but only from a pre-configured list of options:

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="base_set">in_stock</str>
    <lst name="appends">
      <str name="fq">{!switch s.all='*:*'

With a configuration like this, queries would be constrained by default to documents matching inStock:true, but clients could specify one of 3 legal values for the base_set param to override the set of documents being searched, without needing to know how those base_set values are implemented. Any client that attempts to specify a non-legal value for base_set should get an error (but we could always add the defSwitch local param if we want an automatic default for invalid input).

Implementing This Idea

Implementing a SwitchQParserPlugin is actually fairly straightforward. My usual advice for understanding how to write a new QParserPlugin is to start with the source for TermQParserPlugin, since it is the simplest existing implementation. But since we already know we want our QParser to delegate to a sub-parser, the BoostQParserPlugin serves as a better example.

The meat of our plugin is to return a QParser whose parse() method does three key things:

  • Check the v local param for the query string and trim if it exists
  • Use the query string to lookup the query value from the switch params, falling back to the default param if set
  • Create a sub parser for the resulting query value, or throw an error if there isn’t one
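
public Query parse() throws SyntaxError {
  // The query string passed to the parser (the "v" local param); may be null
  String val = localParams.get(QueryParsing.V);
  // Fallback sub-query, used if no switch case matches
  String subQ = localParams.get(SWITCH_DEFAULT);
  // Blank input selects the bare case param; otherwise look up the case
  // named by the trimmed input, falling back to the default
  subQ = StringUtils.isBlank(val)
    ? localParams.get(SWITCH_CASE, subQ)
    : localParams.get(SWITCH_CASE + "." + val.trim(), subQ);
  if (null == subQ) {
    throw new SyntaxError("Error: didn't match a switch case");
  }
  // Parse the selected sub-query with a nested parser and return its Query
  subParser = subQuery(subQ, null);
  return subParser.getQuery();
}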

All of which is hopefully fairly straightforward.

Besides some basic Java boilerplate, the only other code we really need to worry about is overriding a few default methods from the QParser base class so that they delegate to the corresponding methods of the subParser selected above based on the switch case used. That way our SwitchQParserPlugin will behave exactly the same as the parser it wraps.
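
That delegation is the ordinary wrapper pattern. The sketch below uses a hypothetical SimpleParser interface as a stand-in for QParser, modeling only a query accessor plus highlighting and debug hooks; none of these types are Solr classes, and exactly which QParser methods need overriding should be checked against the Solr source.

```java
import java.util.List;

/**
 * Hypothetical sketch of the delegation pattern. SimpleParser is a
 * stand-in for QParser, not a Solr type.
 */
public class DelegationSketch {

  /** Minimal stand-in for the parser API: a query plus two optional hooks. */
  public interface SimpleParser {
    String getQuery();
    String[] getDefaultHighlightFields();
    void addDebugInfo(List<String> debugInfo);
  }

  /** An example sub-parser the switch might select: queries a single field. */
  public static class FieldParser implements SimpleParser {
    private final String field;
    private final String text;

    public FieldParser(String field, String text) {
      this.field = field;
      this.text = text;
    }

    @Override public String getQuery() { return field + ":" + text; }
    @Override public String[] getDefaultHighlightFields() { return new String[] { field }; }
    @Override public void addDebugInfo(List<String> debugInfo) {
      debugInfo.add("FieldParser on " + field);
    }
  }

  /** Wrapper that forwards every hook to the sub-parser it selected. */
  public static class SwitchingParser implements SimpleParser {
    private final SimpleParser subParser;

    public SwitchingParser(SimpleParser selected) { this.subParser = selected; }

    @Override public String getQuery() { return subParser.getQuery(); }
    @Override public String[] getDefaultHighlightFields() {
      return subParser.getDefaultHighlightFields();
    }
    @Override public void addDebugInfo(List<String> debugInfo) {
      subParser.addDebugInfo(debugInfo);
    }
  }
}
```

Because every hook forwards to subParser, anything that asks the wrapper for highlight fields or debug info sees exactly what the selected parser would have produced on its own.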

The end result, with some tests and documentation, can be found in SOLR-4481. After some review and discussion from the community, I’m hopeful it will be included in Solr 4.2.

UPDATE: SwitchQParserPlugin is now part of Solr 4.2, but please note that the param names changed slightly in the final version from what was described in this post: the “s.” case prefix became “case.” and “defSwitch” became “default”.