New User’s Guide to Fusion Effectiveness

If you’re new to Fusion, you may be looking for guidance about where to get started with its many powerful features and capabilities. You may even be new to search in general and unfamiliar with search terms. This session will provide clarity and help you focus to deliver quick wins.

Intended Audience

Anyone interested in a fast track to getting started with Fusion. No prerequisite knowledge is needed.

Attendee Takeaway

Learn how to effectively use Fusion, no matter where you are in your search journey.

Speakers

David Im, Solutions Engineer, Lucidworks
Eric Smith, Solutions Engineer, Lucidworks

[Narrator]

Welcome to The New User’s Guide to Fusion Effectiveness. Fusion is an excellent search platform with lots of wonderful capabilities. And understanding all the ins and outs can be very daunting for anyone just beginning their journey in the world of Fusion. Luckily, David and Eric here have put together the 2021 Nobel Peace Prize-winning presentation to help you get through the world of Fusion. Without further ado, here are David and Eric.

[Eric Smith]

Thank you for the wonderful introduction, omniscient narrator. As the narrator mentioned, I’m Eric.

[David Im]

And I’m David.

[Eric Smith]

We both started working here at Lucidworks back in January and have been compiling a useful list of things to know in order to work with Fusion effectively. We like to refer to this as Fusion Advanced Readiness Training, otherwise known as FART.

[David Im]

Here’s what you’ll learn from FART. First, what is Fusion in general as a search solution, how to navigate the Fusion UI, then an overview of the Fusion data workflow from index to query, additional key components that exist in Fusion, and tips and tricks we learned to just jumpstart our Fusion journey. Let’s begin.

What is Fusion? Fusion is a platform for building search and discovery applications, built by Lucidworks. It is cloud hosted and built on the powerful Apache Solr, which stores the index and drives all the queries. Basically, Solr is the brains of the operation. The basic process is this: data feeds into index pipelines, which feed into Solr collections; the Solr collections store the data; and on the other side, query pipelines process search queries and then send them to Solr to retrieve the response. Let’s get started.

[Eric Smith]

When you first open up Fusion, you’ll be met with this quick start screen. But since this is the advanced readiness training, we will not be going through this. We’re going to exit out and show you the real thing.

[David Im]

General Fusion UI overview. This is Fusion in all its glory. First let’s look at the top left, how to change the app. At the top left is a hexagon icon. Move to it, and it shows all the currently available apps, which you can switch to. As you move down the column, you can see various functions. Let’s go to the one called collections. These are all the collections in this particular app. At the top left, you can see the current collection we are on, which is activate training demo.

Navigating the UI. As you open components, they open as windows. You can scroll to move between open components.

Stretching windows. Additionally, all windows can be stretched by clicking and dragging the edge of the component. Finally, at the bottom right is a button that says close all, which closes all currently open windows.

Fusion process overview. Indexing. Data sources are how we get data into Fusion. There is one for each source. In the UI, navigate from the left sidebar. Then in data sources, click add, and you can see here that the data source has a data source ID, an index pipeline, which is how we tweak the incoming data, and a parser, which is how the data gets interpreted. When you’re ready, click save. Then click run to run a data source, which sends docs through an index pipeline to a collection. You can also schedule these data sources to run at certain intervals.

Next, if you go to the index pipelines, you can see that each index pipeline is made up of stages, one for each function. You can disable and enable these pipeline stages by clicking the circle.
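
The idea of a pipeline as an ordered list of stages that can each be switched on or off can be sketched in a few lines of JavaScript. This is only a conceptual model, not how Fusion implements pipelines; the stage ids, the `enabled` flag, and the `runPipeline` helper are hypothetical names chosen for illustration.

```javascript
// Conceptual model only: this is NOT Fusion's internal implementation.
// Each stage has an id, an enabled flag (the circle in the UI), and a
// function that takes a document and returns the (possibly changed) document.
function runPipeline(stages, doc) {
  return stages.reduce(function (current, stage) {
    // A disabled stage is skipped, so the document passes through unchanged.
    return stage.enabled ? stage.apply(current) : current;
  }, doc);
}

// Hypothetical stages, named for illustration.
var stages = [
  {
    id: "trim-title",
    enabled: true,
    apply: function (doc) {
      return Object.assign({}, doc, { title: doc.title.trim() });
    }
  },
  {
    id: "uppercase-title",
    enabled: false, // disabled: this stage will not run
    apply: function (doc) {
      return Object.assign({}, doc, { title: doc.title.toUpperCase() });
    }
  }
];

var indexed = runPipeline(stages, { id: "1", title: "  Hello Fusion  " });
console.log(indexed.title); // "Hello Fusion"
```

Flipping `enabled` on the second stage to `true` would change the result to "HELLO FUSION", which is the same effect as clicking the circle next to a stage.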

Collections. From the top left corner, go to the collection manager. From the top left corner, you can also see which collection you’re currently on. From the collection manager, you can click new to create a new collection, and there is an advanced option for more configuration.

[Eric Smith]

Now we’ll move on to the querying side of things. Similar to indexing, if we go over on the left and open up the querying tab, we can select query pipelines. These query pipelines are very similar to index pipelines in that you can disable and enable each of the stages and interact with all the pieces within those stages individually. When a pipeline is ready to start being used, we can create a query profile.

Query profiles are created by selecting a pipeline and a collection that you want to query and return results from. At the bottom of the screen, you’ll be able to see an endpoint that’s returned from the profile when it’s created. You can send your queries to this endpoint and get the results back.
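
As a rough sketch of what sending a query to that endpoint involves, the snippet below builds a request URL from an endpoint and a set of query parameters. The endpoint value is a placeholder to be replaced with the one shown on the query profile screen; `buildQueryUrl` is a hypothetical helper, and `q` and `rows` are the usual Solr-style parameter names, assumed here to pass through the pipeline.

```javascript
// Builds a URL by appending URL-encoded query parameters to an endpoint.
function buildQueryUrl(endpoint, params) {
  var pairs = Object.keys(params).map(function (key) {
    return encodeURIComponent(key) + "=" + encodeURIComponent(params[key]);
  });
  return pairs.length ? endpoint + "?" + pairs.join("&") : endpoint;
}

// Placeholder: copy the real endpoint from the bottom of the query profile screen.
var endpoint = "https://fusion.example.com/PATH-COPIED-FROM-QUERY-PROFILE";

var url = buildQueryUrl(endpoint, { q: "hello world", rows: 10 });
console.log(url);
// https://fusion.example.com/PATH-COPIED-FROM-QUERY-PROFILE?q=hello%20world&rows=10
```

Any HTTP client can then issue a GET request to the resulting URL, with whatever authentication your Fusion instance requires.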

A useful tool when you’re creating query pipelines is the query work bench. The query work bench gives us a nice view of not only the pipeline that we’re actively looking at, but also the results from the query that we’re sending it. With the work bench open, you can add new stages and edit existing ones. For example, we’ll select this text tagger stage and make any edits we want to it. Hit the apply button that you see, and our results will immediately update to reflect the changes.

Additionally, in the top right corner of our query work bench, we have the new, load, and save buttons. The new button will allow us to create a new query pipeline. The load button will allow us to load other query pipelines that we might want to start working with. And when we hit the save button, we’ll be met with this dialog. This screen allows us to either overwrite an existing pipeline with the pipeline that we currently have open in our work bench, or create a new pipeline. The create new pipeline option can also be used to duplicate a pipeline, if you need very similar pipelines with lots of the same stages and configurations.

Another useful feature of the query work bench is the ability to add facet fields. Clicking on this add facet field button will allow us to select a field that we want to facet on. So if we type in the date field here and select that, that facet column will be populated with values and counts of documents with those corresponding values for that field. Selecting one of those values will limit our results to only documents with that value in that field.
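
The facet behavior described above, values with counts plus filtering on a selected value, can be illustrated with a small in-memory sketch. Solr computes facets from its index rather than by looping over documents like this; the `facetCounts` and `filterByFacet` helpers and the sample documents are made up for illustration.

```javascript
// Counts how many documents carry each value of the given field.
function facetCounts(docs, field) {
  var counts = {};
  docs.forEach(function (doc) {
    var value = doc[field];
    if (value !== undefined) {
      counts[value] = (counts[value] || 0) + 1;
    }
  });
  return counts;
}

// Keeps only the documents whose field equals the selected facet value.
function filterByFacet(docs, field, value) {
  return docs.filter(function (doc) {
    return doc[field] === value;
  });
}

// Made-up sample documents.
var docs = [
  { id: "1", year: "2020" },
  { id: "2", year: "2021" },
  { id: "3", year: "2021" }
];

console.log(facetCounts(docs, "year"));                  // { '2020': 1, '2021': 2 }
console.log(filterByFacet(docs, "year", "2021").length); // 2
```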

Additionally, we have this display fields tab on the top. Opening this up, we can change which fields are used as the name and description of the results that we’re seeing on that query work bench screen. If we change the name to category name and the description to address, our results screen will look something like this. As you can see, the results screen looks a little different now because it has replaced that name and description with the two fields we selected.

On top of all of this, in the top right corner, under the load and save buttons, we have the parameters and URI buttons. Selecting parameters will open up this row that shows us all the query parameters that are being sent with our queries. This is also a useful place to manipulate and add new parameters and see the effects they would have on the results.

The URI button will show us the working and published URIs that are being used to actually send these queries and retrieve the results, which gives us a better feel for what’s actually going on behind the scenes.

Lastly, in the bottom right, there’s this view as drop down. Selecting this will allow us to change what the results actually look like and get a better idea of what’s going on. Selecting debug will give us this breakdown of not only the different portions of the queries and how long they take, but also the calculations explaining how each document has been scored.

The JSON view does exactly what you would expect it to and shows us the JSON representation of the response that we’re receiving. This can be nice to get an idea for what the response will look like when you’re interacting with a query pipeline from outside of the Fusion UI.
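
To give a feel for working with that JSON outside the UI, here is a minimal sketch that parses a response and pulls out the total hit count and document ids. It assumes the response follows the standard Solr layout (`response.numFound` and `response.docs`); the sample payload is made up, and your pipeline may return additional or different fields.

```javascript
// Made-up sample payload in the standard Solr response layout (an assumption).
var raw = JSON.stringify({
  response: {
    numFound: 2,
    start: 0,
    docs: [
      { id: "a", name: "First" },
      { id: "b", name: "Second" }
    ]
  }
});

// Extracts the total number of hits and the ids of the returned documents.
function summarizeResponse(jsonText) {
  var body = JSON.parse(jsonText);
  var result = body.response || { numFound: 0, docs: [] };
  return {
    total: result.numFound,
    ids: result.docs.map(function (doc) {
      return doc.id;
    })
  };
}

console.log(summarizeResponse(raw)); // { total: 2, ids: [ 'a', 'b' ] }
```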

[David Im]

Last little tidbits. Jobs can be accessed from the left sidebar. You can see the various jobs, each one unique. They can be used to regularly make REST calls, aggregate data, perform analysis, clean up collections, and much more.

Jobs can be scheduled, and most jobs have an advanced button for more configuration options. The key ones we wanted to point out are the custom Spark job, for custom Scala scripts, and the custom Python job, for custom Python scripts.

Next is blobs. From the system sidebar, you can access the blobs. This provides an accessible file system within Fusion and hosts a number of key files used throughout Fusion.

Lastly is signals, accessed from the collection manager. Simplified, signals are user activity data. They live in a sidecar collection that is part of a main data collection. The most common type is user queries. As you can see, you enter a query like hello world, and then it is stored in the signals collection as a signal, as a document.

Pro tips for pros.

[Eric Smith]

As you’re working through Fusion, you’ll notice these popups in the bottom right corner, showing us notifications about what’s going on, either errors or success messages. To get a better view of these notifications, or if you’ve missed one before it fades away, you can click the bell in the top right corner of your screen to view all of the notifications that you may have missed. You can also click the clear all button to clear all of these notifications.

On the left-hand side here, you’ll see we’ve clicked the all services operational button. This won’t always say that, and sometimes some services might be down or struggling. Clicking on this will give us a nice view of what all these services are inside of Fusion, and whether any of them are having problems.

Additionally, on the left-hand side, if we go into the systems drop down and select log viewer, we’ll be able to see all the logs within Fusion and get a lot more detail as to what these issues we’re encountering might actually be.

As you’re aware, Fusion is Kubernetes based and cloud hosted. Some basic knowledge of Kubernetes can really help you understand some of the issues you might come across while working with Fusion. There are two commands that we like. The first is kubectl get pods, which gives you a nice overview of the statuses of all those pods. You might see that one of your services or pods is down and know that it might be the cause of one of your problems.

Additionally, kubectl logs -f, followed by a pod name, will give you the logs from that pod, which you can use to diagnose issues that might be happening if a pod is down or struggling.
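
As a small illustration of reading `kubectl get pods` output, the sketch below scans the text for pods whose STATUS column is neither Running nor Completed, which are the ones you would then inspect with `kubectl logs -f`. The pod names in the sample are made up, and `findUnhealthyPods` is a hypothetical helper; it looks only at the STATUS column, so a Running pod that is not yet ready would not be flagged.

```javascript
// Made-up sample of `kubectl get pods` output (default columns).
var sampleOutput = [
  "NAME                 READY   STATUS             RESTARTS   AGE",
  "example-query-0      1/1     Running            0          2d",
  "example-indexing-0   0/1     CrashLoopBackOff   5          2d"
].join("\n");

// Returns the pods whose STATUS is neither Running nor Completed.
function findUnhealthyPods(getPodsOutput) {
  return getPodsOutput
    .split("\n")
    .slice(1) // skip the header row
    .map(function (line) {
      return line.trim().split(/\s+/);
    })
    .filter(function (cols) {
      return cols.length >= 3; // ignore blank or malformed lines
    })
    .filter(function (cols) {
      return cols[2] !== "Running" && cols[2] !== "Completed";
    })
    .map(function (cols) {
      return { name: cols[0], status: cols[2] };
    });
}

console.log(findUnhealthyPods(sampleOutput));
// [ { name: 'example-indexing-0', status: 'CrashLoopBackOff' } ]
```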

[David Im]

Next is the Objects API. Here, you can export entire apps, individual components, or types of components. The import Fusion objects function from the sidebar can take a zip or an objects.json file to import objects. Also, here’s a curl command to actually access that Objects API.

Next, Jupyter can be turned on for a Fusion instance, giving you a direct way to test your Fusion collection, or a development ground for new jobs. Jupyter can be enabled by setting the Fusion Jupyter enabled parameter to true.

[Eric Smith]

Now I’m going to touch on the JavaScript stages, which can be used within both index pipelines and query pipelines. These stages are nice if you need a stage that doesn’t currently exist within Fusion. We can add our own scripts and do almost whatever we want with them.

On the index side of things, we’ll be met with a function that gives us access to the doc object and the ctx object. The doc object references the current document flowing through the pipeline, and the ctx object is the context of the pipeline itself.
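
A minimal sketch of an index-side stage function is shown below. The `(doc, ctx)` signature follows the description above; the `getFirstFieldValue` and `addField` calls are assumed to match the document object's methods in your Fusion version, and `title_length_i` is a hypothetical field name. The stub document exists only so the function can be run outside Fusion.

```javascript
// The stage function: receives the current document and the pipeline context.
// Method names on doc are assumptions; check them against your Fusion version.
function indexStage(doc, ctx) {
  var title = doc.getFirstFieldValue("title");
  if (title !== null && title !== undefined) {
    // Hypothetical field: stores the length of the title.
    doc.addField("title_length_i", String(title).length);
  }
  return doc; // pass the document on to the next stage
}

// Stand-in for Fusion's document object, for local experimentation only.
function makeStubDoc(fields) {
  return {
    fields: fields,
    getFirstFieldValue: function (name) {
      var values = this.fields[name];
      return values && values.length ? values[0] : null;
    },
    addField: function (name, value) {
      (this.fields[name] = this.fields[name] || []).push(value);
    }
  };
}

var result = indexStage(makeStubDoc({ title: ["Hello"] }), {});
console.log(result.fields.title_length_i); // [ 5 ]
```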

On the query side of things, we will have a request object and a response object, as well as that context object. The biggest thing to note here, on the right, is a nice little formatting trick for working with these JavaScript stages: wrapping everything in an outer function that returns the main function allows us to write code that feels a little more comfortable.

So we can have additional functions defined such as foobar in this case, which you can then see is being used within the main function before returning the response.
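
The nesting trick might look roughly like the sketch below: an outer function defines helpers such as `foobar` and returns the main function that does the work. The `getFirstParam` and `putSingleParam` calls are assumptions about the request object's methods, and the stub request is a stand-in so the example runs outside Fusion. Inside an actual stage you would likely drop the `var queryStage =` assignment so the script evaluates to the returned function.

```javascript
var queryStage = (function () {
  "use strict";

  // Helper defined next to the main function; normalizes the query text.
  function foobar(text) {
    return String(text).trim().toLowerCase();
  }

  // Main function: the one the stage actually runs for each query.
  return function main(request, response, ctx) {
    var q = request.getFirstParam("q");
    if (q !== null && q !== undefined) {
      request.putSingleParam("q", foobar(q));
    }
    return response;
  };
})();

// Stand-in for Fusion's request object, for local experimentation only.
function makeStubRequest(params) {
  return {
    params: params,
    getFirstParam: function (name) {
      return Object.prototype.hasOwnProperty.call(this.params, name)
        ? this.params[name]
        : null;
    },
    putSingleParam: function (name, value) {
      this.params[name] = value;
    }
  };
}

var request = makeStubRequest({ q: "  Hello World " });
queryStage(request, null, {});
console.log(request.params.q); // "hello world"
```

Because the helpers live inside the outer function, they stay private to the stage and the main function remains short and readable.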

[David Im]

The Solr schema can be accessed through the Solr config in the system sidebar. Here you can find the managed schema and the Solr config. Be careful about overwriting the Solr config because it is very important, but it can be a good reference point.

The managed schema, on the other hand, is a good way to see how Solr is interpreting your data, and to change that if you need to for whatever reason.

[Eric Smith]

Hopefully due to Fusion Advanced Readiness Training, you have an idea of what the process of data flowing through Fusion is like and what it takes to deliver the best search solutions, along with some tips and tricks that can make you seem like a real Fusion pro. Thank you for listening.