A couple of weeks back, we talked about the dumpster fire that is terrible enterprise search, along with the path to something much better. Then, last week, we did a quick taxonomy dive into the four main types of enterprise search engines. This week we’re looking at the shopping list to get you going.

Starting the search for your next enterprise search solution? Or maybe you’re under the gun to replace the creaky, dusty, legacy system you already have in place? Either way, here’s a rundown of the top features and capabilities to consider as you research and shop for a powerful enterprise search experience.

Search Capabilities

Keywords and key phrases are among the most familiar components of the search experience. A query parser receives a query and translates it into parameters for a search algorithm, which is executed across a specialized database. Older search technologies required a specific syntax in which a user spelled out, in a specialized language, exactly what they wanted. Modern search parsers such as Solr’s DisMax parser work without this specificity. Advanced queries such as geospatial search still require specialized syntax, but most of the time that syntax is computer-generated by a search application built to speak to the search engine.
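To make the parser distinction concrete, here is a minimal sketch of how a search application might generate a DisMax request for Solr. The host, collection name, and field boosts are illustrative assumptions, not a fixed configuration.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host and collection for your deployment.
SOLR_SELECT = "http://localhost:8983/solr/products/select"

def build_dismax_query(user_input: str, fields: str = "title^2 description") -> str:
    """Translate free-text user input into Solr DisMax request parameters."""
    params = {
        "defType": "dismax",  # use the DisMax parser instead of the default lucene syntax
        "q": user_input,      # raw user text; no specialized query syntax required
        "qf": fields,         # fields to search, with per-field boosts
        "wt": "json",
    }
    return SOLR_SELECT + "?" + urlencode(params)

url = build_dismax_query("wireless noise cancelling headphones")
```

The user types plain text; the application, not the user, supplies the parser choice and field boosts.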

Faceting is a critical capability that allows the user to filter results efficiently based on a specific field. This is especially useful for limiting a search to a category or department. Faceting can also be used for range filtering, such as by date or price. Most search solutions also allow you to define synonyms, which lets implementers adapt user queries to their corpus of data. For example, implementers may define “lawyer” as a synonym for “barrister,” so that searches for “lawyer” also return documents that contain only the term “barrister.”
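As a concrete illustration, here is roughly what the facet parameters for a Solr-style request look like, covering both a field facet (category) and a range facet (price). The field names and bucket sizes are example assumptions:

```python
from urllib.parse import urlencode

# Illustrative facet parameters: counts per category plus price buckets.
params = [
    ("q", "laptop"),
    ("facet", "true"),
    ("facet.field", "category"),         # one count per distinct category value
    ("facet.range", "price"),            # bucketed range facet on price
    ("f.price.facet.range.start", "0"),
    ("f.price.facet.range.end", "2000"),
    ("f.price.facet.range.gap", "500"),  # buckets: 0-500, 500-1000, ...
    # When a user clicks the "Electronics" facet, the UI adds a filter query:
    ("fq", 'category:"Electronics"'),
]
query_string = urlencode(params)
```

The engine returns counts per bucket alongside the results, which the UI renders as clickable filters.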

Signal capture is the capture of user behavior data, or “signals.” In most search applications, signals include events such as user queries, clicks, adds, purchases and similar clickstream data. In advanced uses, however, signals may include user location, vector, altitude or any number of other event types. Signal capture is a collaboration between the front-end search application and the back-end search solution: the application captures the actual click and sends it to the search solution, which processes the event and stores it for later use. Newer features, such as recommendations, depend on signal capture.
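A click signal is typically just a small JSON event. The sketch below shows one plausible shape; the field names are illustrative, not any vendor’s exact schema:

```python
import json
import time
import uuid

# A minimal click-signal event. Field names here are illustrative,
# not any particular product's signal schema.
def make_click_signal(user_id: str, query: str, doc_id: str, position: int) -> dict:
    return {
        "id": str(uuid.uuid4()),
        "type": "click",
        "timestamp": int(time.time() * 1000),  # epoch millis
        "user_id": user_id,
        "query": query,        # the query that produced the result list
        "doc_id": doc_id,      # which document the user clicked
        "position": position,  # rank of the clicked result (useful for position bias)
    }

signal = make_click_signal("u42", "standing desk", "doc-981", 3)
payload = json.dumps(signal)  # the front end would POST this to the search solution
```

The back end aggregates millions of these events to power boosting and recommendations.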

Artificial intelligence and recommendations are newer techniques for surfacing relevant data for users. Not all data lends itself perfectly to keyword or key phrase search; the most relevant result for a query may not be the one that contains the matching keyword. One way of solving this problem is to look at what users clicked on most often for that query. However, that may not be enough: context matters, and so does the individual user. Instead of showing the results most users clicked on, it may be more relevant to show what similar users clicked on. There are numerous algorithms and techniques for using behavior data to help users find exactly what they need, even before they know they need it. This is the next frontier of enterprise search.
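The simplest form of the “what did users click on most often for this query” idea can be sketched in a few lines. This is a toy popularity boost with all users pooled together; real systems add per-user context, decay, and position-bias corrections:

```python
from collections import Counter, defaultdict

# Toy popularity boost: re-rank results for a query by how often users
# clicked each document for that query. All names are illustrative.
click_counts = defaultdict(Counter)

def record_click(query: str, doc_id: str) -> None:
    click_counts[query][doc_id] += 1

def rerank(query: str, results: list) -> list:
    counts = click_counts[query]
    # Stable sort: documents with more historical clicks move up;
    # ties keep their original keyword-relevance order.
    return sorted(results, key=lambda d: -counts[d])

record_click("java", "doc-coffee")
record_click("java", "doc-lang")
record_click("java", "doc-lang")
reranked = rerank("java", ["doc-island", "doc-coffee", "doc-lang"])
```

Moving from “what all users clicked” to “what similar users clicked” replaces the single counter with a per-segment or per-user model, which is where the machine learning comes in.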

Query pipelines allow implementers to change queries in stages and to change the data that is returned. The staged approach is a critical tool for handling complex data or complex queries, or for providing behavior profiling functionality. Some solutions offer prepackaged stages as well as extensibility using JavaScript or other programming languages.
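Conceptually, a query pipeline is just an ordered list of stages, each of which can rewrite the query before it hits the engine. A minimal sketch, with made-up stage names:

```python
# A query pipeline as an ordered list of stages. Each stage takes and
# returns a query dict; stage names and fields are illustrative.
def lowercase_stage(q: dict) -> dict:
    q["text"] = q["text"].lower()
    return q

def synonym_stage(q: dict) -> dict:
    synonyms = {"lawyer": "barrister"}  # tiny in-memory synonym map
    extra = [synonyms[t] for t in q["text"].split() if t in synonyms]
    if extra:
        q["text"] += " " + " ".join(extra)
    return q

def boost_recent_stage(q: dict) -> dict:
    q["boost"] = "recency"  # flag a downstream recency boost
    return q

PIPELINE = [lowercase_stage, synonym_stage, boost_recent_stage]

def run_pipeline(query_text: str) -> dict:
    q = {"text": query_text}
    for stage in PIPELINE:  # stages run in declared order
        q = stage(q)
    return q

result = run_pipeline("Lawyer Fees")
```

Reordering, adding, or removing stages changes behavior without touching the engine itself, which is what makes the staged design so flexible.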

UI Capabilities

A powerful search back-end is important, but users only see the front-end user interface (UI). Previously, some search solutions left the UI entirely to the implementer, but now some offerings provide UI functionality that lets implementers import or compose their UI instead of creating it from scratch. This makes a lot of sense given that most search UIs do similar things. Moreover, because AI and personalization functionality often interact with the UI, modern UIs are too complex to write and maintain by hand without a large staff of experts.

WYSIWYG embedding is the most advanced functionality in the marketplace. This allows implementers to configure a search UI in a web-based administration tool and then “include” it on their site using HTML or JavaScript statements.

Smart panels and widgets combine back-end functionality, like recommendations or similarity search (aka “More Like This”), with UI components. These allow implementers to include this functionality in their UI without having to write the underlying UI or back-end handling code. In cloud-based solutions, these come with preconfigured, back-end implementations.

Component libraries provide common UI functionality, such as typeahead. These exist in many forms, from tag libraries to JavaScript APIs.

Although technically a back-end component, REST connectivity is critical to any modern search UI. Most mobile and web UIs connect via JSON over a REST interface.

Typeahead and other forms of auto-suggest are now standard user expectations in search UI. As users type, the UI suggests what they’re likely to be interested in.
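The core of typeahead is a fast prefix lookup. A minimal sketch over a sorted vocabulary (real suggesters add popularity weighting and fuzzy matching):

```python
import bisect

# Minimal prefix-based suggester over a sorted, lowercased vocabulary.
class Suggester:
    def __init__(self, terms: list):
        self.terms = sorted(t.lower() for t in terms)

    def suggest(self, prefix: str, limit: int = 5) -> list:
        prefix = prefix.lower()
        # Binary search to the first term >= prefix, then scan forward.
        start = bisect.bisect_left(self.terms, prefix)
        out = []
        for term in self.terms[start:]:
            if not term.startswith(prefix) or len(out) == limit:
                break
            out.append(term)
        return out

s = Suggester(["speaker", "speaker stand", "spatula", "headphones"])
suggestions = s.suggest("spea")
```

Because the vocabulary is sorted, all matches for a prefix are contiguous, so each keystroke costs one binary search plus a short scan.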

Auto-classification is a more advanced form of typeahead. For example, when a user types “speaker,” the category “audio electronics” is automatically selected.

Data Import

Data source connectors are an important piece of most search solutions. While REST APIs allow users to import from nearly any data source, having to write a connector for every common data source (e.g., Oracle, SQL Server or SharePoint) is a taxing endeavor for implementers. The most important question isn’t whether the solution supports the most connectors but whether the solution supports your data sources and any you are likely to deploy.

Parsers work in conjunction with data source connectors to process the data that comes back and turn it into documents. For example, if you’re scanning a local disk and pulling back files, should each file be loaded as a document or each row in the file? Is the document an XML or a CSV or a ZIP file? Parsers interpret the data into documents so that they can be further processed or indexed.
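The “each file or each row?” decision looks like this in practice. A toy parser that treats every CSV row as its own document (field names are illustrative):

```python
import csv
import io

# Toy parser: turn a CSV payload into one document per row, mirroring
# the "is each row a document?" decision. Field names are illustrative.
def parse_csv_to_docs(raw: str, id_field: str = "id") -> list:
    reader = csv.DictReader(io.StringIO(raw))
    docs = []
    for row in reader:
        doc = dict(row)
        # Ensure every document has an id, falling back to its position.
        doc["_id"] = doc.get(id_field, str(len(docs)))
        docs.append(doc)
    return docs

raw = "id,title\n1,Annual Report\n2,Quarterly Update\n"
docs = parse_csv_to_docs(raw)
```

A different parser for the same bytes (one document per file, say) would produce a completely different index, which is why parser choice sits so early in the pipeline.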

Pipelines are used to connect data sources, parsers and stages of logic used to manipulate data into well-formed documents. In older systems these were known as ETL processes – extract, transform, and load.

JavaScript is the modern scripting language that serves as a kind of “language of trade” for most developers. Because of that familiarity, some search solutions allow manipulating data with JavaScript.

A REST API allows operational control of the search solution as well as importing and exporting data.

Native libraries allow a search solution to be used from a language like Java or Python without the implementer having to write REST API glue code.

Operational Capabilities

Scalability and capacity are important differentiators among search solutions. Can the system scale to the number of documents and users your system needs to support? How hard is it to add additional capacity and can that be done without significant downtime? Some solutions still use client-server architecture or rely on older computing technologies like shared file systems (NAS) instead of modern clustering topologies.

High availability is the capability of the system to suffer a hardware failure without data loss or downtime. This is a feature of a modern cluster topology. Modern search solutions should be resilient against network outages and hardware faults across multiple nodes.

Disaster recovery is the capability of the system to suffer the loss of a complete cluster or data center and failover to a backup site. This requires cross-data-center replication (also called WAN replication). This is important to deal with fiber cuts, weather, earthquakes or other unforeseen catastrophes without a major impact to business operations.

Modern search solutions provide system monitoring in the form of REST APIs that expose statistics about system performance and uptime, along with graphical displays and dashboards.

A/B testing shows admins whether changes to search pipelines or other functionalities improve search performance for users. When applying personalization, recommendations or other AI techniques, it is important to determine whether the changes actually improved click-through rates, purchases or any specified measure of success. A/B testing works by directing some traffic to the new pipeline and comparing it to the original configuration.
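The traffic split is usually deterministic, so the same user always sees the same variant. A minimal sketch, with an assumed 10% treatment share and made-up click numbers:

```python
import hashlib

# Deterministic traffic split: hash the user id so the same user always
# lands in the same variant. The 10% share is just an example.
def assign_variant(user_id: str, treatment_share: float = 0.10) -> str:
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "B" if bucket < treatment_share else "A"

def ctr(clicks: int, impressions: int) -> float:
    return clicks / impressions if impressions else 0.0

# Same user id always lands in the same bucket:
v1 = assign_variant("user-123")
v2 = assign_variant("user-123")

# Compare click-through rates between control (A) and the new pipeline (B);
# the numbers below are illustrative, and a real test also needs a
# significance check before declaring a winner.
improved = ctr(clicks=58, impressions=1000) > ctr(clicks=47, impressions=1000)
```

Hashing rather than random assignment keeps each user’s experience consistent across sessions, which avoids contaminating the measurement.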


Security Capabilities

Connectivity to major security technologies like Active Directory, LDAP, Kerberos and SAML or other single sign-on systems is critical to a search solution’s security capabilities.

Role-based authorization determines which users are allowed to read, create, modify or delete documents, as well as who can make system changes.

Document-level security methods, like security trimming, allow fine-grained control to ensure that users never see documents they don’t have access to in query results.
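Security trimming boils down to intersecting each document’s access control list with the querying user’s groups. A toy post-filter sketch (real engines apply this as a filter inside the index, and all field names here are illustrative):

```python
# Security trimming sketch. Each document carries an ACL of group names;
# a result is visible only if the user shares a group with it.
DOCS = [
    {"id": "d1", "title": "Public FAQ",   "acl": ["everyone"]},
    {"id": "d2", "title": "Payroll Data", "acl": ["hr"]},
    {"id": "d3", "title": "Eng Roadmap",  "acl": ["engineering", "leadership"]},
]

def search(query: str, user_groups: set) -> list:
    matches = [d for d in DOCS if query.lower() in d["title"].lower()]
    # Trim results the user's groups cannot read. In a production engine
    # this is an index-time filter, not a post-filter, so counts and
    # facets never leak the existence of restricted documents.
    return [d for d in matches if user_groups & set(d["acl"])]

visible = search("roadmap", {"engineering"})   # matches and is readable
trimmed = search("payroll", {"engineering"})   # matches but is trimmed
```

The key property is that a trimmed document behaves as if it does not exist at all for that user, including in facet counts and totals.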

Analytics Capabilities

Usage analytics allow implementers to inspect how users interact with search and may even allow inspecting the actions of an individual customer. This capability allows implementers to understand how well they’re achieving their conversion goals and see changes in these metrics over time.

SQL connectivity allows analysts to chart and analyze data with common SQL tools. Solutions may make indexed data as well as user behavior data available via SQL.

Advanced Functionality

Streaming allows the solution to return results differently than normal. Usually, search solutions return the most relevant items first, but computing relevance rankings across large result sets is memory and resource intensive. Streaming instead returns results in the order they are retrieved and can return results based on conditions.

Named entity recognition (NER) is the use of natural language processing (NLP) to recognize the names of companies, individuals or other proper nouns. This can be useful for various types of filtered or faceted search.

Clustering and classification are machine learning techniques that allow data or queries to be grouped or labeled automatically.

Head/tail analysis is a machine learning technique that identifies and rewrites underperforming queries to be more like similar well-performing queries.

Delivering Outcomes at Scale

In addition to checking a platform for all of these essential features, you need to make sure your vendor has a proven track record of helping organizations like yours deploy these types of solutions. Look for a vendor that can truly partner with you to configure and design an enterprise search solution that fits your unique mix of data sources, users and business problems – and that understands specifically how AI, machine learning and deep learning can fit in. Look for vendors that have broad experience across industries but can also articulate and understand the idiosyncrasies of your particular business.

Let’s Get Going

It’s time to replace hit-or-miss search with an all-in-one answer platform for data diggers, fact finders, and edge seekers everywhere. More than anything, it’s time to find out what’s possible when employees have all the insights they need, whenever they need them. Contact us today or use the form below:

About Andy Wibbels



Contact us today to learn how Lucidworks can help your team create powerful search and discovery applications for your customers and employees.