A couple of weeks back, we talked about the dumpster fire that is terrible enterprise search, along with the path to something much better. Then, last week, we took a quick taxonomy dive into the four main types of enterprise search engines. This week, we’re looking at the shopping list to get you going.

Starting the search for your next enterprise search solution? Or maybe you’re under the gun to replace the creaky, dusty, legacy system you already have? Either way, here’s a rundown of the top features and capabilities to consider as you research and shop for a powerful enterprise search experience.

Search Capabilities

Keywords and keyphrases are among the most familiar components of the enterprise search experience. A query parser receives a query and translates it into parameters for a search algorithm that runs across a specialized database. Older search technologies required users to spell out precisely what they wanted in a specialized query language. Modern query parsers, like Solr’s DisMax parser, work without this specificity. Advanced queries, such as geospatial search, still require specialized syntax, but that syntax is often generated by the enterprise search application built to talk to the search engine rather than typed by the user.
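To make the parser’s job concrete, here is a minimal sketch of the request a search application might build for Solr’s DisMax parser. The parameters `defType`, `q`, `qf`, and `mm` are real DisMax request parameters; the core URL, the field names `title` and `body`, and the 2x title boost are assumptions for illustration. The function only builds the URL and does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical Solr core URL; only the query-string building is shown here.
SOLR_SELECT = "http://localhost:8983/solr/docs/select"

def build_dismax_url(user_text: str) -> str:
    """Translate free-text user input into DisMax request parameters.

    The user supplies plain words; the application supplies the field
    weighting. The field names "title" and "body" are assumptions.
    """
    params = {
        "defType": "dismax",   # route the query through the DisMax parser
        "q": user_text,        # raw user text, no query syntax required
        "qf": "title^2 body",  # search both fields, weighting title matches 2x
        "mm": "1",             # at least one query term must match
        "wt": "json",          # request a JSON response
    }
    return f"{SOLR_SELECT}?{urlencode(params)}"

print(build_dismax_url("red running shoes"))
```

The user never sees `qf` or `mm`; the application fills them in, which is exactly the division of labor described above.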

Faceting is a critical capability that allows the user to filter results efficiently based on a specific field. This is especially useful for limiting an enterprise search to a category or department. Faceting can also be used for range filtering, such as by date or price.

Most enterprise search solutions allow you to define synonyms. This capability enables implementers to adapt user queries to their corpus of data. For example, implementers may define “lawyer” as a synonym for “barrister,” so that searches for “lawyer” also return documents that contain the term “barrister.”
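Here is a minimal, in-memory sketch of both ideas: synonym expansion at query time, facet counts over a result set, and a range filter. The documents, the `department` and `price` fields, and the synonym table are hypothetical; a real engine computes facets inside the index rather than in application code.

```python
from collections import Counter

# Hypothetical documents; in a real engine these live in the index.
DOCS = [
    {"id": 1, "title": "Hiring a barrister", "department": "Legal", "price": 120},
    {"id": 2, "title": "Lawyer fee schedule", "department": "Legal", "price": 80},
    {"id": 3, "title": "Quarterly budget", "department": "Finance", "price": 45},
]

# Implementer-defined synonym group: each term expands to the whole group.
SYNONYMS = {"lawyer": {"lawyer", "barrister"}, "barrister": {"lawyer", "barrister"}}

def expand(query: str) -> set[str]:
    """Expand every query term with its synonyms."""
    terms: set[str] = set()
    for term in query.lower().split():
        terms |= SYNONYMS.get(term, {term})
    return terms

def search(query: str, docs: list[dict] = DOCS) -> list[dict]:
    """Return docs whose title shares at least one expanded term with the query."""
    terms = expand(query)
    return [d for d in docs if terms & set(d["title"].lower().split())]

def facet_counts(results: list[dict], field: str) -> Counter:
    """Count how many results fall under each value of a field."""
    return Counter(d[field] for d in results)

def filter_facet(results: list[dict], field: str, value) -> list[dict]:
    """Keep results whose field equals the selected facet value."""
    return [d for d in results if d[field] == value]

def filter_range(results: list[dict], field: str, low, high) -> list[dict]:
    """Keep results whose field falls within [low, high], e.g. a price range."""
    return [d for d in results if low <= d[field] <= high]
```

With this data, `search("lawyer")` returns documents 1 and 2 even though document 1 only says “barrister,” and `facet_counts(DOCS, "department")` reports two Legal documents and one Finance document.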

Signal capture is the collection of user behavior data, or “signals.” In most search applications, signals include events such as user queries, clicks, adds, purchases, and other similar clickstream data. More advanced uses may include user location, vector, altitude, or any other event-type data. Signal capture is a collaboration between the front-end search application and the back-end search solution: the application captures the actual click and sends it to the search solution, which processes the event and stores it for later use. Newer features, such as recommendations, depend on signal capture.
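The back-end half of that collaboration can be sketched as a store that accepts events from the front end and aggregates clicks per query. The `Signal` fields and the `SignalStore` class are hypothetical names for illustration, not any product’s API.

```python
import time
from collections import Counter, defaultdict
from dataclasses import dataclass, field

@dataclass
class Signal:
    """One user-behavior event sent by the front end (fields are hypothetical)."""
    user_id: str
    query: str
    doc_id: str
    event_type: str = "click"  # e.g. click, add, purchase
    timestamp: float = field(default_factory=time.time)

class SignalStore:
    """Back-end side: accept events, keep them raw, and aggregate clicks."""

    def __init__(self):
        self.events = []  # raw events, kept for later analysis
        # normalized query -> Counter mapping doc_id to click count
        self.clicks_by_query = defaultdict(Counter)

    def record(self, signal: Signal) -> None:
        self.events.append(signal)
        if signal.event_type == "click":
            self.clicks_by_query[signal.query.strip().lower()][signal.doc_id] += 1

    def top_clicked(self, query: str, n: int = 3) -> list[str]:
        """Most-clicked documents for a query, most popular first."""
        counts = self.clicks_by_query.get(query.strip().lower(), Counter())
        return [doc for doc, _ in counts.most_common(n)]
```

Aggregates like `top_clicked` are the raw material that features such as recommendations build on.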

Artificial intelligence (AI) search and recommendations are newer techniques for finding data that is relevant to users. Not all data lends itself to keyword or keyphrase searches. For example, the most relevant result for a query may not be the one that contains the matching keyword. One way of solving this problem is to look at what users clicked on most often for that query. However, that may not be enough. Context matters, and so does the individual user. Instead of showing the results that most users clicked on, it may be more relevant to show what similar users clicked on. Numerous algorithms and techniques use behavior data to help users find exactly what they need, even before they know they need it. This is the next frontier of enterprise search.
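“Show what similar users clicked on” can be sketched with user-based collaborative filtering, one of many such techniques: measure how much two users’ click histories overlap (Jaccard similarity here), then recommend what the closest neighbors clicked. The data shape and parameter names are assumptions; production systems use far richer signals and models.

```python
def jaccard(a: set, b: set) -> float:
    """Overlap between two click histories, from 0.0 (none) to 1.0 (identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)

def recommend(user: str, history: dict[str, set[str]], k: int = 2, n: int = 3) -> list[str]:
    """Suggest docs the k most similar users clicked that `user` has not seen."""
    mine = history.get(user, set())
    neighbors = sorted(
        ((jaccard(mine, docs), other) for other, docs in history.items() if other != user),
        reverse=True,
    )[:k]
    scores: dict[str, float] = {}
    for similarity, other in neighbors:
        if similarity == 0:
            continue  # an unrelated user says nothing about this user's interests
        for doc in history[other] - mine:
            scores[doc] = scores.get(doc, 0.0) + similarity
    # Highest accumulated similarity first; ties broken alphabetically.
    return sorted(scores, key=lambda d: (-scores[d], d))[:n]
```

For example, if “ana” and “ben” clicked mostly the same documents and “cy” shares nothing with either, ana’s recommendations come only from ben’s history, regardless of how popular cy’s clicks are overall.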

Query pipelines allow implementers to modify queries in stages and to modify the data that is returned. This staged approach is a critical tool for handling complex data or queries and for adding behavior-profiling functionality. Some solutions offer prepackaged functionality as well as extensibility through JavaScript or other programming languages.
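A query pipeline boils down to an ordered list of functions: query stages rewrite the request before it reaches the engine, and response stages reshape what comes back. The stage names, the department boost rule, and the `search_fn` callback below are hypothetical; real products define their own stage types and configuration.

```python
from typing import Callable

QueryStage = Callable[[dict], dict]
ResponseStage = Callable[[list], list]

def normalize_text(query: dict) -> dict:
    """Query stage: trim and lowercase the user's text."""
    return {**query, "text": query["text"].strip().lower()}

def boost_by_department(query: dict) -> dict:
    """Query stage: hypothetical rule that boosts the user's own department."""
    dept = query.get("user_department")
    if dept:
        return {**query, "boosts": {**query.get("boosts", {}), f"department:{dept}": 2.0}}
    return query

def drop_internal_fields(results: list) -> list:
    """Response stage: hide fields prefixed with '_' from the caller."""
    return [{k: v for k, v in r.items() if not k.startswith("_")} for r in results]

def run_pipeline(query: dict, query_stages: list[QueryStage],
                 search_fn: Callable[[dict], list],
                 response_stages: list[ResponseStage]):
    """Apply query stages in order, search, then apply response stages in order."""
    for stage in query_stages:
        query = stage(query)
    results = search_fn(query)
    for stage in response_stages:
        results = stage(results)
    return query, results
```

Because each stage is independent, adding behavior profiling or a custom rewrite means inserting one more function into the list rather than changing the search code.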

UI Capabilities

A powerful enterprise search back end is essential, but users only see the front-end user interface (UI). Previously, some enterprise search solutions left the UI entirely to the implementer. Today, some offerings provide UI functionality that allows implementers to import or compose their UI instead of having to create it from scratch. This makes a lot of sense, given that most search UIs do similar things. Moreover, because AI and personalization functionality often works hand in hand with the UI, modern search UIs are too complex to write and maintain by hand without a large staff of experts.

WYSIWYG embedding is among the most advanced UI functionality on the market. It allows implementers to configure a search UI in a web-based administration tool and then “include” it on their site using HTML or JavaScript statements.

Smart panels and widgets combine back-end functionality with UI components, such as recommendations or similarity search (aka “More Like This”). They allow implementers to add this functionality to their UI without writing the underlying UI or back-end handling code. In cloud-based solutions, these come with preconfigured back-end implementations.
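Behind a “More Like This” widget sits some notion of document similarity. As a simplified sketch, the code below compares raw term-count vectors with cosine similarity; real engines typically select and weight terms more carefully (for example, favoring rarer terms), so treat this as an illustration of the idea rather than any engine’s implementation. The corpus is hypothetical.

```python
import math
from collections import Counter

def term_vector(text: str) -> Counter:
    """Bag-of-words term counts for a document."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def more_like_this(doc_id: str, corpus: dict[str, str], n: int = 2) -> list[str]:
    """Return up to n other documents most similar to doc_id (zero scores dropped)."""
    target = term_vector(corpus[doc_id])
    scored = [
        (cosine(target, term_vector(text)), other)
        for other, text in corpus.items()
        if other != doc_id
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for score, other in scored[:n] if score > 0]
```

Given a wireless Bluetooth speaker listing, this ranks another Bluetooth speaker above wireless headphones, and an unrelated tax report not at all.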

Component libraries provide common UI functionality, such as typeahead. These come in many forms, from tag libraries to JavaScript APIs.

Although technically a back-end component, REST connectivity is critical to any modern smart search UI. Most mobile and web UIs connect via JSON over a REST interface.
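From the UI’s side, “JSON over REST” means serializing a request body, sending it over HTTP, and decoding a JSON response. This sketch uses only the Python standard library and builds the request without sending it; the base URL, the `/search` path, and the `{"query", "rows"}` and `{"docs": [...]}` shapes are hypothetical, since every product defines its own endpoints and schema.

```python
import json
from urllib.request import Request

def build_search_request(base_url: str, text: str, rows: int = 10) -> Request:
    """Build (but don't send) a JSON-over-REST search request.

    The "/search" path and the request body shape are hypothetical.
    """
    body = json.dumps({"query": text, "rows": rows}).encode("utf-8")
    return Request(
        f"{base_url}/search",
        data=body,
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )

def parse_search_response(raw: bytes) -> list[str]:
    """Extract document titles from a hypothetical {"docs": [...]} JSON response."""
    payload = json.loads(raw)
    return [doc["title"] for doc in payload.get("docs", [])]
```

To actually send the request, pass it to `urllib.request.urlopen` and feed the response body to `parse_search_response`.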

Typeahead and other forms of auto-suggest are now standard user expectations in search UI. As users type, the UI suggests what they will likely be interested in.
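A basic typeahead can be sketched as a prefix lookup over a sorted list of known queries, ranked by popularity. The popularity counts are hypothetical (in practice they might come from captured signals), and production typeahead usually adds fuzzy matching and personalization on top.

```python
import bisect

class Typeahead:
    """Suggest known queries that start with what the user has typed so far."""

    def __init__(self, popularity: dict[str, int]):
        self.popularity = {term.lower(): count for term, count in popularity.items()}
        self.terms = sorted(self.popularity)  # sorted so matches sit in one contiguous run

    def suggest(self, prefix: str, n: int = 3) -> list[str]:
        prefix = prefix.lower()
        start = bisect.bisect_left(self.terms, prefix)  # first term >= prefix
        matches = []
        for term in self.terms[start:]:
            if not term.startswith(prefix):
                break  # past the run of terms sharing this prefix
            matches.append(term)
        # Most popular first; alphabetical order breaks ties.
        matches.sort(key=lambda t: (-self.popularity[t], t))
        return matches[:n]
```

Binary search keeps each keystroke cheap: only the terms that share the typed prefix are examined.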

Auto-classification is a more advanced form of typeahead. For example, when a user types “speaker,” “audio electronics” is automatically selected as a category.
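One simple way to decide that “speaker” means “audio electronics” is to count which category past users went on to browse after typing a query, and auto-select a category only when it clearly dominates. This frequency-based sketch is an assumption for illustration; products may instead use trained classifiers. The class name and the `min_share` threshold are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Optional

class AutoClassifier:
    """Pick a category for a query from the categories past users chose for it."""

    def __init__(self, min_share: float = 0.6):
        self.counts = defaultdict(Counter)  # query -> Counter of category -> hits
        self.min_share = min_share  # how dominant a category must be to auto-select

    def observe(self, query: str, category: str) -> None:
        """Record that a user who typed `query` went on to browse `category`."""
        self.counts[query.strip().lower()][category] += 1

    def classify(self, query: str) -> Optional[str]:
        """Return the dominant category for the query, or None if unsure."""
        counts = self.counts.get(query.strip().lower())
        if not counts:
            return None
        category, hits = counts.most_common(1)[0]
        return category if hits / sum(counts.values()) >= self.min_share else None
```

The threshold guards against surprising users: an ambiguous query such as one split evenly between two categories is left unclassified.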

Data Import

Data source connectors are an essential piece of most enterprise search solutions. While REST APIs allow users to import from nearly any data source, writing a connector for every common data source (e.g., Oracle, SQL Server, or SharePoint) is a taxing endeavor for implementers. The most important question isn’t whether the solution supports the most connectors but whether it supports your data sources and any you are likely to deploy.

Parsers work with data source connectors to process the data that comes back and turn it into documents. For example, if you’re scanning a local disk and pulling back files, should each file be loaded as a document, or each row in the file? Is the document an XML, CSV, or ZIP file? Parsers interpret the data as documents so that they can be further processed or indexed.
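The questions above map naturally onto a dispatch on file type. In this minimal sketch, a CSV file becomes one document per row, an XML file becomes one document whose fields are the root’s child elements, and a ZIP file is a container whose members are parsed recursively. These per-format choices, the `source` field, and the plain-text fallback are assumptions; real parsers are configurable.

```python
import csv
import io
import xml.etree.ElementTree as ET
import zipfile

def parse(name: str, data: bytes) -> list[dict]:
    """Turn raw bytes from a connector into documents, based on file type."""
    lower = name.lower()
    if lower.endswith(".csv"):
        # One document per row; the header row supplies the field names.
        reader = csv.DictReader(io.StringIO(data.decode("utf-8")))
        return [dict(row, source=name) for row in reader]
    if lower.endswith(".xml"):
        # One document per file; the root's direct children become fields.
        root = ET.fromstring(data)
        doc = {child.tag: (child.text or "").strip() for child in root}
        doc["source"] = name
        return [doc]
    if lower.endswith(".zip"):
        # A container, not a document: parse each member file recursively.
        docs = []
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            for member in archive.namelist():
                if not member.endswith("/"):  # skip directory entries
                    docs.extend(parse(member, archive.read(member)))
        return docs
    # Fallback: treat anything else as a single plain-text document.
    return [{"body": data.decode("utf-8", errors="replace"), "source": name}]
```

A ZIP export containing a two-row CSV and one XML memo therefore yields three documents, each tagged with the file it came from.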

Pipelines connect data sources, parsers, and stages of logic that manipulate data into well-formed documents. In older systems, these were known as ETL (extract, transform, and load) processes.
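The extract, transform, and load steps can be sketched as a generator that yields raw records, a list of transformation stages applied to each record, and a load step that writes the finished documents into an index. Here a CSV string stands in for a connector and a plain dict stands in for the search index; the stage functions and field names are hypothetical.

```python
import csv
import io
from typing import Callable, Iterable, Iterator

def extract(csv_text: str) -> Iterator[dict]:
    """Extract: a stand-in connector that yields rows from CSV text."""
    yield from csv.DictReader(io.StringIO(csv_text))

def trim_fields(doc: dict) -> dict:
    """Transform: strip stray whitespace from field names and values."""
    return {k.strip(): v.strip() for k, v in doc.items()}

def add_full_name(doc: dict) -> dict:
    """Transform: derive a new field from existing ones."""
    return {**doc, "full_name": f"{doc['first']} {doc['last']}"}

def run_etl(records: Iterable[dict], stages: list[Callable[[dict], dict]], index: dict) -> dict:
    """Run every record through the stages in order, then load it by id."""
    for record in records:
        for stage in stages:
            record = stage(record)
        index[record["id"]] = record  # load into the (stand-in) index
    return index
```

Running `run_etl(extract("id, first, last\n1, Ada, Lovelace\n"), [trim_fields, add_full_name], {})` produces a clean document with a derived `full_name` field, even though the source data had messy spacing.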

JavaScript is a modern scripting language and a “language of trade” for most developers. Because of that familiarity, some enterprise search solutions allow manipulating data with JavaScript.

A REST API allows operational control of the enterprise search solution as well as importing and exporting data.

Native libraries let implementers work with a search solution directly from a language like Java or Python without having to write REST API glue code.

Operational Capabilities

Scalability and capacity are essential differentiators among enterprise search solutions. Can the system scale to the number of documents and users your system needs to support? How hard is it to add additional capacity, and can that be done without significant downtime? Some solutions still use client-server architecture or older computing technologies like shared file systems (NAS) instead of modern clustering topologies.

High availability is the capability of the system to suffer a hardware failure without data loss or downtime. It is a feature of modern cluster topologies. Modern enterprise search solutions should be resilient against network outages and hardware faults across multiple nodes.
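High availability in a cluster typically rests on replication: each shard of the index is copied to more than one node, so losing a node still leaves a live copy. This round-robin placement sketch is a simplification for illustration; the shard and node names are hypothetical, and real clusters also weigh racks, zones, and load when placing replicas.

```python
def place_replicas(shards: list[str], nodes: list[str], replication_factor: int = 2) -> dict[str, list[str]]:
    """Put each shard's replicas on distinct nodes, spreading shards round-robin."""
    if replication_factor > len(nodes):
        raise ValueError("replication_factor cannot exceed the number of nodes")
    return {
        shard: [nodes[(i + r) % len(nodes)] for r in range(replication_factor)]
        for i, shard in enumerate(shards)
    }

def survives(placement: dict[str, list[str]], failed: set[str]) -> bool:
    """True if every shard still has at least one replica on a live node."""
    return all(any(node not in failed for node in replicas) for replicas in placement.values())
```

With three nodes and two replicas per shard, the cluster survives any single node failure but not every pair of failures; raising the replication factor to three tolerates two failures at the cost of more storage.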

Disaster recovery is the capability of the system to suffer the loss of a complete cluster or data center and fail over to a backup site. This requires cross-data-center replication (also called WAN replication). This is important to deal with fiber cuts, weather, earthquakes, or other unforeseen catastrophes without significantly impacting business operations.

Modern enterprise search solutions provide system monitoring through REST APIs that expose statistics about system performance and uptime, often alongside graphical displays and dashboards.

A/B testing shows admins whether changes to enterprise search pipelines or other functionality improve user search performance. When applying personalization, recommendations, or other AI search techniques, it is crucial to determine whether the changes improved click-through rates, purchases, or any other specified measure of success. A/B testing works by directing some traffic to the new pipeline and comparing the results to the original configuration.
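Two pieces make an A/B test work: a stable way to split traffic, and a way to compare outcomes. This sketch hashes each user ID so the same user always lands in the same variant, then compares click-through rates with a standard two-proportion z-score. The variant names, the 10% treatment share, and the assumption that each search is an independent trial are simplifications for illustration.

```python
import hashlib
import math

def assign_variant(user_id: str, experiment: str, treatment_share: float = 0.1) -> str:
    """Deterministically bucket a user so they see the same pipeline on every visit."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # uniform in [0, 1)
    return "new_pipeline" if bucket < treatment_share else "original"

def compare_ctr(orig_searches: int, orig_clicks: int,
                new_searches: int, new_clicks: int) -> tuple[float, float]:
    """Return (relative lift, z-score) of the new pipeline's click-through rate.

    Uses a two-proportion z-test; |z| >= 1.96 roughly corresponds to
    significance at the 5% level.
    """
    p_orig = orig_clicks / orig_searches
    p_new = new_clicks / new_searches
    pooled = (orig_clicks + new_clicks) / (orig_searches + new_searches)
    se = math.sqrt(pooled * (1 - pooled) * (1 / orig_searches + 1 / new_searches))
    return p_new / p_orig - 1, (p_new - p_orig) / se
```

For instance, 200 clicks on 1,000 original searches versus 26 clicks on 100 new-pipeline searches is a 30% lift in click-through rate, but the z-score is only about 1.4, a reminder that a promising lift on small traffic may not yet be conclusive.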

Security

Connectivity to major security technologies, such as Active Directory, LDAP, Kerberos, SAML, or other single sign-on systems, is critical to an enterprise search solution’s security capabilities.

Role-based security authorization determines which users are allowed to create, read, modify, or delete documents, as well as make system changes.
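At its core, role-based authorization is a mapping from roles to permitted actions, checked before any operation runs. The role names and permission strings below are hypothetical; real solutions usually pull role membership from the directory services mentioned above.

```python
# Hypothetical roles and the actions each one permits.
ROLE_PERMISSIONS = {
    "viewer": {"read"},
    "editor": {"read", "create", "modify"},
    "admin": {"read", "create", "modify", "delete", "configure"},
}

def is_allowed(user_roles: set[str], action: str) -> bool:
    """True if any of the user's roles grants the action; unknown roles grant nothing."""
    return any(action in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)
```

Granting permissions to roles rather than individual users means an employee’s access changes automatically when their directory group membership changes.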

Document-level security methods, such as security trimming, allow fine-grained control to ensure that users never see documents they don’t have access to in their query results.
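Security trimming can be sketched as checking each document’s access control list (ACL) against the groups the user belongs to. The `allowed_groups` field and group names are hypothetical. Documents without an ACL are hidden here, a “fail closed” choice that is safer than the alternative.

```python
def trim_results(results: list[dict], user_groups: set[str]) -> list[dict]:
    """Keep only documents whose ACL shares at least one group with the user.

    Documents with no "allowed_groups" field are hidden (fail closed).
    """
    return [
        doc for doc in results
        if not user_groups.isdisjoint(doc.get("allowed_groups", ()))
    ]
```

Filtering after retrieval, as this sketch does, keeps the example short; in practice the ACL check is usually applied as a filter inside the query itself, so that result counts and facet counts don’t leak the existence of hidden documents.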

Analytics Capabilities

Usage search analytics allow implementers to inspect how users interact with enterprise search and may even allow inspecting the actions of an individual customer. This capability allows implementers to understand how well they achieve their conversion goals and see changes in these metrics over time.

SQL connectivity allows analysts to chart and use data with standard SQL tools. Solutions may make both indexed data and user behavior data available via SQL.
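To show the kind of question an analyst might ask once behavior data is reachable through SQL, this sketch loads a few hypothetical signals into an in-memory SQLite database and summarizes clicks and purchases per query. SQLite is only a stand-in here; each search solution exposes its own SQL interface, table names, and columns.

```python
import sqlite3

def load_signals(rows: list[tuple]) -> sqlite3.Connection:
    """Create an in-memory stand-in for a behavior-data table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE signals (user_id TEXT, query TEXT, doc_id TEXT, event_type TEXT)")
    conn.executemany("INSERT INTO signals VALUES (?, ?, ?, ?)", rows)
    return conn

def query_summary(conn: sqlite3.Connection) -> list[tuple]:
    """Clicks and purchases per query, most-clicked first."""
    return conn.execute(
        """
        SELECT query,
               SUM(event_type = 'click')    AS clicks,     -- comparison yields 0 or 1
               SUM(event_type = 'purchase') AS purchases
        FROM signals
        GROUP BY query
        ORDER BY clicks DESC
        """
    ).fetchall()
```

Because the interface is plain SQL, the same summary could feed a spreadsheet, a BI dashboard, or a charting library without custom export code.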

Advanced Functionality

Streaming changes the way the solution returns results. Usually, enterprise search solutions return the most relevant items first, but computing relevance across all matches is memory- and resource-intensive. Streaming returns results in the order they are retrieved and can return only those results that meet specified conditions.
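The contrast can be sketched with two functions over the same stream of candidates. The ranked version must see every candidate before it can return the top results; the streaming version is a generator that hands back each matching document the moment it is retrieved, with no scoring or sorting. The document fields and the condition are hypothetical.

```python
import heapq
from typing import Callable, Iterable, Iterator

def ranked(docs: Iterable[dict], k: int) -> list[dict]:
    """Relevance order: consume every candidate, then return the top k by score."""
    return heapq.nlargest(k, docs, key=lambda d: d["score"])

def streamed(docs: Iterable[dict], condition: Callable[[dict], bool]) -> Iterator[dict]:
    """Streaming: yield each doc that meets the condition as soon as it is
    retrieved, in retrieval order, without scoring or sorting."""
    for doc in docs:
        if condition(doc):
            yield doc
```

Asking the streamed version for its first EU document stops reading the source as soon as one is found, which is what makes streaming attractive for large exports and analytics jobs where relevance order doesn’t matter.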

Named entity recognition (NER) uses natural language processing (NLP) to recognize the names of companies, individuals, or other proper nouns. This can be useful for various types of filtered or faceted enterprise search.
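The filtering and faceting payoff of NER is easiest to see with a deliberately simpler technique: a dictionary (gazetteer) lookup. This is not NLP-based NER, which learns to recognize names it has never seen; it only illustrates how recognized entities become facetable fields. The gazetteer entries and labels are hypothetical.

```python
import re

# Hypothetical gazetteer; a statistical NER model would recognize these itself.
GAZETTEER = {
    "Acme Corp": "ORGANIZATION",
    "Ada Lovelace": "PERSON",
    "London": "LOCATION",
}

def tag_entities(text: str) -> list[tuple[str, str]]:
    """Return (entity, label) pairs in the order they appear in the text."""
    found = []
    for name, label in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), name, label))
    return [(name, label) for _, name, label in sorted(found)]

def entity_facets(text: str) -> dict[str, list[str]]:
    """Group recognized entities by label, ready to index as facet fields."""
    facets: dict[str, list[str]] = {}
    for name, label in tag_entities(text):
        facets.setdefault(label, []).append(name)
    return facets
```

Indexing the output of `entity_facets` as fields such as PERSON or ORGANIZATION is what lets users filter results down to, say, every document that mentions a particular company.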

Clustering and classification are machine-learning techniques that allow data or queries to be grouped or labeled automatically.

Head/tail analysis is a machine learning technique that identifies and rewrites underperforming queries to resemble similar well-performing ones.
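As a simplified stand-in for the machine learning described above, this sketch rewrites a rare (tail) query to the most textually similar frequent (head) query that performs well, using `difflib.get_close_matches` from the Python standard library. The click-through data, the `min_ctr` threshold, and the similarity `cutoff` are hypothetical; real head/tail analysis typically compares queries by meaning and behavior, not just spelling.

```python
import difflib

def rewrite_tail_query(query: str, head_ctr: dict[str, float],
                       min_ctr: float = 0.2, cutoff: float = 0.8) -> str:
    """Map a tail query onto a similar, well-performing head query if one exists.

    head_ctr maps frequent queries to their click-through rate (hypothetical data).
    """
    good_heads = [q for q, ctr in head_ctr.items() if ctr >= min_ctr]
    if query in good_heads:
        return query  # already a well-performing head query
    match = difflib.get_close_matches(query, good_heads, n=1, cutoff=cutoff)
    return match[0] if match else query  # leave the query alone if nothing is close
```

A misspelled tail query such as “wireles headphone” is rewritten to the successful head query “wireless headphones,” while a query with no good counterpart passes through unchanged rather than being forced onto an unrelated one.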

Delivering Outcomes at Scale

In addition to checking for all of these essential platform features, you must ensure your vendor has a proven track record of helping organizations like yours deploy these solutions. Look for a vendor that can truly partner with you to configure and design an enterprise search solution that fits your unique mix of data sources, users, and business problems, and that can show precisely how AI, machine learning, and deep learning fit in. The right vendor has broad experience across industries but can also understand and articulate the idiosyncrasies of your particular business.

Let’s Get Going

It’s time to replace hit-or-miss enterprise search with an all-in-one answer platform for data diggers, fact finders, and edge seekers everywhere. More than anything, it’s time to find out what’s possible when employees have all the insights they need, whenever they need them. Contact us today.

About Katie Florez

Contact us today to learn how Lucidworks can help your team create powerful search and discovery applications for your customers and employees.