Professional services firm PricewaterhouseCoopers (PwC) is one of the Big Four accounting firms, with a global workforce of 276,000 employees. That’s a lot of users who need a lot of data and information to do their jobs every day. PwC Director of Enterprise Search, Viren Patel, presented at this year’s virtual Activate, the Search and AI conference, to share his team’s insights and methodologies for solving the “last mile” of search relevancy for PwC’s internal search applications.
Building Enterprise Search at PricewaterhouseCoopers
PwC’s main enterprise search application is built with Lucidworks Fusion and includes a “Search as a Service” API-based architecture that lets teams build applications that access the search index and integrate results into their own workflows. The search application indexes content from source systems such as content management systems, proposal hubs, libraries of learning content, ServiceNow tickets, digital assets, and internal video channels. Some business units also have their own private systems, so the search service lets users securely query just their private indexes rather than the whole corpus. Lucidworks Fusion brings all this information into one unified view that connects data, people, experts, projects, and companies.
As the search team looked for ways to sharpen relevancy for their apps, they found that search queries fall into a few common categories:
- Documents created to answer a specific question: What is the 2021 holiday schedule?
- Documents with relevant content: a user looking for a specific proposal presented to a customer for a specific project
- Browsing for insights: looking for internal data or for information on specific people or companies
- Specific questions that aren’t tied to just one document: Who’s the global partner for Coca-Cola?
Even with all this technology and efficiency in connecting users and information, there were still advances to be made in the area of search relevancy to improve the user experience and increase productivity.
The Relevancy Problem
Like at many companies, the team at PwC didn’t have the resources to develop in-house, custom NLP solutions to improve accuracy and precision. They decided to extend their existing Lucidworks Fusion deployment and combined it with some off-the-shelf components from another vendor to create a solution that scored better results for every query.
The first part of the solution used Fusion’s signal capture and machine learning to collect information on what users searched for and what they clicked on, and then used ML models to improve the ranking and scoring of search results.
But signals can’t be collected unless there are clicks to capture. That’s why the solution also includes the Top Search Optimizer module from vendor Noonean Cybernetics. This module monitors top search terms and maps user terms to corporate terms to build corporate ontologies.
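As a rough illustration of how click signals can feed back into ranking, here is a minimal Python sketch. It is not Fusion’s signals API or PwC’s model; the function names, the `weight` parameter, the document IDs, and the log-damped boost are all assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict


def aggregate_signals(click_log):
    """Count clicks per (query, doc_id) from an iterable of (query, doc_id) events."""
    counts = defaultdict(Counter)
    for query, doc_id in click_log:
        counts[" ".join(query.lower().split())][doc_id] += 1
    return counts


def rerank(query, results, signals, weight=0.5):
    """Add a log-damped click boost to each base score, then sort descending.

    results is a list of (doc_id, base_score) pairs; weight is a hypothetical
    tuning knob, not a value taken from the talk.
    """
    clicks = signals.get(" ".join(query.lower().split()), Counter())
    boosted = [
        (doc_id, score + weight * math.log1p(clicks[doc_id]))
        for doc_id, score in results
    ]
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)


# Hypothetical click events and base scores.
click_log = [
    ("holiday schedule", "doc-hr-12"),
    ("Holiday Schedule", "doc-hr-12"),
    ("holiday schedule", "doc-blog-3"),
]
signals = aggregate_signals(click_log)
base_results = [("doc-blog-3", 1.0), ("doc-hr-12", 0.9)]
reranked = rerank("holiday schedule", base_results, signals)
print(reranked)
```

In this toy data the HR document starts slightly behind on base score but moves to the top because it was clicked more often for the same query.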
Noonean’s Enterprise NLP module was also put to work to index and analyze documents to the sentence level. Enterprise NLP can understand grammatical structure and frequency per sentence to improve precision and accuracy of searches. This capability is usually applied to natural language searches or searches with three or more search terms.
To add a user feedback loop, a “thumbs-up” UI element was added to search results so users could quickly grade results and their relevancy. When a user gave a set of results a thumbs-up, that rating was used in further relevancy calculations. Lucidworks Fusion recommendations were also used to provide better suggestions based on a user’s past searches or the searches of similar users.
A final ingredient to the relevance recipe was date biasing so newer and fresher content would rank higher in results.
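Date biasing can be implemented in several ways; the talk does not say which one PwC used. One common option is an exponential decay on document age, sketched below in Python. The half-life, the `bias` blend factor, and the function names are assumptions for illustration only.

```python
from datetime import date


def recency_boost(published, today, half_life_days=365.0):
    """Return a multiplier in (0, 1]: 1.0 for content published today,
    0.5 once the content is one half-life old."""
    age_days = max((today - published).days, 0)
    return 0.5 ** (age_days / half_life_days)


def date_biased_score(base_score, published, today, bias=0.3):
    """Blend base relevance with recency. bias=0 turns date biasing off;
    bias=1 scales the score entirely by the recency multiplier."""
    return base_score * ((1.0 - bias) + bias * recency_boost(published, today))


today = date(2021, 6, 1)
fresh = date_biased_score(0.95, date(2021, 5, 1), today)
stale = date_biased_score(1.00, date(2018, 6, 1), today)
print(round(fresh, 3), round(stale, 3))
```

Keeping `bias` well below 1 means an older but highly relevant document is demoted rather than buried, which matters for content such as policies that stay valid for years.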
Building Ontologies for Better Relevancy
Users don’t always know the “right” or “correct” words to use to find what they want. A PwC employee looking for the company’s cellphone policy and corporate discount offers might not know that internally that program is called eMobility. By understanding and monitoring how users are searching and the words they use, the “wrong” words can be mapped to the right corporate or internal term. In this case, a link was curated that boosted the main eMobility page to the top of results to get the user to where they want to be.
A second example is if an employee is searching for information about fraud, fraud governance, fraud detection, general ledger fraud, and other queries related to the firm’s Financial Crimes Unit. Through careful curation and mapping, these terms and related queries were added to the ontology to boost links related to the Financial Crimes Unit.
This might sound like simple synonym detection and replacement, but the difference is that the query is not being rewritten or replaced. The ontology relates the concepts in the query and expands it to try to detect intent. So if the user searches for fraud governance but wasn’t trying to find the Financial Crimes Unit, the original intent of the query is still retained and the results will be based on that query.
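The distinction between expanding a query and rewriting it can be sketched in a few lines of Python. This is an illustrative toy, not the Top Search Optimizer’s actual behavior: the `ONTOLOGY` table, the page paths, and both function names are hypothetical.

```python
# Hypothetical ontology: user phrasing -> (corporate concept, curated page).
# The page paths are placeholders, not real PwC URLs.
ONTOLOGY = {
    "cellphone policy": ("eMobility", "/pages/emobility"),
    "fraud governance": ("Financial Crimes Unit", "/pages/financial-crimes-unit"),
    "fraud detection": ("Financial Crimes Unit", "/pages/financial-crimes-unit"),
}


def expand_query(query):
    """Attach related concepts to a query without replacing the original text."""
    key = " ".join(query.lower().split())
    expanded = {"original": query, "concepts": [], "boosted_pages": []}
    match = ONTOLOGY.get(key)
    if match:
        concept, page = match
        expanded["concepts"].append(concept)
        expanded["boosted_pages"].append(page)
    return expanded


def apply_expansion(expanded, organic_results):
    """Put curated pages first, then every result for the original query."""
    curated = list(expanded["boosted_pages"])
    rest = [r for r in organic_results if r not in curated]
    return curated + rest


expanded = expand_query("Fraud Governance")
organic = ["/docs/fraud-governance-framework.pdf", "/docs/audit-checklist.docx"]
final_results = apply_expansion(expanded, organic)
print(final_results)
```

Because the original query string is kept and its own results are still returned, a user who did not want the Financial Crimes Unit page only has to look one result further down.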
Indexing Down to the Sentence Level
Another important capability was grammatical search, supplied by the Enterprise NLP module, which indexes content down to the sentence level. Unlike typical search functionality that matches a query against content anywhere in a document, these queries are matched at the sentence level and show a match only if the words in the sentence are in the correct grammatical relationship. A query like “How can I request a new SIM card?” is probably not a curated query, and probably not a common one, but with sentence-level indexing accurate results can still be returned.
This approach isn’t curation, and it isn’t optimizing top searches; it is for the rarer, more difficult queries users ask. It provides a Q&A-like result, but against an entire corpus of documents. The goal is to return the most relevant document and the section of text containing the information the user is looking for.
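To show why the unit of indexing matters, the Python sketch below indexes each sentence separately and requires all query terms to appear in the same sentence. It is a deliberate simplification: it does not parse grammatical relationships the way the Enterprise NLP module is described as doing, and the stopword list, sample documents, and function names are made up for the example.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"how", "can", "i", "a", "the", "to", "do", "is", "on", "of"}


def tokenize(text):
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def split_sentences(text):
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_sentence_index(docs):
    """Map each term to the set of (doc_id, sentence_number) where it occurs."""
    index = defaultdict(set)
    sentences = {}
    for doc_id, text in docs.items():
        for n, sentence in enumerate(split_sentences(text)):
            sentences[(doc_id, n)] = sentence
            for term in tokenize(sentence):
                index[term].add((doc_id, n))
    return index, sentences


def search_sentences(query, index, sentences):
    """Return (doc_id, sentence) pairs whose sentence contains every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return [(doc_id, sentences[(doc_id, n)]) for doc_id, n in sorted(hits)]


docs = {
    "telecom-faq": "Employees can request a new SIM card through the service desk. "
                   "Replacement phones take five days.",
    "travel-policy": "Request approval before booking. A new corporate card is "
                     "issued on request. Buy a local SIM abroad.",
}
index, sentences = build_sentence_index(docs)
answers = search_sentences("How can I request a new SIM card?", index, sentences)
print(answers)
```

The travel policy contains every query term somewhere in the document, so a document-level match would return it, but no single sentence in it contains them all. Only the telecom FAQ sentence is returned, along with the exact passage to show the user.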
Prioritizing Execution for Optimizing Queries
With all the above strategies, queries were prioritized into the following flow:
A query is submitted by the user to the search application.
If the query is a common or top search, it’s routed through the Top Search Optimizer, mapped to the curated corporate or top search term, and results are served back with the curated result at the top. 40%-50% of queries follow this path.
If the query is not a common query or consists of three or more terms, it is routed through the Enterprise NLP module to provide results indexed at the sentence level, pointing to the exact section of the document.
Any other queries are fed through Fusion, and conventional relevancy applies.
All of the query and click combinations from the above three approaches are fed into Fusion’s ML modeling so the system is continuously self-tuning, getting smarter with each search.
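The flow above can be sketched as a small routing function in Python. This is one reading of the talk, not PwC’s implementation: it treats the NLP path as applying to queries that are not top searches and have three or more terms, and the handler labels, the `top_searches` set, and the `min_nlp_terms` threshold are assumptions.

```python
def route_query(query, top_searches, min_nlp_terms=3):
    """Pick a processing path for a query.

    Returns one of three hypothetical handler labels:
    "top_search_optimizer", "enterprise_nlp", or "fusion_default".
    """
    normalized = " ".join(query.lower().split())
    if normalized in top_searches:
        # Common query: serve the curated result first.
        return "top_search_optimizer"
    if len(normalized.split()) >= min_nlp_terms:
        # Longer or natural-language query: sentence-level matching.
        return "enterprise_nlp"
    # Everything else: conventional relevancy.
    return "fusion_default"


# Hypothetical set of curated top searches.
top_searches = {"holiday schedule", "cellphone policy", "expense report"}

for q in ["Holiday Schedule", "How can I request a new SIM card?", "tax memo"]:
    print(q, "->", route_query(q, top_searches))
```

Checking the curated set first keeps the cheapest and most predictable path in front, which fits the observation that 40%-50% of queries are top searches.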
Increased Relevancy by 43%
The search team at PwC regularly collects, analyzes, and reports on key search metrics like query volume, search utilization, zero-result queries, abandoned queries, and clickthrough rates. With these relevancy changes in place, the team saw several improvements:
- Bridged the gap between the existing set of business rules and curation which was not consistently providing the best search results
- Increased relevancy by 43% with this “last mile relevancy” solution
- Reduced abandoned and no-click queries month over month
- Delivered better precision and accuracy of results through the cognitive search and NLP approach, along with an improved user experience that surfaces insights
And ultimately, all of this contributed to PwC being named winner of the Enterprise Workplace Solution award at Activate 2019, which was judged by Forrester Research and other research companies.