Whether in finance, retail, healthcare, or oil and gas, data science and machine learning are pervasive across domains and business processes. However, there is no one-size-fits-all ML solution that works for every problem.
Data science teams continuously adopt new frameworks and methods to solve challenges in the best possible way. This puts pressure on engineering and DevOps teams to serve the latest solutions, creating friction at hand-off points and potentially adding technical debt. The biggest challenge these technical teams face today is taking ML models to production quickly within a fully functional, performant search application.
This post covers in detail how Lucidworks Fusion reduces the friction of deploying custom machine learning models. If you’d like to see these tools in action, be sure to also sign up for our upcoming webinar, Accelerate Data Science Velocity with Fusion 5.1.
Machine Learning in Retail and Enterprise Search
Retail search and enterprise document discovery applications use data science as an important ingredient for personalizing mission-critical applications. Simple keyword matching is no longer enough to satisfy today’s users. Semantic search applied to product recommendations, user query understanding, document categorization, sentiment analysis, and summarization is critical to providing enhanced, personalized experiences to consumers and employees alike. As data science teams strive to build models that satisfy the requirements of a new generation of users, the ability to move those models smoothly into production is becoming critical.
Data Science Toolkit Integration in Lucidworks Fusion
Lucidworks Fusion is a cloud-native, scalable enterprise document discovery platform built with openness and pluggability at its core. Fusion integrates with a variety of commercial and open source machine learning frameworks to derive insights from large, unstructured documents. Use cases range from e-commerce search applications to conversational frameworks, support portals, and internal enterprise knowledge discovery applications.
Fusion’s Data Science Toolkit Integration is a model service that integrates with query and index pipelines, adding intelligence to the processing of incoming queries and documents. Fusion integrates with Seldon Core, an open source framework for managing model deployment. With Data Science Toolkit Integration, data science teams develop and validate models built for their specific data and then use Fusion to deploy them in production. This capability helps teams to:
- Streamline production of search-focused ML models
- Reduce data science teams’ dependencies on DevOps teams, and vice versa
- Increase productivity and drive experimentation to fail fast, iterate, and improve
Deployment and Consumption Workflow
Data science teams:
- validate models against the organization’s problems,
- package them as versioned Docker images, and
- register the images with Fusion for deployment.
The diagram above describes a typical data science team’s workflow. The team first identifies the problem, pulls data from various data stores, and iterates in Jupyter notebooks with Python ML libraries until it produces a satisfactory version. The team then uses a few simple commands to build a Docker image and publish it to Fusion. Fusion needs a one-time access setup to the organization’s private Docker repository to register the image. Fusion can then deploy the models on demand, at scale.
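A model packaged for Seldon Core with its Python language wrapper is typically a plain Python class that exposes a `predict(X, features_names)` method returning one prediction per input row. The sketch below shows that shape only; the class name `SentimentModel` and its keyword lexicon (a stand-in for real trained weights) are hypothetical, and building the Docker image and registering it with Fusion are not shown.

```python
from typing import Iterable, List, Optional


class SentimentModel:
    """Toy model following Seldon Core's Python-wrapper convention: a plain
    class whose predict(X, features_names) returns one prediction per input.
    The keyword lexicon is a hypothetical stand-in for trained weights."""

    def __init__(self) -> None:
        # A real model would load its trained artifacts here, once per replica.
        self.positive = {"great", "good", "love", "excellent"}
        self.negative = {"bad", "poor", "hate", "terrible"}

    def _score(self, text: str) -> float:
        # Lowercase and strip simple punctuation before the lexicon lookup.
        tokens = [t.strip(".,!?") for t in text.lower().split()]
        hits = [1 if t in self.positive else -1
                for t in tokens if t in self.positive or t in self.negative]
        # Mean of +1/-1 hits gives a score in [-1, 1]; 0.0 when there are no hits.
        return sum(hits) / len(hits) if hits else 0.0

    def predict(self, X: Iterable[str],
                features_names: Optional[List[str]] = None) -> List[float]:
        # X is a batch of texts; features_names is unused by this toy model.
        return [self._score(str(x)) for x in X]


if __name__ == "__main__":
    print(SentimentModel().predict(["Great product!", "Terrible support.", "It arrived."]))
```

Keeping model loading in `__init__` means each deployed replica pays the loading cost once rather than on every request.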
Using Models at Query (Search) and Index (Data Ingest) Time
Case 1: Processing documents at index time.
When indexing documents from SharePoint, Google Drive, or any other data source, machine learning models can enrich each document with entities, summaries, topics, sentiment scores, and more.
Documents pass through the following flow: Fusion Connectors → Index Pipeline → Solr Index
Fusion’s Machine Learning Index Stage interacts with deployed ML models, passing documents and predictions back and forth between the pipeline and Seldon Core.
The image above describes how documents flow through the stages of an index pipeline, getting enriched at each step before being stored. The Machine Learning Index Stage interacts with Fusion’s ML Service, which in turn talks to Seldon Core. Seldon Core routes each request to the respective model while load-balancing between model replicas. Finally, the model’s prediction is returned to the pipeline and the document is enriched with it. Model replicas are copies of a model’s Docker image, deployed to increase scalability.
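To make the enrichment step concrete, the sketch below mimics what an index-time stage does with a batch of documents, outside of Fusion. It uses the request/response shape of Seldon Core’s v1 REST protocol (`{"data": {"ndarray": [...]}}`); in Fusion the stage goes through the ML Service rather than calling Seldon directly. The helper names, the `fake_model` stand-in for the HTTP call, and the field names `body_t` and `sentiment_d` are all hypothetical.

```python
import json
from typing import Callable, Dict, List


def to_seldon_request(texts: List[str]) -> str:
    # Seldon Core's v1 REST protocol carries a batch as {"data": {"ndarray": [...]}}.
    return json.dumps({"data": {"ndarray": texts}})


def from_seldon_response(body: str) -> List:
    # Predictions come back under the same "data" -> "ndarray" keys.
    return json.loads(body)["data"]["ndarray"]


def enrich_documents(docs: List[Dict], source_field: str, target_field: str,
                     call_model: Callable[[str], str]) -> List[Dict]:
    """Send a batch of document texts to a model and write one prediction
    back into each document, mimicking an index-time enrichment stage."""
    texts = [str(doc.get(source_field, "")) for doc in docs]
    predictions = from_seldon_response(call_model(to_seldon_request(texts)))
    for doc, prediction in zip(docs, predictions):
        doc[target_field] = prediction
    return docs


def fake_model(request_body: str) -> str:
    # Hypothetical stand-in for the HTTP call to a deployed model.
    texts = json.loads(request_body)["data"]["ndarray"]
    scores = [1.0 if "great" in t.lower() else 0.0 for t in texts]
    return json.dumps({"data": {"names": ["sentiment"], "ndarray": scores}})


docs = [{"id": "1", "body_t": "Great laptop"}, {"id": "2", "body_t": "Late delivery"}]
enrich_documents(docs, "body_t", "sentiment_d", fake_model)
print(docs)
```

Sending documents in batches rather than one at a time reduces the number of round trips to the model replicas during bulk indexing.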
Case 2: Processing User Queries
When processing user queries in real time (from a search front end such as an e-commerce website or an internal knowledge discovery portal), queries can be passed through ML models to predict user intent attributes such as brand affinity, the product category the query is looking for, the expected color, and so on.
Queries pass through the following flow: Front End → Query Pipeline → Solr Index → Response → Front End
The diagram above shows how a user query travels through a Fusion query pipeline, where the Machine Learning Query Stage interacts with deployed ML models, passing queries and predictions back and forth between the pipeline and Seldon Core. The predictions can then be used as Solr boost or filter parameters. For example, a model might predict department:electronics for the query “ipad”.
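One way to turn such a prediction into Solr parameters is sketched below: apply a hard filter (`fq`) when the model is confident and only a soft boost (`bq`, honored by the dismax and eDisMax query parsers) otherwise. The function name, the 0.9 confidence threshold, and the boost factor of 2.0 are hypothetical choices for illustration, not Fusion defaults.

```python
from typing import List, Tuple
from urllib.parse import urlencode


def intent_to_solr_params(field: str, value: str, confidence: float,
                          filter_threshold: float = 0.9,
                          boost: float = 2.0) -> List[Tuple[str, str]]:
    """Turn one predicted intent into Solr request parameters: a hard filter
    (fq) when the model is confident, otherwise a soft boost (bq)."""
    escaped = value.replace('"', '\\"')  # keep the phrase query well-formed
    clause = f'{field}:"{escaped}"'
    if confidence >= filter_threshold:
        return [("fq", clause)]
    # bq is honored by the dismax/edismax query parsers.
    return [("bq", f"{clause}^{boost}")]


params = [("q", "ipad")] + intent_to_solr_params("department", "electronics", 0.95)
print(urlencode(params))
```

Falling back to a boost when confidence is low keeps a wrong prediction from filtering out every relevant result, at the cost of a less tightly focused result list.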
Case 3: Post-processing search results at query time.
Responses to user queries from the Fusion backend can also be modified before they reach the user: re-ranking the results, redacting certain documents, promoting personally relevant results based on user information, or surfacing documents by semantic similarity in addition to keyword matches.
Queries pass through the following workflow: Front End → Query Pipeline → Solr Index
The Machine Learning Query Stage interacts with deployed ML models, passing response documents and predictions back and forth between the pipeline and Seldon Core. The re-ranked or altered results are then passed on to the front end. Models that re-rank results in this way are commonly known as learning-to-rank (LTR) models.
Lucidworks has deployed several deep-learning-based models on this framework, available to Fusion users out of the box:
- Sentiment analysis for short text
- Sentiment analysis for long text
- Semantic search apps
- Smart Answers (coming soon)
- Zero search results treatment (coming soon)
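The re-ranking idea can be sketched as blending each result’s normalized Solr score with a model score. This is a hand-weighted linear blend for illustration, not a learning-to-rank model: a real LTR model learns its ranking function from relevance judgments. The `overlap_score` function is a hypothetical stand-in for a deployed model, and the `weight` of 0.5 is an arbitrary assumption.

```python
from typing import Callable, Dict, List


def overlap_score(query: str, doc: Dict) -> float:
    # Hypothetical stand-in for a model: share of query terms found in the title.
    terms = set(query.lower().split())
    title = set(str(doc.get("title", "")).lower().split())
    return len(terms & title) / len(terms) if terms else 0.0


def rerank(docs: List[Dict], query: str,
           model_score: Callable[[str, Dict], float],
           weight: float = 0.5) -> List[Dict]:
    """Reorder Solr results by blending the normalized keyword score with a
    model score in [0, 1]; weight controls how much the model counts."""
    if not docs:
        return []
    max_score = max(d["score"] for d in docs) or 1.0

    def blended(d: Dict) -> float:
        keyword = d["score"] / max_score  # scale Solr scores to [0, 1]
        return (1 - weight) * keyword + weight * model_score(query, d)

    return sorted(docs, key=blended, reverse=True)


results = [
    {"id": "a", "score": 10.0, "title": "iPad case"},
    {"id": "b", "score": 8.0, "title": "Apple iPad Pro tablet"},
]
print([d["id"] for d in rerank(results, "apple tablet", overlap_score)])
```

Here document `b` moves above `a` because the model score (1.0 versus 0.0) outweighs its slightly lower keyword score (0.8 versus 1.0 after normalization).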
See Fusion in Action
If you want to learn more and see Fusion’s capabilities for data science in action, register for our upcoming webinar, Accelerate Data Science Velocity with Fusion 5.1.
Sanket Shahane is the Product Manager for Artificial Intelligence at Lucidworks, where he manages the AI portfolio of Lucidworks’ enterprise search product, Fusion. He leads teams of 10+ engineers, data scientists, and UX designers, managing cross-functional teams, projects, and stakeholders to develop AI solutions.