When time saved means lives saved, rapid data discovery is literally vital.
You’d be hard-pressed to find a data discovery platform with a bigger impact in recent years than MetaBio. Serving the leading biotechnology company, Regeneron, MetaBio is a cutting-edge data contextualization engine that revolutionizes the way data is processed and utilized. Regeneron is a research-driven organization focused on developing life-saving pharmaceuticals for patients with serious illnesses like cancer and infectious diseases, including COVID-19. Their antibody cocktail was the first treatment to demonstrate statistically significant antiviral activity against the novel coronavirus.
Regeneron manages the entire lifecycle of creating life-saving treatments—from research and development to manufacturing. By leveraging the world’s largest and most diverse genomic database alongside extensive scientific literature and adhering to global regulatory standards, Regeneron formulates next-generation pharmaceuticals to treat patients worldwide.
At the heart of this mission is MetaBio, a platform designed to streamline and accelerate data discovery through automation. By organizing, standardizing, and compiling vast datasets, MetaBio empowers Regeneron’s research teams to make faster, more informed decisions. Under the hood, the Lucidworks Platform catalogs, searches, and connects data so that users can find what they need.
MetaBio has a broad set of applications across the organization. It shortens research time by identifying and suggesting, from among millions of documents, those relevant to the topic at hand. It surfaces applicable governmental regulations. It identifies events that may have affected a drug manufacturing process. Researchers, clinicians, clinical development teams, and commercial units all benefit from the platform, enhancing productivity across the board.
MetaBio supports a global user base with diverse needs: users require access to various resources, ontologies, and regulatory data from different governments in multiple languages. To serve them, the platform must be both flexible and powerful.
“We’re not just delivering search results,” said Shahzad Ahmed, Senior Information Technology Engineer at Regeneron. “We’re pushing the boundaries to help users understand the value of the data, what insights can be drawn from it, and how it can advance their work.”
MetaBio is a full-stack, cloud-based solution that ingests, connects, processes, and analyzes data using machine learning, artificial intelligence, and deep learning. Data flows from a centralized data lake, which is connected to Lucidworks via a JDBC connector. Lucidworks indexes and curates the data, cleaning it and making it accessible to all other applications within Regeneron’s ecosystem.
The platform is built on a comprehensive tech stack that includes DataIQ for contextualization, Apache Superset and Preset for data visualization, Privacera for data governance, and Amazon Athena, Apache Hive, and Dremio, among others. According to Ahmed, “We have a long list of applications and technology stacks created at Regeneron. Lucidworks fits nicely into the data ecosystem.”
MetaBio tailors the data experience to its users through a customized web app interface.
Determining which data is “contextually valuable” for each user is a continually evolving process. Currently, search queries and user profile information, such as department and job title, play a central role in surfacing the most relevant results. As the platform matures, personalization is planned to extend down to the individual user.
Security trimming ensures that only authorized users can view sensitive documents or metadata, maintaining strict data governance across the platform.
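As a rough illustration of the idea, security trimming can be thought of as filtering results against the groups a user belongs to. The sketch below is not MetaBio's or Lucidworks' implementation; the `Document` class, the `allowed_groups` field, and the group names are all hypothetical, and production search engines typically apply this kind of restriction as a filter at query time rather than after retrieval.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    title: str
    allowed_groups: frozenset  # hypothetical access-control field


def security_trim(results, user_groups):
    """Keep only documents whose allowed groups overlap the user's groups."""
    groups = frozenset(user_groups)
    return [doc for doc in results if doc.allowed_groups & groups]


# Toy data: group names and documents are made up for illustration.
results = [
    Document("d1", "Public assay protocol", frozenset({"all-staff"})),
    Document("d2", "Restricted trial data", frozenset({"clinical-dev"})),
]

visible = security_trim(results, {"all-staff", "research"})
print([doc.doc_id for doc in visible])  # prints ['d1']
```

A user with no matching group sees nothing, which is the safe default for sensitive documents and metadata.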
Lucidworks optimizes result relevance by analyzing user behavior signals. Each action—whether clicking on the first result or scrolling through multiple pages—is tracked, allowing the platform to refine search results and better predict user intent. This enables MetaBio to provide a continuously improving search experience.
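One simple way to picture signal-based relevance is to count clicks per query and document, then add a damped boost to each document's base score. The following is a minimal sketch under that assumption, not the Lucidworks algorithm; the event fields (`type`, `query`, `doc_id`), the `weight` parameter, and the scores are hypothetical.

```python
import math
from collections import Counter


def aggregate_clicks(signals):
    """Count click events per (query, doc_id) pair; other event types are ignored."""
    return Counter(
        (s["query"], s["doc_id"]) for s in signals if s["type"] == "click"
    )


def rerank(query, scored_docs, click_counts, weight=0.5):
    """Add a log-damped click boost to each base score, then sort descending."""
    boosted = [
        (doc_id, score + weight * math.log1p(click_counts[(query, doc_id)]))
        for doc_id, score in scored_docs
    ]
    return sorted(boosted, key=lambda pair: pair[1], reverse=True)


# Toy signals: three clicks on d2 for the same query, plus one non-click event.
signals = [
    {"type": "click", "query": "myeloma", "doc_id": "d2"},
    {"type": "click", "query": "myeloma", "doc_id": "d2"},
    {"type": "click", "query": "myeloma", "doc_id": "d2"},
    {"type": "query", "query": "myeloma", "doc_id": None},
]
base_scores = [("d1", 1.0), ("d2", 0.8)]

ranked = rerank("myeloma", base_scores, aggregate_clicks(signals))
print([doc_id for doc_id, _ in ranked])  # prints ['d2', 'd1']
```

The logarithm keeps a heavily clicked document from overwhelming the base relevance score; documents with no clicks keep their original score.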
Additionally, users can bookmark documents, save searches, and adjust data source preferences to tailor results to their specific needs. For instance, a user working with “Compass” data can set it to appear at the top of their search results, above all other sources.
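The source preference behavior can be sketched as a stable sort that moves results from preferred sources to the top while preserving the existing relevance order within each group. This is an illustrative assumption about how such a preference might work; apart from “Compass”, the source names and the `source` field are hypothetical.

```python
def apply_source_preference(results, preferred_sources):
    """Move results from preferred sources first.

    Python's sort is stable, so the incoming relevance order is kept
    within the preferred group and within the remaining group.
    """
    preferred = set(preferred_sources)
    return sorted(results, key=lambda r: r["source"] not in preferred)


# Results are assumed to arrive already ordered by relevance.
results = [
    {"id": "a", "source": "Literature"},
    {"id": "b", "source": "Compass"},
    {"id": "c", "source": "Regulatory"},
    {"id": "d", "source": "Compass"},
]

reordered = apply_source_preference(results, ["Compass"])
print([r["id"] for r in reordered])  # prints ['b', 'd', 'a', 'c']
```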
In instances where sufficient behavior data isn’t yet available, MetaBio uses AI and synthetic data to simulate user behaviors and predict the actions of other user groups.
MetaBio currently supports several applications in production, including those for COVID-19 research and clinical trial feasibility assessments.
COVID-19 Search App
During the early days of the pandemic, Regeneron’s Regulatory Intelligence Group faced an overwhelming influx of information—research papers, new findings, and social media-driven trends on potential cures. Sifting through this massive volume of data to identify what was truly valuable was impossible for human analysts alone. MetaBio quickly filtered through the noise, enabling Regeneron’s teams to identify relevant data and make informed decisions on how to respond to the rapidly evolving situation.
Clinical Trials Research App
Before initiating a clinical trial, Regeneron conducts a feasibility assessment to review similar trials conducted elsewhere. MetaBio’s clinical trial app supports this process by providing researchers with quick access to relevant studies from around the world. For example, when a scientist working on multiple myeloma searches for related clinical trials in China, the platform surfaces over 36,000 relevant documents. Filters allow the scientist to refine this further, and custom visualizations help highlight trial success rates in specific countries.
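The filtering and per-country success-rate view described here can be approximated with a small aggregation. The sketch below uses made-up toy records purely for illustration; the field names (`condition`, `country`, `succeeded`) are hypothetical and do not reflect MetaBio's actual schema or any real trial outcomes.

```python
from collections import defaultdict


def filter_trials(trials, condition=None, country=None):
    """Return trials matching the given filters; a filter set to None is skipped."""
    return [
        t for t in trials
        if (condition is None or t["condition"] == condition)
        and (country is None or t["country"] == country)
    ]


def success_rate_by_country(trials):
    """Compute the fraction of successful trials for each country."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for t in trials:
        totals[t["country"]] += 1
        successes[t["country"]] += int(t["succeeded"])
    return {country: successes[country] / totals[country] for country in totals}


# Synthetic records, not real trial data.
trials = [
    {"id": "t1", "condition": "multiple myeloma", "country": "China", "succeeded": True},
    {"id": "t2", "condition": "multiple myeloma", "country": "China", "succeeded": False},
    {"id": "t3", "condition": "multiple myeloma", "country": "Japan", "succeeded": True},
    {"id": "t4", "condition": "lymphoma", "country": "China", "succeeded": True},
]

myeloma = filter_trials(trials, condition="multiple myeloma")
rates = success_rate_by_country(myeloma)
print(rates)  # prints {'China': 0.5, 'Japan': 1.0}
```

A chart of `rates` would correspond to the kind of custom visualization the app offers for comparing countries.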
Researchers can bookmark important documents and export them into Regeneron’s Appian workflow application, facilitating further analysis and collaboration.
MetaBio’s future development plans will extend its capabilities to Regeneron’s commercial side. By analyzing external sales data from pharmacies like CVS, Rite Aid, and Walgreens, Regeneron aims to gain deeper insights into its products’ market performance and inform its business strategy.
The unifying theme across all of MetaBio’s applications is the ability to expedite data discovery through a self-service, on-demand platform. This flexibility enables data exploration and accelerates decision-making across Regeneron’s diverse teams.
“One of the key selling points of Lucidworks that I’ve shared with leadership is its flexibility,” said Ahmed. “We collaborate closely with stakeholders to understand their challenges, and we can quickly build custom solutions that solve specific problems relying on Lucidworks capabilities.”
MetaBio’s significance has not gone unnoticed. In 2022, Regeneron received the prestigious CIO 100 award, which recognizes projects that “deliver business value through innovative technology use.”
The platform’s success is also reflected in user engagement. Since its launch, daily traffic to MetaBio has increased significantly, alongside improvements in relevance and usability. Metrics such as click-through rate and the share of successful searches have steadily risen, indicating that MetaBio is continuously refining its ability to meet the needs of Regeneron’s users.
As Regeneron’s founder and CEO, Leonard Schleifer, has said: “Our next product lies in our data—we just have to find it.” With MetaBio’s fast and efficient data discovery, the path to critical breakthroughs is clearer than ever. The platform is helping Regeneron get to the heart of the matter—quickly—allowing researchers to focus on what matters most: saving lives.