Pharma companies are in the business of data. Each step in creating a new treatment involves gathering data to improve the process, the treatment's efficacy and safety, and how the drug is marketed, priced, and sold.

One of pharma’s primary challenges in drug creation is figuring out which compound might bind to which receptor in which system to produce the desired therapeutic benefits, and what the side effects might be. A data scientist sees this as a classification problem. The same kind of classification algorithm that finds products in a retailer’s online catalog or detects potential fraud in financial services can help find the next life-saving medication.
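To make the classification framing concrete, here is a minimal sketch of a nearest-neighbor classifier that labels a candidate compound as a likely "binder" or "non-binder". Everything in it is an assumption for illustration: the fingerprints are made-up sets of bit positions, the labels are invented, and the names `tanimoto`, `knn_predict`, and `TRAINING` are hypothetical. Production systems would use real chemical fingerprints and far richer models.

```python
from collections import Counter


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_predict(query: frozenset, training: list, k: int = 3) -> str:
    """Label a query by majority vote of its k most similar training compounds."""
    ranked = sorted(training, key=lambda item: tanimoto(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: each compound is the set of "on" bits in a
# substructure fingerprint, paired with an invented binding label.
TRAINING = [
    (frozenset({1, 2, 3, 4}), "binder"),
    (frozenset({1, 2, 3, 5}), "binder"),
    (frozenset({2, 3, 4, 6}), "binder"),
    (frozenset({7, 8, 9, 10}), "non-binder"),
    (frozenset({7, 8, 9, 11}), "non-binder"),
    (frozenset({8, 9, 10, 12}), "non-binder"),
]

candidate = frozenset({1, 2, 4, 6})
print(knn_predict(candidate, TRAINING))  # prints "binder"
```

The candidate shares most of its bits with the known binders, so the three nearest neighbors all vote "binder". The same pattern of features in, label out is what a product-search or fraud-detection classifier does with different inputs.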

The Massachusetts Institute of Technology (MIT) uses a deep learning technique to identify markers of toxicity in a database of candidate molecules. Given that 90% of small molecules fail due to toxicity or lack of efficacy in the first phases of research and development, anything that helps eliminate such candidates earlier is a significant cost saving.

Other AI models can also be used to find candidate compounds and fragments to use in future drugs. Atomwise, an AI drug-discovery company that has collaborated with Merck, uses a similar technique to search billions of compounds for candidates that could lead to treatments that are both effective and safe.

Consider the rewards of AI techniques like machine learning:

  • Find a one-in-a-billion compound that is effective at treating a disease.
  • Discover that a candidate is likely to be toxic in humans before filing an IND or starting a Phase 1 clinical trial.
  • Detect anomalies in clinical trial data in real time.
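As an illustration of the last bullet, the following is a minimal sketch of streaming anomaly detection using a running mean and standard deviation (Welford's algorithm) and a z-score threshold. The readings, the threshold of 3.0, the warm-up length, and the names `RunningStats` and `flag_anomalies` are all assumptions made for this example, not a description of any particular trial-monitoring system.

```python
import math


class RunningStats:
    """Running mean and sample standard deviation via Welford's algorithm."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def std(self) -> float:
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


def flag_anomalies(readings, threshold: float = 3.0, warmup: int = 5) -> list:
    """Return indices of readings more than `threshold` std devs from the running mean."""
    stats = RunningStats()
    flagged = []
    for i, x in enumerate(readings):
        if stats.n >= warmup and stats.std > 0:
            z = abs(x - stats.mean) / stats.std
            if z > threshold:
                flagged.append(i)
                continue  # keep the outlier out of the baseline statistics
        stats.update(x)
    return flagged


# Hypothetical stream of a vital-sign measurement arriving one value at a time
readings = [120, 122, 119, 121, 118, 123, 120, 190, 121, 119]
print(flag_anomalies(readings))  # prints [7]
```

Because the statistics are updated one value at a time, each new reading can be checked as it arrives rather than after the full dataset is collected. A simple z-score is only one possible choice; it assumes roughly stable, bell-shaped measurements.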

The rewards are great, but so are the challenges. Data in pharmaceutical development, from internal databases to CROs, is siloed and stored in different formats. Often data is even tagged incorrectly, and those errors propagate. Moreover, life sciences companies have invested heavily in technologies like Hadoop that have failed to deliver on their promise of providing accessible data across the pharmaceutical research and development lifecycle. Instead, these solutions deliver slow, inscrutable access for only a few people at a time, usually with a data engineer required nearby.

The next steps for data in life sciences are:

  • Using proven data technologies that make storing and accessing data fast, efficient, and simple.
  • Using AI technologies not just to find molecules but to find errors in the data itself.
  • Deploying made-for-purpose tools that offer visualized access across the discovery, development, and commercialization processes.

By doing this, users across the discovery, development, and commercialization processes will be able to view data they understand. Data quality will improve over time. Access will be cheaper, faster and more efficient.

In a connected life sciences company, the next life-saving drug may be found first by an algorithm before being vetted by a researcher. The next torcetrapib might be eliminated early in Phase 1, or maybe before even the first rat has tasted it. Whether it is making a new discovery or cutting losses on a malignant molecule, data could save lives and dollars.
