Artificial Intelligence in Drug Discovery and Biotech: 2022 Recap and Key Trends

The advent of AI in drug discovery at a glance

The current advent of artificial intelligence (AI) is shaping the evolution of entire industries, including the pharmaceutical and biotech industries. Unsurprisingly, almost every large and small Life Science organization has shown keen interest in adopting AI-driven discovery platforms in the hope of streamlining R&D efforts, reducing discovery timelines and costs, and improving efficiency.

All of the largest pharmaceutical companies, such as J&J, GSK, AstraZeneca, Novartis, Pfizer, Sanofi, Eli Lilly, and others, have made significant investments in AI technology, including equity investments, acquisitions of, or partnerships with, AI-focused companies, building internal capabilities, or a combination of approaches.

At the same time, there is a wave of new kinds of drug discovery and biotech companies built as AI-centric organizations, often from day one. Founded, for the most part, within the last decade, such companies have already built and tested specialized AI-driven drug discovery platforms -- often including dozens of machine learning models -- and are now starting to reap the rewards in the form of fast and cost-effective target discovery and drug design capabilities, quickly yielding preclinical and clinical drug candidates. Below we discuss a cohort of AI-developed drug candidates -- small molecules, biologics, and other modalities -- which have already entered clinical trials or are about to do so.

Other AI companies can model biology using complex multimodal data at scales not imaginable some twenty years ago. Yet another group of companies developed AI-driven platforms to boost operational efficiency and experiment design of clinical trials or real-world data analysis (e.g., pharmacovigilance).

Big-tech companies, such as Alphabet, Microsoft, Amazon, IBM, and Tencent, which have competency and expertise in AI and big data technologies, are also making a foray into the drug discovery space -- by investing, founding startups, partnering with life science companies, experimenting, innovating…

Finally, there is significant progress in other cutting-edge technologies -- quantum computing, cryo-EM, DNA-encoded libraries, etc. -- which are converging with the artificial intelligence trend to produce not only new types of tools, products, and services but also a wave of new startups and even novel business models.

What is AI, and how can it boost drug research?

Artificial intelligence is a relatively old concept, formalized at a famous Dartmouth College conference in 1956. The AI technologies in drug discovery have evolved from earlier machine learning (ML), cheminformatics, and bioinformatics concepts and approaches. For example, the application of machine learning to developing quantitative structure-activity relationship (QSAR) models and expert systems for toxicity prediction has a long history.

However, the rapid (in some cases -- “exponential”) growth of big data and advanced analytics, the falling cost of computation, GPU acceleration, cloud computing, algorithm development (e.g., deep neural nets and large language models), and the “democratization” of AI technology have together led to a synergistic “boom” in commercializing and industrializing artificial intelligence, in particular in the pharmaceutical and biotech industries.

In this white paper, we use the collective term “artificial intelligence” to refer to any sophisticated computational and modeling systems which can automatically learn insights and derive practical suggestions from “big data,” including structured, unstructured, and multimodal data.

While we do not limit the term “artificial intelligence” to a particular family of algorithms, in most cases we mean various flavors of machine learning-based systems (primarily deep neural networks) and large natural language processing (NLP) models. Modern AI systems can learn without being explicitly instructed (in contrast to traditional cheminformatics software built on “if-then” logic), they can improve accuracy after new learning cycles and when more data is fed to the system, and -- most notably -- they can process high-dimensional multimodal data of enormous size. These attributes are what significantly differentiate modern-day artificial intelligence systems from legacy cheminformatics and bioinformatics software packages. Such abilities are at the center of what drives the ongoing excitement (and hype) about AI.

While some components of what we call “artificial intelligence” -- e.g., machine learning tools and language models -- are used by pretty much every pharmaceutical organization and academic lab, some companies managed to build sophisticated computational and modeling pipelines, research “AI platforms,” which include automated workflows across dozens and even hundreds of various models and systems (deep learning, language models), and hundreds of various public and proprietary data sources.

The high sophistication and automation of some AI platforms led to their “commoditization” to the point they have trademarked commercial names. At the same time, some of them are offered as software-as-a-service to other companies. Examples include mRNA DESIGN STUDIO™ by Moderna, Centaur Chemist® by Exscientia, Guardian Angel™ by AI Therapeutics, ConVERGE™ by Verge Genomics, Taxonomy3® by C4X Discovery, and many others.

Below is an example of Pharma.AI by Insilico Medicine, a modular system for end-to-end drug discovery that comprises hundreds of different sub-systems and machine learning models -- altogether controlled by yet other algorithms of higher modeling abstraction (via a principle of “ensemble learning”).

A scheme of the Pharma.AI end-to-end platform.

Artificial intelligence is widely used in almost every aspect of pharmaceutical research, from data mining, biology modeling, and target discovery to lead identification and preclinical and clinical research. It is also used for synthesis planning, intelligent search for reagents and research consumables, and auxiliary tasks such as smart laboratory notebooks and virtual assistants.

The Life Science ecosystem of AI adopters includes the following major categories of players:

400+ AI-driven companies (startups/scaleups), offering a wide array of AI-driven platforms and services -- from the classical software-as-a-service model to custom data science services, drug discovery (“drug candidate-as-a-service”), and clinical trial support/management resources.

Domain-specific software providers (e.g., KNIME, ChemAxon, Dotmatics, MolSoft, and others) primarily focus on cheminformatics/bioinformatics software but also provide machine learning-powered tools.

Top-tier pharmaceutical and biotech companies developing in-house AI expertise as part of their R&D strategy. Such players often collaborate with external AI vendors and AI-driven biotech startups to explore pilot programs in drug discovery/basic biology/clinical trial analytics.

Top-tier technology companies like Google, Amazon, and Tencent entering the pharmaceutical space, leveraging cutting-edge AI technologies and big data infrastructures.

Contract research organizations (CROs) developing expertise in AI to augment their value offering to pharma/biotech customers.

Academic labs in pharma/biotech space, conducting AI research and developing specialized frameworks and tools relevant to the industry (usually a cradle for future AI startups/spin-outs).

Non-domain-specific software providers developing AI-as-a-service packages and models suitable for application in pharmaceutical research (e.g., “out of the box AI”).

Open-source machine learning tools and frameworks, widely exploited by life science professionals in their research projects.

AI drug discovery investment landscape, 2022

After 2021, the anomalously successful year for the biotech industry in terms of the amount of venture capital deals, the record number of initial public offerings, an abundance of successful exits, and a generally very positive climate in the stock market, the year 2022 demonstrated significant cooling down of financial activity and outright poor performance of the stock market.

However, artificial intelligence in the drug discovery sector demonstrated certain resilience, at least in the private equity transactions landscape, with several companies raising hundreds of millions in venture capital. Some examples include Beijing-based MegaRobo Technologies ($300 million Series C), Massachusetts-based ConcertAI ($150 million Series C) and Celsius Therapeutics ($83 million Series A), Hong Kong-based Insilico Medicine ($95 million Series D), California-based BigHat Biosciences ($75 million Series B) and DeepCell ($73 million Series B), and several others -- read “Major VC Rounds For AI Companies in Drug Discovery and Biotech in 2022”.

The merger and acquisition (M&A) landscape was marked by a recent notable deal in which biotech giant Ginkgo Bioworks acquired Zymergen in a transaction valuing Zymergen at $300 million. The acquisition brings Zymergen’s machine learning and data science capabilities together with Ginkgo’s synthetic biology platform.

Key industry observations and trends

The advent of AI and data technologies, as well as novel computational tools and infrastructural solutions (databases, cloud services, etc.), are all redefining the way the pharmaceutical industry is operating -- on research, clinical, and business levels. Below let us review some of the trends and observations in the AI for drug discovery space and illustrative industry developments in 2022.

AI-enabled biology modeling and target discovery

In drug discovery research, identifying novel drug targets is critical for developing novel first-in-class therapeutic drugs -- potential “blockbusters.” Drug discovery efforts over the past several decades traditionally centered on targeting specific proteins with suitable “pockets” that can be influenced by a ligand molecule (often a small molecule). But out of the entire set of human proteins (aka the “proteome”), only a small number have been explored as targets. There are currently 20,360 human proteins in Swiss-Prot, of which approximately 4,600 are known to be involved in disease mechanisms according to the OMIM database, representing around 22% of human proteins with roles in disease. These proteins are the most obvious region of the human proteome likely to contain viable drug targets. However, as of 2017, only around 890 human and pathogen-derived biomolecules (mostly proteins) were actually targeted by existing FDA-approved drugs. These biomolecules included 667 human-genome-derived proteins targeted by drugs for human disease. Things are not much different today, so there is still a lot of room for identifying novel targets in this pool. Novel computational approaches based on artificial intelligence technologies allow for identifying new druggable protein pockets at scale, sometimes allowing for proteome-wide virtual screens.
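The proportions quoted above can be re-derived with a few lines of arithmetic. The sketch below only uses the figures from the paragraph; the variable and function names are illustrative, and the last ratio is a rough comparison only, since the text does not state that the 667 drugged proteins are a subset of the roughly 4,600 disease-linked ones.

```python
# Figures quoted in the paragraph above (no additional data is introduced).
SWISS_PROT_HUMAN_PROTEINS = 20_360   # human proteins in Swiss-Prot
DISEASE_LINKED_PROTEINS = 4_600      # approximate, per OMIM
DRUGGED_HUMAN_PROTEINS = 667         # human proteins targeted by approved drugs (2017)


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Share of the human proteome with a known role in disease.
disease_share = share(DISEASE_LINKED_PROTEINS, SWISS_PROT_HUMAN_PROTEINS)

# Share of the human proteome already targeted by approved drugs.
drugged_share = share(DRUGGED_HUMAN_PROTEINS, SWISS_PROT_HUMAN_PROTEINS)

# Rough size comparison only: drugged proteins relative to disease-linked ones.
drugged_vs_disease = share(DRUGGED_HUMAN_PROTEINS, DISEASE_LINKED_PROTEINS)

print(f"Disease-linked share of proteome: {disease_share:.1f}%")
print(f"Drugged share of proteome:        {drugged_share:.1f}%")
print(f"Drugged vs. disease-linked:       {drugged_vs_disease:.1f}%")
```

Under these figures, the disease-linked share comes out slightly above the "around 22%" quoted in the text, while the drugged share of the proteome stays in the low single digits, which is the gap the paragraph points to.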

But what is even more exciting, advanced modeling tools help identify and modulate novel types of targets, such as protein-protein interactions, targets with large contact areas, protein-nucleic acid interactions, and next-generation targets, such as exploiting the cell’s protein degradation machinery.

A lot of AI-driven companies are focused on modeling biology, discovering and validating novel targets, and offering “disease model-as-a-service” or “target-discovery-as-a-service” to other organizations. Demand for this kind of contract research service is rising, which is reflected in the growing number of target discovery partnerships.

For example, in September 2022, Israel-based biology modeling company CytoReason announced an expanded $110 million collaboration with Pfizer. The two companies started working together in 2019, when Pfizer began using CytoReason’s biological models in research aimed at developing new drugs for immune-mediated diseases and cancer immunotherapies.

In May 2022, AstraZeneca announced that it collected a second pulmonary fibrosis target from its collaboration with BenevolentAI, a UK-based leader in AI-driven drug discovery. The milestone marked a third novel target discovered by BenevolentAI for AstraZeneca since the collaboration started in 2019. Just several months later, in October 2022, BenevolentAI managed to deliver two additional AI-generated targets for AstraZeneca’s R&D portfolio, aimed at chronic kidney disease and idiopathic pulmonary fibrosis.

In November 2022, Hong Kong-based Insilico Medicine signed a deal with Sanofi, potentially worth up to $1.2 billion, to discover up to six new targets leveraging Insilico Medicine’s “Pharma.AI” platform.

While such cutting-edge algorithms as deep neural networks require large volumes of data to properly model biology, there are targets with little data available. Canada-based Cyclica developed an AI-driven platform for polypharmacology and proteome-wide screening, capable of working with “low-data” targets. In November 2022, Cyclica received a $1.8 million grant from the Bill & Melinda Gates Foundation to apply its AI-enabled drug discovery platform to discover new non-hormonal contraceptives, leveraging multiple low-data biological targets.

As per the BiopharmaTrend report, there are at least 182 other AI companies in the target discovery space, including leading well-funded companies with cutting-edge R&D platforms, such as Insitro, Relay Therapeutics, Valo Health, and others.

New “AI-native” startups are constantly emerging in the biology modeling space. Examples include CardiaTec Biosciences, WhiteLab Genomics, and Degron Therapeutics, to name a few.

All in all, advanced modeling methods based on artificial intelligence help redefine the very definition of biological targets, as we try to link drug response to genetic variation, understand stratified clinical efficacy and safety, rationalize the differences between drugs in the same therapeutic class and predict drug utility in patient subgroups.

Cracking structural biology with AI

One of the most discussed AI-related topics in the Life Sciences community this year was the recent achievement of Alphabet’s UK-based subsidiary DeepMind, which received widespread coverage for its success in cracking the protein folding problem, a half-century-old biological problem.

In July 2022, DeepMind’s deep learning software AlphaFold predicted the structures of over 200 million proteins, which were shared publicly, demonstrating the astonishing ability of the AI system to accurately predict a protein’s 3D structure just from its 1D amino acid sequence. While some argue that this discovery may not (just yet) have such a transformative role in drug discovery as one may assume, and that AlphaFold did not perform much better than chance when predicting bacterial protein-antibacterial compound interactions, the achievement is certainly paradigm-changing for structural biology and illustrates the potential of AI in basic biology research.

In November 2022, DeepMind’s groundbreaking success in modeling the proteome was rivaled by researchers at Meta (formerly Facebook, headquartered in Menlo Park, California). The Meta team used AI to predict the structures of some 600 million proteins from bacteria, viruses, and other microorganisms that had not yet been characterized.

The scientists at Meta used an entirely different AI approach -- a “large language model,” a type of AI that can predict text from just a few letters or words. Language models are usually trained on large volumes of text. However, 1D protein sequences are essentially strings of letters, so such models can be applied to protein sequences in much the same way as to human languages.
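To make the "proteins as strings of letters" idea concrete, here is a minimal sketch of how a sequence could be turned into integer tokens and how one residue could be hidden for a model to predict. The vocabulary, the `<mask>` token, and the example sequence are illustrative assumptions, and masking a token is just one common training setup for language models; nothing here describes Meta's actual pipeline.

```python
# The 20 standard amino acids, one letter each, act as the "alphabet".
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK_TOKEN = "<mask>"  # hypothetical special token for a hidden residue

# Map each letter (and the mask token) to an integer id.
VOCAB = {residue: idx for idx, residue in enumerate(AMINO_ACIDS)}
VOCAB[MASK_TOKEN] = len(VOCAB)


def tokenize(sequence: str) -> list[int]:
    """Convert a protein sequence into a list of integer token ids."""
    return [VOCAB[residue] for residue in sequence.upper()]


def mask_position(token_ids: list[int], position: int) -> tuple[list[int], int]:
    """Hide one residue; return the masked ids and the id the model must recover."""
    masked = list(token_ids)
    target = masked[position]
    masked[position] = VOCAB[MASK_TOKEN]
    return masked, target


# Arbitrary illustrative string, not a real protein.
example_sequence = "MKTAYIAKQR"
token_ids = tokenize(example_sequence)
masked_ids, target_id = mask_position(token_ids, position=3)

print(token_ids)
print(masked_ids, "-> model should predict id", target_id)
```

A language model trained on many such sequences learns which residues are plausible in which context, which is the same kind of signal it would learn from words in a sentence.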

Interestingly, such major technological leaps in protein folding might turn out to be more useful for de novo protein design than for simply modeling structures of existing proteins for drug discovery. Time will tell where the impact will be the biggest, but the above successes by DeepMind and Meta are not the only exciting developments for structural biologists in 2022.

Recently, rapid advancements in cryo-EM, coupled with AI technologies, gave birth to a new wave of biotech startups such as Gandeeva Therapeutics, Septerna, and MOMA Therapeutics. The cryo-EM field is heating up, with biotech startups attracting the attention of a wide range of investors, from smaller venture organizations to ByteDance, the internet tech giant and owner of TikTok, which invested in Shuimu BioSciences. The interest is driven not only by the revolutionary Nobel prize-winning technology but also by the active recruitment of AI into the process. The recent publication “An AI-assisted cryo-EM pipeline for structural studies of cellular extracts” highlighted the irreplaceable role of AI in complex cryo-EM pipelines, including AI-driven atomic model prediction to rapidly and simultaneously investigate the structure of multiple protein community members de novo. Machine learning helps not only to speed up and optimize the cryo-EM pipeline but also to avoid user bias pitfalls.

Gandeeva Therapeutics, founded in 2021, raised $40M at the beginning of this year to develop novel therapies based on the precision imaging of protein-drug interactions. Their Target-Selection Engine together with the Cryo-EM Engine can help to “steer away from discovery dead ends”, as the company stated. At the same time, MOMA Therapeutics, a cryo-EM biotech launched in 2020, raised a whopping $236M in just two years, with the ambitious goal of bringing novel precision cancer drugs to the clinic. MOMA is focused on a unique class of biological targets -- “molecular machines.”

Developing small molecules using AI

After disease modeling and target discovery, designing chemical or biological molecules is the second most abundant use case for applying artificial intelligence in drug discovery. More than 130 artificial intelligence-driven companies out of 384 companies in the BiopharmaTrend AI Report apply artificial intelligence for designing drug candidates, among other use cases.

AI-driven drug design falls mainly into three major categories: de novo (e.g., generative) drug design, virtual screening of existing databases, and drug repurposing.

De novo drug design is mostly enabled by deep learning models, such as generative adversarial networks (GANs). Some examples of generative AI platforms include Chemistry42 software by Insilico Medicine, Makya by Iktos, and De Novo Platform by Ro5. Other players in this category include Recursion Pharmaceuticals, Deep Cure, Standigm, and others.

Another application of artificial intelligence is ultra-large-scale virtual screening, which sifts through billions of molecules to find promising hits. In August 2022, Sanofi partnered with Atomwise in a drug design deal worth potentially up to $1.2 billion. The deal, which will see Sanofi pay $20 million upfront, centers on leveraging the U.S. company’s AtomNet platform to research small molecules for up to five drug targets selected by Sanofi. The convolutional neural network-based AtomNet excels at structure-based drug design, enabling “the rapid, AI-powered search of Atomwise’s proprietary library of more than 3 trillion synthesizable compounds,” according to the announcement.

Earlier, in 2019, Atomwise collaborated with Ukraine-based chemical leader Enamine to conduct the “world’s first and largest 10 billion compounds virtual screen,” aiming at identifying hits for pediatric oncology.

Finally, a number of companies are using repurposing strategies for AI-enabled drug discovery. Companies in this category, including Healx, BenevolentAI, and BioXcel Therapeutics, largely use natural language processing (NLP) models and machine learning to analyze massive amounts of unstructured textual data -- research articles and patents, electronic health records (EHRs), as well as other data types -- to build and search “knowledge graphs.” Such AI-enabled searchable ontologies allow picking novel indications or patient populations for previously known drug candidates or even approved drugs.

For example, Lantern Pharma is a US-based clinical-stage biotechnology company focused on innovating the cancer drug development process by using advanced genomics, machine learning, and artificial intelligence.

The company’s AI platform, RADR®, currently includes more than 25 billion data points and uses big data analytics and machine learning to rapidly uncover biologically relevant genomic signatures correlated to drug response, and then to identify relevant cancer patient subgroups to benefit from Lantern’s drug candidates. RADR® is also used by Lantern and its collaborators to develop and position new drugs as well as for drug repurposing.

AI meets DNA-encoded libraries

A somewhat unique approach to drug design consists in using DNA-encoded libraries (DELs) as a source of novel molecules to search through. Since DEL technology offers access to essentially the largest chemical space available on the market, this big data technology is a natural fit for AI-based tools.

A notable deal took place in 2020, when Insitro, one of the leading players in the application of machine learning for drug discovery, founded by Daphne Koller, acquired Haystack Sciences. Haystack’s machine learning-based platform combined multiple elements of their DEL technology, including the capability to synthesize broad, diverse, small molecule collections, the ability to execute iterative follow-up, and a proprietary semi-quantitative screening technology called nDexer™, which generates higher resolution datasets.

In turn, ZebiAI was acquired in 2021 by another notable developer of an artificial intelligence-powered drug discovery platform, the clinical-stage biotech Relay Therapeutics, with Relay paying $85 million upfront. This acquisition allowed Relay to incorporate ZebiAI’s machine-learning-based DEL technology into their protein targeting platform Dynamo.

In October 2021, X-Chem acquired Glamorous AI, the developer of RosalindAI, a modular, multifaceted artificial intelligence solution for drug discovery with capabilities in data engineering and featurization, predictive analytics, high-performance computing, and de novo drug design.

AI-driven drug design beyond small molecules

Considering that modern-day artificial intelligence tools applied for drug discovery have deep historical roots in cheminformatics and early machine learning-based QSAR models of the past century, it is not surprising that the overwhelming majority of AI startups in drug discovery are focused on small molecules.

Distribution of AI drug discovery companies by product category.

However, biomolecule drugs (aka “biologics”) and novel chemical modalities are increasingly abundant in the pharmaceutical space, and so are the new biotech companies applying AI-based methods to discover those. After scientists cracked the human genome in 2003, the druggability and developability space rapidly evolved. In the past century, Lipinski’s rule-of-five (Ro5) used to serve as a “guiding light” for drug-like molecule design for oral delivery in the “traditional” druggable target space.

In contrast, novel types of targets, such as protein-protein interactions, targets with large contact areas, protein-nucleic acid interactions, and next-generation targets, such as exploiting the cell’s protein degradation machinery, are driving the advent of a variety of emerging molecular modalities that have become a key focus in drug discovery: beyond-the-Ro5 (bRo5) small molecules (such as protein-protein interaction modulators and proteolysis-targeting chimeras, or PROTACs), monoclonal antibodies (mAbs), peptides and peptidomimetics, and nucleic acid-based modalities (RNA- and DNA-based).
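For readers less familiar with the rule-of-five mentioned above, the sketch below shows how its four commonly cited thresholds can be checked: molecular weight at most 500 Da, logP at most 5, no more than 5 hydrogen-bond donors, and no more than 10 hydrogen-bond acceptors. The `Descriptors` class, the function names, and the two sets of descriptor values are hypothetical placeholders for illustration, not measurements of real compounds; allowing up to one violation is a common convention and is treated here as an adjustable assumption.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Precomputed molecular descriptors (hypothetical container)."""
    molecular_weight: float  # daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(d: Descriptors) -> list[str]:
    """Return the names of the rule-of-five criteria that are violated."""
    violations = []
    if d.molecular_weight > 500:
        violations.append("molecular_weight")
    if d.log_p > 5:
        violations.append("log_p")
    if d.h_bond_donors > 5:
        violations.append("h_bond_donors")
    if d.h_bond_acceptors > 10:
        violations.append("h_bond_acceptors")
    return violations


def passes_ro5(d: Descriptors, max_violations: int = 1) -> bool:
    """Treat a molecule as Ro5-compliant if it has at most max_violations."""
    return len(ro5_violations(d)) <= max_violations


# Placeholder values for illustration only.
small_molecule = Descriptors(350.0, 2.5, 2, 5)
large_bifunctional = Descriptors(950.0, 6.2, 4, 14)

print(passes_ro5(small_molecule), ro5_violations(small_molecule))
print(passes_ro5(large_bifunctional), ro5_violations(large_bifunctional))
```

Molecules the size of the second placeholder fall outside these thresholds, which is why the modalities listed above are described as "beyond the Ro5".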

For instance, there is a growing number of companies applying AI methods to discover novel monoclonal antibodies -- the most commercially successful biologics modality so far. Notably, in April 2022, Israel-based Biolojic Design announced that its first computationally designed antibody had entered a clinical trial. The company leverages a structure-based design strategy. Its AI model is trained on millions of antibody-antigen pairs to identify a template antibody against the target of interest from existing human antibodies. An additional machine learning model is used to predict mutations and guide the optimization of the template to improve its properties.

In November 2022, Canada-based AbCellera Biologics announced that Regeneron elected to exercise its right to advance the first of AbCellera’s therapeutic antibody candidates targeting an undisclosed G-protein coupled receptor (GPCR) into further preclinical development. The partnership, which commenced in March 2020 and allows for four discovery programs selected by Regeneron, leverages AbCellera’s AI-based antibody discovery engine and Regeneron’s VelocImmune® mice to identify novel therapeutic antibodies.

Two dozen other antibody discovery companies are using AI, including US-based AbSci, BigHat Biosciences, Totient, Nabla Bio, and Generate Biomedicines; Canada-based Deep Biologics; China-based NeoX; EU-based Deep CDR, Natural Antibody, and MabSilico, etc.

Creyon Bio, a US-based company with a catchy name, applies an engineering approach to creating new oligonucleotide-based medicines (OBMs). The company was founded in 2019 and raised $40M in funding in March 2022. Founded in 2014 as a spinout of Cold Spring Harbor Laboratory, Envisagenics is a New York-based company focusing on discovering RNA therapeutics. According to their stated mission, they aim to reduce the complexity of biomedical data with the help of AI/ML technologies. Just recently, in August 2022, they received a grant from the National Cancer Institute, bringing their total funding raised to $27.1M.

Envisagenics’s AI-driven technology, SpliceCore, is a cloud-based platform experimentally validated to predict drug targets and biomarkers through splicing discovery from RNA-sequencing data. According to the company, it ensures higher precision and speed compared to traditional methods.

Innophore’s AI-driven strategy to design novel therapeutic enzymes is realized by coupling their patented Catalophore™ technology to state-of-the-art conventional bioinformatics approaches and artificial intelligence. Innophore can mine structural and sequence databases using three-dimensional (3D) search templates called “catalophores” (i.e., carrier of the catalytic function) defined by point clouds of physicochemical features. Novel enzymes identified by this technique do not necessarily share a common structure or sequence basis with their employed counterparts. Therefore, they potentially feature altered protein properties, such as thermostability, robustness, substrate spectrum, selectivity, and specificity.

Besides designing novel enzymes, Innophore's technology can potentially be a game changer for epidemiological applications, predicting potentially dangerous mutations in viruses. In 2021, Innophore started the virus.watch project in cooperation with the AWS Diagnostic Development Initiative. The goal of this project was the implementation of a monitoring and evaluation system for emerging drug and disease-relevant Coronavirus (SARS-CoV-2) variants. The first joint paper, published in Nature in August 2022, describes bioinformatics analysis of SARS-CoV-2 variants revealing higher hACE2 receptor binding affinity for Omicron B.1.1.529 spike RBD compared to a wild-type reference.

Tracking the evolving virus over time using Innophore technology and AWS shows a high rate of mutations arising with the Omicron variant. Spheres depict alpha-C-atoms of the corresponding amino acid residue. Both color and size correlate with the number of mutations at each position.

Founded in 2008, Denmark-based Evaxion Biotech is an AI-driven company, devoted to developing vaccines against cancer and infectious diseases. They own a clinical-stage AI-Immunology platform, combining AI technology with their engineering expertise to generate predictive models, helping to identify unique immunotherapies for patients. Evaxion Biotech attracted a total of $57M, entering the post-IPO equity funding round in June 2022 worth $40M, led by a single investor Lincoln Park Capital Fund.

Some AI companies from the “chemical modalities club,” like Exscientia, are now expanding into biologics discovery. In November 2022, the company announced its AI platform would include the design of human antibodies. Exscientia is also establishing an automated biologics laboratory in Oxford to internally generate and profile novel antibodies.

A growing trend is to exploit the protein degradation system of human cells to get rid of disease-causing proteins and cure diseases. One modality here that is rising in popularity is the proteolysis-targeting chimera (PROTAC), which was introduced in 2001 and consists of two ligands connected by a flexible linker. The primary chemical architecture of modern PROTACs is the same: one ligand targets the E3 enzyme, a component that sends outdated proteins to the proteasome, and another ligand targets a protein of interest (POI) that has to be degraded. A PROTAC binds E3 and POI, bringing them closer to form an induced proximity complex. In some cases, when the proteins align appropriately, the POI gets ubiquitinated, which marks it for degradation by the proteasome.

Another broad approach to protein degradation includes so-called “molecular glues,” an actively growing area of research. In contrast to PROTACs, which are relatively large bifunctional molecules with two binding ligands and a linker, molecular glues are smaller and more drug-like molecules. They bind to a composite protein pocket that forms when two separate proteins come into proximity under the influence of the molecular glue.

There is a wave of companies within the protein degradation (and, more broadly -- modulation) space, including Arvinas, Nurix Therapeutics, Kymera Therapeutics, C4 Therapeutics, Roivant Discovery, Cedilla Therapeutics, and Lycia Therapeutics, to name a few.

Some companies are applying cutting-edge AI algorithms to design proximity-inducing compounds. For instance, Austria- and US-based Celeris Therapeutics has built the Celeris One platform, which includes three work zone systems: Xanthos, Hephaistos, and Hades. The systems incorporate graph neural networks to predict interactions, generative models to create new chemical matter (such as linkers), multi-objective optimization to improve molecular properties, molecular dynamics, and free-energy calculations. The workflow also employs geometric deep learning and machine learning-driven retrosynthesis capabilities. Celeris Therapeutics runs an automated lab to generate biology data and conduct custom chemical synthesis.

The dry lab workflow of Celeris Therapeutics' AI-driven platform Xanthos.

We have recently published a broad overview of the protein degradation market in the post “Protein Degraders Take Industry By Storm,” including several case studies with a technical overview of the computational platforms involved.

The first wave of AI-developed drug candidates goes clinical

While it is probably too early to say that AI adoption in the pharmaceutical industry has revolutionized drug discovery altogether, several “AI-native” companies did manage to gain notable efficiency in building their therapeutic pipelines quickly. What is one common feature of such companies? Each built a specialized, highly integrated AI platform, including many models and data sources. Some platforms, such as Chemistry42, are also available as software-as-a-service to external R&D partners.

One of the most vivid examples of benefiting from a “digital-first” strategy the industry has seen is Moderna Therapeutics, which not only managed to incorporate cutting-edge AI analytics in its research but digitalized and integrated every aspect of its R&D workflow, including production and distribution. When the COVID-19 pandemic struck the world at the beginning of 2020, Moderna was among the first companies to be able to come up with an efficient mRNA-based vaccine within just 2 days (!) and bring it to the market within a year.

A wave of therapeutics discovery successes enabled by AI demonstrates the ability of AI-native companies to come up with drug candidates faster than it typically used to take for similar programs.

AbCellera’s monoclonal antibody LY-CoV555 was developed within three months and obtained emergency use authorization by the FDA.

BenevolentAI’s Knowledge Graph helped the company identify baricitinib as a potential COVID-19 treatment within a matter of days (now approved for use by the FDA). Another small molecule, BEN-8744, a novel inhibitor to treat ulcerative colitis and dermatitis, was advanced to late preclinical studies in less than 24 months.

Exscientia’s small molecule inhibitor EXS-21546 marked the first AI-designed molecule for immuno-oncology to enter human clinical trials (now in Phase I) and was discovered in just eight months. The company has several other molecules in clinical trials.

Insilico Medicine’s small molecule inhibitor ISM001-055, to treat Idiopathic Pulmonary Fibrosis, was de novo designed and advanced into late preclinical studies within 18 months (now in Phase I).

New York-based Schrödinger developed SGR-1505, a small molecule to treat B-cell lymphoma, within ten months; the company is now in the process of an IND application.

Salt Lake City-based Recursion Pharmaceuticals developed a drug candidate for an unspecified rare disease within 18 months. The company has a large and diverse portfolio of preclinical and clinical drug candidates designed with the help of its digital biology platform.

Toronto-based Deep Genomics used its AI Workbench platform to develop a novel genetic target and a corresponding oligonucleotide drug candidate, DG12P1, to treat Wilson's disease, a rare inherited disorder.

To keep track of the leading AI-developed clinical drug candidates, we have created “The Roadmap of Drug Candidates Designed by AI,” which will be updated regularly.

Twenty most “productive” AI companies in the drug discovery space

Having shortlisted around 130 companies from more than 380 AI companies in the BiopharmaTrend AI Report, we have further selected 20 companies -- using a simple but robust evaluation formula taking into account clinical and preclinical pipelines of companies, the ability for target discovery, and the time in business. The 20 selected companies formed the BPT20: Artificial Intelligence in Drug Discovery Productivity Index -- the industry’s first point of reference to highlight companies championing the application of AI for de novo drug design, virtual screening, or drug repurposing.

AI and robotized labs of the future

Deep learning models (e.g., based on deep neural nets) are extremely “data-hungry,” meaning that no matter how good the AI is, the quality and size of the data are equally important for meaningful research predictions. The most efficient way to generate high-quality biology data is by using robotics. If we consider the modern AI-driven transformation of drug discovery as a step-by-step process, widely available and relatively cost-efficient robotics-as-a-service would be the final and critical piece in the AI-enabled industrialization of pharma and biotech research. As per a report by Arctoris, “Robotics is key to allowing the paradigm of closed-loop discovery to become a reality - which will be an exciting space to watch over the coming years.”

Some companies are building standardized, highly automated, scalable, and increasingly compatible laboratory facilities guided by AI-based experiment control systems and supplemented by AI-driven data mining and analytics capabilities. Such “next-gen” lab facilities are becoming available remotely to preclinical drug research experimentalists, making preclinical experimentation a more scalable and standardized routine. The leading remote lab providers on the list are Automata Labs, Strateos, Emerald Labs, and Culture Biosciences, to name a few.

The space is attracting venture funding and clients. For example, in February 2022, UK-based Automata Labs raised $50 million to automate the lab research process. In July 2021, Strateos raised $56 million to further improve its SmartLab platform and its remote robotized, automated technology, available to preclinical researchers across the globe. Culture Biosciences raised a total of more than $100 million, with the latest $80 million Series B announced in November 2021. San Francisco-based Emerald Cloud Labs (ECL) raised more than $90 million over the years. Early users of ECL’s remote robotic platform reported 300% to 700% improvements in research productivity. In June 2022, Beijing-based MegaRobo raised $300 million to expand its diverse range of automated AI-driven remote lab services and robotized facilities.

The rise of remote robotized labs is a long-term industry trend, a new way to offer contract research services that would be extremely beneficial for the long-term adoption of data-centric “AI-first” research strategies.

Several AI-driven drug discovery companies, such as Arctoris, Recursion Pharmaceuticals, Insitro, and Generative Bio, are approaching this trend via a different business model -- they have built internal robotized lab facilities to improve their in-house data generation capacities for training their AI models and building pipelines of therapeutic drug candidates.

For example, Oxford-based Arctoris, founded in 2016, built a fully automated wet lab that generates superior-quality data at scale. The data feeds into Arctoris’s data lake and powers the company’s AI-driven decision-making platform, Ulysses, which supports the company’s research from target to hit, lead, and IND application stage.

Arctoris’s pipeline now includes several preclinical programs in Oncology and Neurology. Arctoris raised a total of $10.3 million in several rounds from investors, including Future Planet Capital, RT Ventures, and Formic Ventures.

Some leading AI drug discovery companies, such as Exscientia and Insilico Medicine, are now also building in-house robotized labs for building their internal data generation “muscles.”

Salt Lake City-based Recursion Pharmaceuticals is among the leaders in the robotized biology experimentation space. The company’s AI-driven infrastructure, called the Recursion Operating System, is an integrated closed-loop system combining proprietary in-house data generation and advanced computational tools to generate novel insights to initiate or accelerate therapeutic programs. The company is automating preclinical biology experimentation at scale. For instance, cellular microscopy images capture composite changes in cellular morphology and are processed by the company’s AI-powered computer vision systems. Since 2017, Recursion Pharmaceuticals has approximately doubled the capacity of the phenomics platform each year and scaled the number of executed phenomic experiments to up to 2.2 million every week, resulting in ~19 petabytes of proprietary high-dimensional data.

Navigating clinical trial bottlenecks with AI

The clinical trial is a critical stage of drug development workflow, with an estimated average success rate of about 11% for drug candidates moving from Phase 1 towards approval. Even if the drug candidate is safe and efficacious, clinical trials might fail due to insufficient financing, insufficient enrollment, or poor study design.

Artificial Intelligence (AI) is increasingly perceived as a source of opportunities to improve the operational efficiency of clinical trials and minimize clinical development costs. Typically, AI vendors offer their services and expertise in three main areas. AI start-ups in the first area help to unlock information from disparate data sources, such as scientific papers, medical records, disease registries, and even medical claims, by applying Natural Language Processing (NLP). This can support patient recruitment and stratification, site selection, clinical study design, and the understanding of disease mechanisms. Recruitment alone is a major bottleneck: about 18% of clinical studies fail due to insufficient recruitment, as a 2015 study reported.

Another aspect of success in clinical trials is improved patient stratification. Since trial patients are expensive -- the average cost of enrolling one patient was $15,700-26,000 in 2017 -- it is essential to be able to predict which patients will benefit more from treatment or face greater risk. AI-driven companies operate with multiple data types, such as Electronic Health Records (EHR), omics, and imaging data, to reduce population heterogeneity and increase clinical study power. Vendors could use speech biomarkers to identify neurological disease progression, imaging analyses to track treatment progression, or genetic biomarkers to identify patients with more severe symptoms.

AI is also streamlining the operational processes of clinical trials. AI vendors help to track patient health from their homes and to monitor treatment response and patient adherence to the trial procedures. By doing that, AI companies decrease the risk of patient dropouts, which average about 30% of enrolled participants. Usually, the Phase 3 clinical study stage requires 1000-3000 participants, with a part of them taking a placebo. That’s why the development of synthetic control arms -- AI models that could replace the placebo-control groups of individuals, thus reducing the number of individuals required for clinical trials -- might become a novel trend.
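To make the arithmetic behind these figures concrete, here is a minimal Python sketch. It only combines the numbers quoted above (a 30% average dropout rate and a per-patient enrollment cost of $15,700-26,000). The function names, the trial size, the share of patients in the control arm, and the share of that arm replaced by a synthetic control are hypothetical values chosen for illustration, not data from any real trial.

```python
# Illustrative arithmetic only. The 30% dropout rate and the $15,700-26,000
# per-patient cost come from the text; every other number is an assumption.

def required_enrollment(completers_needed: int, dropout_percent: int) -> int:
    """Patients to enroll so that `completers_needed` finish the trial."""
    if not 0 <= dropout_percent < 100:
        raise ValueError("dropout_percent must be in [0, 100)")
    retained_percent = 100 - dropout_percent
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-completers_needed * 100 // retained_percent)


def patients_saved_by_synthetic_control(
    total_participants: int, control_percent: int, replaced_percent: int
) -> int:
    """Enrolled patients avoided if part of the control arm is modeled."""
    control_arm = total_participants * control_percent // 100
    return control_arm * replaced_percent // 100


def enrollment_cost_range(patients: int, low: int = 15_700, high: int = 26_000):
    """Low and high enrollment cost estimates in US dollars."""
    return patients * low, patients * high


if __name__ == "__main__":
    # Hypothetical trial: 1000 completers needed, 30% dropout.
    print(required_enrollment(1000, 30))
    # Hypothetical design: half the patients in the control arm,
    # 40% of that arm replaced by a synthetic control.
    saved = patients_saved_by_synthetic_control(1000, 50, 40)
    print(saved, enrollment_cost_range(saved))
```

Under these assumed inputs, both a lower dropout rate and a partial synthetic control arm translate directly into fewer enrolled patients, which is where the cost savings would come from.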

There are more than 80 companies in all three categories, as per BiopharmaTrend AI Report, including Owkin, PathAI, GNS Healthcare, Neurcuit, AICure, and Unlearn.ai.

Both the demand for AI-enabled clinical trial platforms and investment in this area remain high, despite the overall cold investment climate in biotech.

In March 2022, ConcertAI got a valuation of $1.9 billion after banking a $150 million series C round to scale its software and real-world data (RWD) solutions for cancer research.

Saama is a Silicon Valley-based company founded in 1997, but it raised its first venture capital in 2015. The company has raised more than $500 million in venture capital, including the latest mega-round of $430 million in August 2022 -- from Carlyle and venture funds from Merck, Pfizer, Amgen, McKesson, and others.

Saama is one of the leading players in the AI-driven clinical trial analytics space, offering a diverse suite of solutions: accelerated clinical trials via centralized data analytics and control center, including real-time data processing capabilities; automated data quality capabilities; streamlined regulatory submission capabilities, including pharmacovigilance analytics and submissions.

In April 2022, Unlearn.AI, a startup developing a ‘digital twin’ service for clinical trials, raised $50 million.

In June 2022, Bristol Myers Squibb invested $80 million in OWKIN to help enhance the design of cardiovascular drug trials, with improvements to endpoint definitions, identifying patient subgroups, and estimating treatment effects. Paris and New York-based “unicorn” OWKIN is leveraging its high-quality multimodal data access and state-of-the-art machine learning to accurately predict various treatment effects on patient sub-populations to improve clinical trial experiment design and outcomes. OWKIN is also applying its AI platform for drug discovery.

In August 2022, Bristol Myers Squibb also announced a multi-year expanded collaboration agreement with AI pathology specialist PathAI. The initial work within this extended agreement will focus on key translational research in oncology, fibrosis, and immunology, with an overall goal of forwarding these into clinical trials. Two months earlier, PathAI struck a strategic multi-year partnership with GlaxoSmithKline to accelerate scientific research and drug development programs in oncology and non-alcoholic steatohepatitis (NASH) by leveraging PathAI’s technologies in digital pathology, including the use of PathAI’s AIM-NASH tool.

Notably, Dublin’s Akkure Genomics just announced it crowdfunded €1 million in one week to support clinical trials via its AI platform, which helps people participate in the most relevant clinical trials based on data about themselves and their condition.

AI in the contract research industry

The emergence of novel AI-native contract research companies in pre-clinical and clinical spaces challenges the status quo of major well-established contract research organizations (CROs). The incumbents are responding by incorporating AI in their service offerings to pharma or partnering with AI companies to complement their research capacity.

For example, Charles River Labs, a US-based early-stage contract research organization, is diving deeper into AI by establishing a multiyear partnership with Valo Health. Charles River adds Valo’s Opal technology that actively learns as programs are developed. Charles River hopes its use of the Opal deep learning platform will result in a faster and more effective process from de novo molecule design through lead optimization. Last year, Charles River established a strategic partnership with Valence Discovery that lent the CRO’s clients access to Valence’s artificial intelligence platform for molecular property prediction, generative chemistry, and multiparameter optimization.

IQVIA has been investing in AI capabilities for years to add value to the clinical trials and commercial activities it offers to customers. To improve clinical trials, for example, IQVIA launched Avacare Clinical Research Network™ in 2020, which allowed sites to match patients for the trials faster and more efficiently. The platform is powered by AI algorithms and can operate across 19 disease areas. Earlier, another IQVIA platform, Linguamatics Natural Language Processing (NLP), won Questex’s 2019 Fierce Innovation Awards. The platform has a wide range of applications in healthcare and life sciences, including target identification, gene mapping, and predicting patient outcomes.

A significant trend in the clinical research industry is running virtual clinical trials, a market worth $8 billion. The COVID-19 pandemic forced pharma companies to switch to remote monitoring, improved patient enrollment, apps to track patient engagement, telemedicine, decentralization, and other measures to keep trials running. Since the demand for such solutions grew significantly, CROs rushed to add virtual and decentralized capabilities to their service offerings. AI technology proved invaluable in creating and running such projects to help synthesize data and speed up clinical trial processes.

Technology giants go after drug discovery and biotech

The earlier mentioned successes of Alphabet’s DeepMind and Meta in solving basic biology research riddles, like predicting protein structures at scale using deep learning and language models, are just the tip of the iceberg: almost every leading technology giant is now in the life sciences business in some form.

Alphabet (the parent company of Google) has dozens of investments in life science projects, including AI-based reagent search engine BenchSci, China-based XtalPi, which applies AI and quantum physics to drug discovery, personal genomics company 23andMe, and AI-driven drug development unicorn OWKIN, to name a few. In 2021, Alphabet, together with DeepMind, launched Isomorphic Labs to focus on applying artificial intelligence to crack basic biology and drug discovery.

Apart from multiple other projects and activities in pharmaceutical research and biotech, Alphabet has a full-scale entity, Verily, dedicated to Life Sciences and MedTech.

Microsoft, a global software developer, has a deep footprint in Life Sciences, with dozens of research collaborations with big pharma, providing its infrastructure to handle big data using large-scale machine learning models. Among the latest Microsoft initiatives is the MoLeR model, a new tool being developed by the company's generative chemistry team in collaboration with Novartis. The MoLeR model -- unlike other generative tools -- uses deep learning to come up with new structures based on a given scaffold that acts as an initial base for the generative process. Another example is AI4Science, a new Microsoft venture combining computational chemistry, quantum physics, machine learning, molecular biology, fluid dynamics, and software engineering to realize a vision of the so-called "fifth paradigm" of science.

A particularly active company in this context is a hardware producer for the gaming industry and personal computers, NVIDIA. This tech company has launched Clara Discovery, which is a collection of frameworks, applications, and AI models enabling GPU-accelerated drug discovery, with support for research in genomics, proteomics, microscopy, virtual screening, computational chemistry, visualization, clinical imaging, and natural language processing (NLP). And in March 2022, the company introduced Clara Holoscan MGX™, a platform for the medical device industry to develop and deploy real-time AI applications at the edge, specifically designed to meet required regulatory standards. Clara Holoscan aims for an all-in-one, medical-grade reference architecture, as well as long-term software support, to accelerate innovation in the medical device industry.

The future of AI in drug discovery: all things "quantum"

Most software tools used for drug discovery and biology research rely on molecular mechanics -- a simplified representation of molecules, essentially reducing them to “balls and sticks”: atoms and bonds between them. This way, it is easier to compute, but accuracy suffers greatly. To gain adequate accuracy, one has to account for the electronic behavior of atoms and molecules, i.e., consider subatomic particles -- electrons and nuclei. This is what quantum mechanical (QM) methods are all about -- and the theory is not new, dating back to the early decades of the 20th century.
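As a hedged illustration of the “balls and sticks” picture, the sketch below computes a harmonic bond-stretching energy, one of the simplest terms found in typical molecular mechanics force fields. The force constant, equilibrium bond length, and coordinates are illustrative placeholders, not parameters from any specific force field, and some force fields fold the factor of one half into the constant.

```python
import math


def bond_length(atom_a, atom_b):
    """Euclidean distance between two atoms given as (x, y, z) tuples."""
    return math.dist(atom_a, atom_b)


def harmonic_bond_energy(r, r0, k):
    """Spring-like bond energy: E = 0.5 * k * (r - r0)^2.

    r  -- current bond length
    r0 -- equilibrium bond length (placeholder value in this sketch)
    k  -- force constant (placeholder value in this sketch)
    """
    return 0.5 * k * (r - r0) ** 2


if __name__ == "__main__":
    # Two atoms treated as point "balls" joined by a spring-like "stick".
    atom_a = (0.0, 0.0, 0.0)
    atom_b = (0.0, 0.0, 1.1)
    r = bond_length(atom_a, atom_b)
    print(harmonic_bond_energy(r, r0=1.0, k=300.0))
```

Note that the energy depends only on atom positions; electrons never appear explicitly. That is what makes molecular mechanics fast, and also why it can miss effects that quantum mechanical methods capture.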

However, quantum methods are exceptionally computationally costly -- and until recent decades, it was a prohibitive barrier for quantum theory to influence the practical side of things. Due to the exponential growth of available computing power, quantum methods are finally becoming valuable tools in scientists’ hands.

Several companies are merging machine learning and quantum theory to substantially improve the modeling capabilities of their drug discovery systems. For example, scientists at XtalPi, a China and US-based tech company backed by Sequoia China, Tencent, and Google, have built their Intelligent Digital Drug Discovery and Development (ID4) platform, incorporating quantum mechanics, artificial intelligence, and high-performance cloud computing algorithms. ID4 allows predicting with high precision the physicochemical and pharmaceutical properties of small-molecule drug candidates, as well as their crystal structures -- critical elements in drug R&D.

Another company moving this field forward is Paris-based Aqemia. The company focuses on de novo, structure-based design of lead-like molecules by combining quantum-inspired physics and artificial intelligence (AI). It uses a unique quantum-inspired statistical mechanics algorithm that, according to the company, predicts the affinity between a compound and a therapeutic target accurately and 10,000 times faster than the competition. Aqemia’s AI can generate compounds with increasing accuracy by getting feedback from the affinity predictor.

Finally, there is Barcelona-based Pharmacelera, a computational company applying quantum theory to boost drug design via its two primary software packages: PharmScreen and PharmQSAR. The first tool allows for accurate ligand-based virtual screening using a high-precision 3D ligand-alignment algorithm based on interaction fields. It can generate a higher diversity rate among leads than classical methods and tools. The second one, PharmQSAR, is a 3D quantitative structure-activity relationship (QSAR) tool that enables a combination of multiple fields of interaction to perform CoMFA/CoMSIA studies.

Another -- a more futuristic -- technological trend, exploiting quantum theory, deals with creating a quantum computer. With several decades of advances in quantum theory and simultaneous progress in several software and hardware fields, we are finally entering the era of quantum computers becoming practically viable.

While we are in the early days of quantum computing, several companies are already integrating elements of quantum computing into computational drug discovery.

For instance, POLARISqb is a US-based company that describes itself as the developer of the world's first drug discovery software built for quantum computers, combining artificial intelligence and a quantum approach. At the heart of POLARISqb technology is the Tachyon drug design platform, used for executing distributed molecular design work across the cloud, managed by an automated process that allows searching large chemical libraries while running multiple projects in parallel. By developing proprietary software for quantum systems, the company claims it can substantially accelerate drug design and get leads of higher quality. Due to the inherent “agnosticism” of the Tachyon system, it can be applied across multiple diseases and indications.

Menten AI is a Canadian start-up founded in 2018 that develops a software platform for protein design powered by machine learning and quantum computing. The company uses proprietary quantum optimization algorithms, which it believes can significantly improve the accuracy of drug discovery while reducing cost and development time.

To summarize this post, let’s refer to a prediction on cutting-edge research in this area by Dr. Christopher Savoie, Co-founder and CEO at Zapata Computing, an American quantum software company, which he expressed in an interview for BiopharmaTrend:

“Quantum will be a part of every, or almost every, data science and machine learning workflow in biopharma in the future. I do believe it will be an integral part of it. If you can get a more accurate model by using quantum tech -- why wouldn't you do that, after all?” -Christopher Savoie

February 6, 2023

Andrii Buvailo, Ph.D.

Co-Founder, Director, BiopharmaTrend

Andrii Buvailo is a pharmaceutical industry analyst and writer, focusing on emerging companies (startups), technologies and trends in drug discovery, and R&D outsourcing. He received a master’s degree in Inorganic Chemistry and a PhD in Physical Chemistry from Kyiv National Taras Shevchenko University. His articles were published on Forbes.com, and market research reports were referenced by some of the leading life science organizations. He also participated in numerous scientific projects in Ukraine, Belgium, Germany, and the United States (DAAD, Horizon 2020, NATO, CRDF grants), and published in high-impact research journals.