The buzz around ChatGPT and Generative AI technology (“GenAI”) has raised important questions for the biopharma industry:
- How do we sensibly invest more in AI capabilities for drug development?
- What type of AI investments do we need to make to stay ahead of the competition?
- What value can existing AI tools deliver across our drug development pipeline?
And like many others right now, these questions have been at the top of my mind, because new AI advances could transform how quickly we can get new, safe, life-improving or life-extending therapies to patients.
I have recently begun a new position as Head of Marketing at Biorelate. I chose to join Biorelate after witnessing the potential of their ground-breaking platform, Galactic AI™, to amplify the impressive benefits that the latest generative AI wave has promised. This matters because one of the primary concerns when utilising generative AI technology is the risk of generating “hallucinations” rather than results grounded in truth.
At Biorelate, they are harnessing the power of large language models (LLMs), the underlying technology behind ChatGPT, to curate highly valuable data from biomedical text at human levels of accuracy. However, rather than using these technologies to generate new text, Biorelate are using them to better understand the existing scientific literature by structuring difficult-to-capture data, such as cause-and-effect relationships.
So, while I’m getting up to speed on how Biorelate are contributing to the field, I’ve spent some time looking into the main challenges the pharma world faces in using GenAI.
Below is a summary of the most common considerations I have come across in my research and in my conversations with biopharma leaders since starting at Biorelate:
1. Data reliability: Results produced by GenAI must be clean and high quality in order to be usable
GenAI relies on vast amounts of data, often drawn from generic sources. If we are to trust the results, the models must be fine-tuned and tested to prove that they truly represent the challenge to be solved. This can mean training new LLMs from scratch on more specific source material (e.g. BioGPT) or fine-tuning existing models to perform better on more specific tasks.
2. Latest data: Will GenAI help us stay ahead of the curve and stay current with the latest research?
Many AI tools have only been trained on sources up to a certain point in time; for example, both GPT-3.5 and GPT-4 were trained on data only up until 2021. This means they have not seen the latest data, such as research findings recently published in the scientific literature. For fast-moving fields like COVID-19, GenAI tools face the same issue that traditional databases face: they very quickly become out of date. Couple this with the issues of data reliability and traceability, and it’s hard to be confident that the results are correct.
3. Data protection: Uploading lab notes, clinical data and other internal data sources into tools like ChatGPT poses huge risks for exposing confidential data
In order to take full advantage of GenAI, some of the best use cases involve using these tools either to interpret existing data (such as clinical studies) or to fine-tune them on in-house data for better performance. At present, many of these tools can only be accessed outside an organisation’s firewall, which poses huge risks of exposing confidential data.
4. IP: Will integrating ChatGPT or other GenAI solutions create IP challenges?
Protection of intellectual property is essential to allow biopharma to recover the cost of R&D. If generative AI is used to create a drug candidate, it may be unclear who owns the resulting IP. This raises challenges for patent ownership, particularly in the short-term where the precedents are still unclear.
5. Traceability: GenAI and large language models are a bit of a black box. Without full transparency about where the answers come from, it’s hard to have complete confidence in their accuracy.
Whilst GenAI or large language models might be capable of designing novel drug candidates or suggesting solutions to a problem, the actual process that led to the selection of one solution over another is opaque with these models. This lack of transparency in the “decision-making” process does not align with our expectations as scientists, where we expect to make decisions based on rational argument and an understanding of the data sources and methods.
6. Regulatory acceptance: Regulatory pathways ensure that new medicines are safe and effective when they reach the market.
Whilst government agencies provide regulations and guidance on the drug approval process, in a fast-moving field such as GenAI there will be uncertainties in the drug approval pathway. A lack of clear guidance creates uncertainty and slows down decision making.
7. You need good ROI to justify the high costs
The cost of obtaining and maintaining high-quality data can be prohibitive. Add to that the cost of building a federated platform to house and process the data in a secure environment, and the costs quickly escalate. Investments in these technologies are best justified when the value of using them is clear.
While these 7 considerations are all valid today, they should not outweigh the potential benefits of GenAI and LLMs in drug discovery. Given the pace of development we’re seeing, it’s also very likely that many of these challenges will be solved sooner rather than later.
That’s part of the reason why I’m so excited to be working at Biorelate. The way that they are using LLMs takes all of these considerations into account and addresses many of the concerns raised. For example, all of their results are traceable to source, their models are fine-tuned on carefully curated datasets and customers access the resulting data in a way that doesn’t put their IP at risk.