AI & Chemistry

Toxicity prediction services making money without deep learning

Shoeblack.AI 2024. 11. 27. 15:44

This post introduces toxicity prediction programs that actually make a profit. What they have in common is that none of them relies on deep learning. For a toxicity prediction service, it doesn't really matter whether it uses artificial intelligence or deep learning. What matters is how accurate the predictions are; the technology under the hood is secondary. The software that sells well and performs well in the market today does not use deep learning.

1. OECD QSAR Toolbox (Laboratory of Mathematical Chemistry, LMC)

This program is free to download, so how do its developers make money? It is developed by the Laboratory of Mathematical Chemistry (LMC) in Bulgaria under contract with the OECD and ECHA. The contract is ongoing, and the software is constantly updated.

https://qsartoolbox.org/

 


 

LMC has also developed other programs that are used by large pharmaceutical and global cosmetic companies, which pay significant licensing fees every year. Why has LMC's software become so profitable? Because it addresses two pain points.

The first is data. Whether you use deep learning or classical machine learning, accuracy differs little when the underlying data is similar. The quality of the data matters far more than the complexity of the model's algorithm: if the data isn't better, the model won't perform better. This is why good data collection is the most important factor in improving predictive accuracy. Collecting, cleaning, and analyzing data is the most difficult, time-consuming, and labor-intensive part of the work, and it is something most deep learning researchers don't want to do. Yet this dirty, laborious work is the key to good predictive performance. At LMC, many experts put a great deal of effort into collecting and cleaning data. That's why their software is so good.

The second pain point is explaining the predicted value. Even with deep learning, predicted values come with no explanation. The software developed by LMC provides a rationale along with each predicted value, based on data from published papers and on logical reasoning. What if the prediction does not match an experimental result? Is the prediction wrong? The prediction is wrong only if the data supporting it can be refuted. Otherwise, it is the experiment that is called into question. This explainability is a strong selling point that you won't find in other software.

https://oasis-lmc.org/

 


What are the prices of the programs offered by LMC? I don't know; try contacting them directly. There's usually no set price for this kind of software. The commercial programs are offered only to for-profit companies, and they are not given away for free just because the purpose is academic research. You have to pay a high price, so they are available only to those who can afford them. I think it's the most successful prediction service I've seen to date.

Does the OECD QSAR Toolbox use deep learning? No. LMC's core technology is efficient database search. What if the information is not in the database? Then you can predict from similar structures, tailoring the prediction to the query molecule. How do we choose similar structures? There are many options: similar structure, similar mechanism, similar metabolism, and so on. This is why the OECD QSAR Toolbox is not really easy to use. It can be used properly only by someone with expertise. Depending on the situation, you may need to choose analogues with a similar structure, or analogues with a similar mechanism. It is a tool for people who have the expertise to make such decisions.
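
The idea of predicting from similar structures can be sketched roughly as follows. This is only a minimal illustration and not the QSAR Toolbox's actual algorithm: the feature labels, compound names, toxicity values, and the `read_across` helper are all hypothetical, and similarity is reduced to a plain Tanimoto coefficient over feature sets. In practice an expert would decide which notion of similarity (structure, mechanism, metabolism) applies.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two feature sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def read_across(query: set, database: dict, k: int = 2):
    """Predict a value for `query` from its k most similar database entries.

    `database` maps a compound name to (feature set, measured value).
    Returns the similarity-weighted mean together with the analogues used,
    so the prediction comes with the evidence behind it.
    """
    scored = sorted(
        ((tanimoto(query, feats), name, value)
         for name, (feats, value) in database.items()),
        reverse=True,
    )
    # Keep the top-k entries that share at least one feature with the query.
    analogues = [(name, sim, value) for sim, name, value in scored[:k] if sim > 0]
    if not analogues:
        return None, []
    total = sum(sim for _, sim, _ in analogues)
    prediction = sum(sim * value for _, sim, value in analogues) / total
    return prediction, analogues


# Hypothetical data: feature labels and values are made up for illustration.
database = {
    "compound_A": ({"aromatic_ring", "nitro", "methyl"}, 2.0),
    "compound_B": ({"aromatic_ring", "nitro"}, 3.0),
    "compound_C": ({"ester", "alkyl_chain"}, 8.0),
}
query = {"aromatic_ring", "nitro", "chloro"}

prediction, analogues = read_across(query, database, k=2)
print(f"predicted value: {prediction:.2f}")
for name, sim, value in analogues:
    print(f"  analogue {name}: similarity {sim:.2f}, measured value {value}")
```

Returning the analogues alongside the number reflects the point made above: the prediction is only as defensible as the data points that support it.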

 

2. Derek Nexus (Lhasa Limited)

Derek Nexus provides a rule-based model. I first came across it when I was looking for a liver toxicity prediction model. It predicts toxicity from structural patterns identified by experts, so when toxicity is flagged, the matching structure explains why the compound is considered toxic. The software covers a wide range of toxicity endpoints, and there are also services for organ-specific toxicity prediction. The company behind it, Lhasa Limited, offers a variety of other software as well. Every year, Lhasa Limited presents an updated version of its services. Last time, they showed a module that predicts drug synthesis pathways and also predicts the genotoxicity of the byproducts. While Derek Nexus is easy to use and widely utilized, there are many other useful services out there.
So, do they use deep learning models here? No, they don't use them here either. Rule-based models look for predetermined patterns, so there's no room for deep learning. Even for predicting the genotoxicity of byproducts in a drug synthesis pathway, no deep learning is involved. It could conceivably be used, but it is not currently deployed in the software.
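
To make "looking for predetermined patterns" concrete, here is a minimal sketch of a rule-based predictor. It is not how Derek Nexus is implemented: the `Alert` class, the two-rule knowledge base, and the `predict` helper are hypothetical, and the molecule is assumed to arrive as a set of already-identified functional-group labels, whereas a real tool would match substructures on the chemical graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    pattern: str    # functional-group label the rule looks for
    endpoint: str   # toxicity endpoint the rule applies to
    rationale: str  # expert-written explanation returned with a hit


# Hypothetical two-rule knowledge base, for illustration only.
ALERTS = [
    Alert("aromatic_nitro", "mutagenicity",
          "Nitro reduction can form reactive intermediates that bind DNA."),
    Alert("epoxide", "mutagenicity",
          "Strained electrophilic ring can alkylate DNA."),
]


def predict(groups, endpoint):
    """Return (call, fired alerts) for a molecule given as group labels."""
    fired = [a for a in ALERTS
             if a.endpoint == endpoint and a.pattern in groups]
    call = "alert" if fired else "no alert"
    return call, fired


call, fired = predict({"aromatic_nitro", "methyl"}, "mutagenicity")
for alert in fired:
    print(f"{call}: {alert.pattern} -> {alert.rationale}")
```

Every positive call carries the rule that fired and its rationale, which is why this kind of model can explain itself without any learned parameters.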

https://www.lhasalimited.org/solutions/

 


 

3. CASE Ultra (MultiCASE)

I became aware of this program only recently. I first learned about it through an event organized by MultiCASE at the Society of Toxicology (SOT) meeting. At the QSAR 2023 meeting, I met a few people from MultiCASE and had a chance to chat with them.

https://multicase.com/

 


What both Derek Nexus and CASE Ultra emphasize is the ICH M7 guideline. ICH stands for the International Council for Harmonisation; the full name is the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use. ICH M7 covers the assessment of genotoxic (mutagenic) impurities in pharmaceuticals, and it states that computational predictions can be submitted in place of experimental results. It's important to understand this guideline because it shows regulation encouraging the use of predictive models.

https://multicase.com/in-silico-applications/ich-m7/

 


Toxicity prediction is closely tied to regulation. In particular, new predictive technologies are actively used only when regulatory changes accompany them. Technologies therefore need to be developed with a focus on producing predictions that regulators will accept.

 

4. ADMET Predictor (Simulations Plus)

The main product of Simulations Plus is physiologically based pharmacokinetic (PBPK) models. A PBPK model kinetically simulates the ADME processes of a drug using differential equations. PBPK models use a variety of drug-specific parameters. Therefore, if experimental data are not available, a QSAR model can be used to calculate the parameters the PBPK model requires. ADMET Predictor seems to have been developed to support PBPK modeling. Here, too, predictions are based on simulation technology and machine learning models rather than deep learning.
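
As a rough illustration of what "simulating kinetics with differential equations" means, the sketch below integrates a single-compartment model, dC/dt = -(CL / V) * C, for an intravenous bolus dose. This is a deliberate simplification and not the Simulations Plus implementation: a real PBPK model couples many such equations, one per organ or tissue compartment. The function name and all parameter values are hypothetical; the clearance and volume stand in for drug-specific parameters that would come from experiments or, when those are missing, from a QSAR model.

```python
import math


def simulate_one_compartment(dose_mg, volume_l, clearance_l_per_h,
                             t_end_h=24.0, dt_h=0.01):
    """Integrate dC/dt = -(CL / V) * C with the explicit Euler method.

    Returns a list of (time_h, concentration_mg_per_l) points for an
    intravenous bolus dose.
    """
    k_el = clearance_l_per_h / volume_l   # elimination rate constant (1/h)
    conc = dose_mg / volume_l             # initial concentration after bolus
    n_steps = round(t_end_h / dt_h)
    profile = [(0.0, conc)]
    for step in range(1, n_steps + 1):
        conc += dt_h * (-k_el * conc)     # Euler update
        profile.append((step * dt_h, conc))
    return profile


# Hypothetical parameter values, chosen only to make the example run.
profile = simulate_one_compartment(dose_mg=100.0, volume_l=50.0,
                                   clearance_l_per_h=5.0)
t_last, c_last = profile[-1]

# This simple model has a closed-form solution, which gives a sanity check.
c_exact = (100.0 / 50.0) * math.exp(-(5.0 / 50.0) * t_last)
print(f"t = {t_last:.1f} h: Euler {c_last:.4f} mg/L, exact {c_exact:.4f} mg/L")
```

The numerical result is only as good as the clearance and volume fed into it, which is why parameter prediction matters so much for this kind of simulation.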

5. ACD/Labs

I haven't used this program; I learned about it while researching available software. ACD/Labs was one of the companies that exhibited and promoted its products at the QSAR 2023 conference. They develop various software for analytical chemistry, and toxicity prediction seems to be only a small part of their business. The company started in the 90s and is already about 30 years old, so it is quite old among prediction service providers.

The reasons why it is difficult to use deep learning in the above programs are as follows.
1. The datasets are small in many cases.
2. Using deep learning does not significantly improve predictive power.

The selling points of toxicity prediction programs can be summarized as follows.
1. A logical basis for each prediction.
2. Novel endpoints that cannot be predicted elsewhere (novelty in the dataset).
3. Recommended for use by regulations (e.g., ICH M7).

There are a lot of deep learning startups these days, but whether a service uses deep learning doesn't really matter. What matters is not the complexity of the technology but the accuracy of the predictions and how plausibly they are explained. The most important part of this is data collection: a service's core competency comes from its dataset. There's no point in just collecting a bunch of publicly available data. A unique service can be built only on a unique dataset that can't easily be collected and curated by others. But data collection is long and time-consuming, and the time and cost required to collect and curate a dataset are exactly what create the barrier to entry to this market.