AI & Chemistry

How accurate is AI?

Shoeblack.AI 2024. 11. 19. 11:58

Many people are disappointed by AI's answers. When ChatGPT was first released, its answers were mind-blowing. After some time, people grew skeptical because of its ridiculous answers. Experts are reluctant to use ChatGPT because factuality matters. QSAR (quantitative structure-activity relationship) models also use AI technology, and they face a lot of mistrust too. According to the developer of a model, it is 80% accurate. But when a user tries to predict two chemicals of interest, the model is wrong for both compounds. So even though the developer claims 80% accuracy, the model is 0% accurate from the user's perspective. In such a case, the user wouldn't even consider using it again because the experience was bad. No matter how convenient and fast a tool is, no one wants to waste a second of their time getting a useless result. QSAR models have been around since the 1950s. They are most commonly used in drug discovery. People who thought that QSAR models would magically find good candidates have been highly disappointed. In many cases, the predicted values didn't match the experimental results.

 

There is also a big gap in what people mean by the word "many". People who develop models talk about more than 1,000 chemicals. In fact, even 1,000 is not a lot. Personally, I think more than 100,000 can be considered a large volume (billions of data points is a really huge dataset), but for those who produce experimental data, even 10 substances are a lot, and 100 is overwhelming. I remember one researcher claiming to have big data, which turned out to be 600 drugs. 600 is definitely far from big data. With animal experiments, it's hard to produce data for even 10 compounds per year. With cell experiments, it's hard to produce data for 100 compounds a year. For example, suppose someone develops a model on 10,000 data points and gets 80% accuracy. The model gets 8,000 right and 2,000 wrong. People usually feed fewer than 10 substances into the model. But if all of them are similar to the 2,000 that the model gets wrong, it's likely that all of them will be wrong. This is why the perceived accuracy of the model is so different for the model developers and the people using the model.
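
To put rough numbers on this, here is a small Python sketch. It is a toy setup, not a real benchmark: I assume the 10,000 compounds fall into 100 chemical families of 100 compounds each, and that the model is wrong on every compound in 20 of those families. The function names and the family structure are made up for illustration.

```python
def prob_all_wrong_if_independent(accuracy, n_queries):
    """Chance that all n_queries predictions are wrong,
    assuming the errors are independent of each other."""
    return (1.0 - accuracy) ** n_queries


def build_clustered_outcomes(n_families=100, family_size=100, n_failing=20):
    """Toy benchmark: every family is predicted either all right or all wrong.
    True means the prediction for that compound was correct."""
    return [[family >= n_failing] * family_size for family in range(n_families)]


def overall_accuracy(families):
    """Accuracy the developer reports over the whole dataset."""
    outcomes = [ok for family in families for ok in family]
    return sum(outcomes) / len(outcomes)


def per_user_accuracy(families, n_queries=10):
    """Accuracy seen by hypothetical users who each submit
    n_queries compounds from a single family."""
    return [sum(family[:n_queries]) / n_queries for family in families]


if __name__ == "__main__":
    families = build_clustered_outcomes()
    users = per_user_accuracy(families)
    print("developer's accuracy:", overall_accuracy(families))
    print("users who saw 0% accuracy:", users.count(0.0), "of", len(users))
    print("P(all 10 wrong) if errors were independent:",
          prob_all_wrong_if_independent(0.8, 10))
```

If errors were spread evenly, getting all 10 wrong would almost never happen. In this toy setup, where errors are concentrated in certain families, one user in five sees a model that is wrong every single time, even though the developer's 80% figure is still true.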

 

Are QSAR models a complete failure? Not quite. Large pharmaceutical companies continue to invest in research to develop QSAR models. They're working on models based on deep learning techniques and validating them internally. Currently, QSAR models are used to find substances that are likely to fail. It is difficult to find substances that are likely to become new drugs with QSAR models. However, a model can be used to eliminate compounds that are likely to fail in future clinical trials. Drug development is so costly because many drugs fail in clinical trials, at the end of the development process, which wastes a lot of time and money. Therefore, it is important to identify early the substances that will fail. The way AI can save time and money in drug development is by finding out in advance which substances will fail and cutting unnecessary R&D costs.

 

There are two main reasons for distrust in QSAR models: one is a user problem, and the other is a fundamental limitation of the model itself. The first problem happens when predictions are made without considering the applicability domain of the model. Someone once tried to predict the toxicity of perfluorinated compounds, and the prediction was wrong. If you look at the structure of perfluorinated compounds, there are a lot of fluorine (F) atoms attached to carbon (C). The data used to develop the model is mostly based on structures whose main backbone has hydrogen (H) attached to carbon (C). Here, the model was asked to predict a totally different structure. It doesn't really make sense to throw in a structure that the model can't handle in the first place and then claim that the prediction is wrong and the model is inaccurate. The model gives a wrong answer because the input was bad. Of course, it's a program, so it can output some number for any input, but that number doesn't have any meaning. This is an issue of the model's applicability domain. For more information, check the previous post on model applicability domain analysis.
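
One common way to check this before trusting a prediction is to measure how similar the query is to the training compounds. Below is a minimal Python sketch of that idea using Tanimoto similarity on sets of features. Everything in it is an assumption for illustration: the bond-type "features" are toy stand-ins for real fingerprints, the training set is made up, and the 0.5 threshold is arbitrary. It is not how any particular QSAR tool defines its applicability domain.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity between two feature sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets: treat as not comparable
    return len(a & b) / len(union)


def in_domain(query, training_set, threshold=0.5):
    """Return (is_in_domain, best_similarity) for a query compound.
    The query is 'in domain' if at least one training compound
    is at least `threshold` similar to it."""
    best = max(tanimoto(query, features) for features in training_set)
    return best >= threshold, best


# Toy training set: hydrocarbon-like compounds (hypothetical features).
TRAINING = [
    {"C-H", "C-C", "C-O", "O-H"},
    {"C-H", "C-C", "C-N", "N-H"},
    {"C-H", "C-C", "C=O"},
]

if __name__ == "__main__":
    alcohol = {"C-H", "C-C", "C-O", "O-H"}        # resembles the training data
    perfluorinated = {"C-F", "C-C", "C=O", "O-H"}  # C-F replaces C-H
    print("alcohol:", in_domain(alcohol, TRAINING))
    print("perfluorinated:", in_domain(perfluorinated, TRAINING))
```

With these toy features, the perfluorinated query falls below the threshold, so a careful user would be warned before reading anything into the number the model returns.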

 

The second reason is the prediction error of the model. No model is 100% correct. Even the data doesn't match 100%. What does this mean? There is an inherent experimental error whenever experiments are done with chemicals. So assays are usually repeated three or more times under the same conditions, and the average of the measurements is used. The experimental data is not always the same. A model that claims to be more accurate than the experimental error has achieved a meaningless improvement. This is often the case with results presented at recent top-tier deep learning conferences. They achieve high accuracy, but the improvement is meaningless, because experimental data is not that accurate in the real world. Even with robotized, automated experiments, there will always be slight variations in the data. Since the model is trained on the data, it cannot overcome the experimental errors in the data itself.

If you look at the overall distribution of the data used to train a model, you will see that it contains not just effective substances but also less effective ones. The distribution is dominated by substances that are much less effective. AI learns the distribution of the data, and the majority of the dataset consists of unsuccessful compounds. How can a model trained on such data find successful candidates? It's not easy. It's almost impossible. It's hard to find great drug candidates with AI. But it's easy to find unpromising ones, which is why these models are best used to filter out structures that will likely fail at a later stage.
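
A related way to see this is through simple base-rate arithmetic. The Python sketch below assumes, purely for illustration, that 1% of compounds are truly good and that the model catches 80% of the good ones and 80% of the bad ones. These numbers and the function name are hypothetical, not taken from any real model.

```python
def predictive_values(base_rate, sensitivity, specificity):
    """Return (ppv, npv) from Bayes' rule.

    ppv: chance a compound predicted 'good' is really good.
    npv: chance a compound predicted 'bad' is really bad.
    """
    true_pos = sensitivity * base_rate
    false_neg = (1.0 - sensitivity) * base_rate
    true_neg = specificity * (1.0 - base_rate)
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


if __name__ == "__main__":
    ppv, npv = predictive_values(base_rate=0.01, sensitivity=0.8, specificity=0.8)
    print(f"predicted good and really good: {ppv:.1%}")
    print(f"predicted bad and really bad:   {npv:.1%}")
```

Under these assumptions, a "good" prediction is right only about 4% of the time, while a "bad" prediction is right more than 99% of the time. That asymmetry is why filtering out likely failures works far better than picking winners.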

However, those who run a small number of substances through AI usually want to know how good their compounds are. They want to know whether their substances are safe and free of toxicity issues, and they are greatly disappointed when the AI's results don't match their expectations. They're even more disappointed when the predictions don't match the experimental results. Disappointed customers never come back, so mistrust is widespread.

 

To improve the accuracy of a model's answers, validation is needed, and it takes a significant amount of work. It's not easy to do validation research when there is a shortage of AI developers. Therefore, it would greatly help improve AI if more people used it and shared their use cases. Please use them :D