Someone who never shows their true colors. You never know what they're up to. You can't trust them. Experts distrust AI for similar reasons: it's hard to understand why a model answers the way it does and what its intentions are. AI models are often called black box models because it is difficult to understand why the model made a particular inference. The same is true for QSAR models. If the model predicts that a substance is toxic, it's great when there's a rationale for why it predicted toxicity, but if there's no rationale? It's hard to trust.
Experiments measure the activity or toxicity of a substance. However, experimental results can sometimes be inconsistent. In fact, it's not uncommon for a substance to be toxic in cell tests but not in animal tests. In these cases, the reasons for the discrepancy need to be investigated. The same goes for a prediction made with a QSAR model that doesn't match the experimental value: when the model can't sufficiently explain its result, the mismatch leads to distrust of the model.
By design, a QSAR model uses only structural information to make predictions: it calculates the experimental value of a structure from the structural information alone. Therefore, it can only explain changes in experimental values in terms of variations in structure; it cannot explain the biological meaning of the predicted value. The last principle of the OECD QSAR validation guideline is to provide an interpretation of the model, meaning an explanation of how structural changes affect the experimental values. Interestingly, this fifth and final principle is not mandatory, because it is often simply not possible to satisfy. So, how can we interpret a model?
Saying that a structure is input to a QSAR model is not 100% accurate. The actual flow is: structure >> descriptors >> model >> prediction. That is the process for making a prediction; the model only works if you turn the structure into descriptors. Descriptors? In machine learning they are usually called features. Textbooks explain that a descriptor is a mathematical representation of a molecular structure. In practical terms, any number that can be calculated from a molecular structure is a descriptor. Simple examples are the number of carbons (C) and the number of oxygens (O); these are values you can compute if you have the molecular structure. Descriptors are values calculated by formulas that experts have defined based on the structural information, and these descriptor values are the numbers that represent the structure. So when we say a structure is input to the model, what really goes in are its descriptors. If the descriptor values are similar, the structures are similar. When you enter the key features of a molecular structure into the model, the model calculates the experimental value of that structure from those descriptor values. Since the model predicts experimental values from the values of key structural features, the model is essentially a formula that captures the relationship between structural information and experimental values. Therefore, when interpreting the model, we should be able to find the correlation between the molecular structure and the experimental value.
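To make this concrete, here is a minimal sketch of that structure >> descriptors >> model >> prediction pipeline, assuming RDKit and scikit-learn are installed. The molecules, the "experimental" values, and the choice of a random forest are all made up purely for illustration, not taken from any real QSAR study.

```python
# A minimal sketch of the structure >> descriptors >> model >> prediction pipeline.
# Assumes RDKit and scikit-learn are installed; the data below is made up.
from rdkit import Chem
from rdkit.Chem import Descriptors
from sklearn.ensemble import RandomForestRegressor

def to_descriptors(smiles: str) -> list[float]:
    """Turn a structure (SMILES) into a few easy-to-read descriptor values."""
    mol = Chem.MolFromSmiles(smiles)
    carbons = sum(1 for atom in mol.GetAtoms() if atom.GetSymbol() == "C")
    oxygens = sum(1 for atom in mol.GetAtoms() if atom.GetSymbol() == "O")
    return [carbons, oxygens, Descriptors.MolWt(mol), Descriptors.MolLogP(mol)]

# Hypothetical training set: structures plus made-up "experimental" values.
train_smiles = ["CCO", "CCCCO", "c1ccccc1", "CC(=O)O", "CCN"]
train_values = [0.2, 0.9, 1.5, 0.4, 0.3]  # not real data

X = [to_descriptors(s) for s in train_smiles]                        # structure -> descriptors
model = RandomForestRegressor(random_state=0).fit(X, train_values)  # descriptors -> model

query = "CCCO"                                        # a new structure
prediction = model.predict([to_descriptors(query)])   # model -> prediction
print(prediction)
```

The model never sees the drawing of the molecule; it only ever sees the four numbers that the descriptor function produces.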
While this is true in theory, it is often not possible in practice, because the descriptors used in the model are hard to interpret. Some descriptors are clearly calculated from the molecular structure, but it's hard to figure out what they actually mean. Yet somehow, when these hard-to-interpret descriptors are used in a model, the prediction accuracy goes up, while I rarely see any improvement in accuracy when using easy-to-interpret descriptors... To interpret a model, the model must first be able to make accurate predictions; only then does the interpretation mean anything. So, if you use easy-to-interpret descriptors but the prediction accuracy is poor...? There's nothing worth interpreting. If you use hard-to-interpret descriptors and the model has good predictive power...? That's nice, but you still can't interpret the model. Either way, providing an interpretation of the model is difficult.
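As an illustration of what "hard to interpret" means here, the sketch below (assuming RDKit is installed; aspirin is just an arbitrary example molecule) prints a couple of descriptors anyone can read directly off the structure next to a few topological indices that are mathematically well defined but have no obvious chemical meaning.

```python
# Easy-to-interpret vs. hard-to-interpret descriptors for one example molecule.
# Assumes RDKit is installed; aspirin is used only as a familiar example.
from rdkit import Chem
from rdkit.Chem import Descriptors

mol = Chem.MolFromSmiles("CC(=O)Oc1ccccc1C(=O)O")  # aspirin

# Easy to interpret: you can point at the structure and explain these numbers.
print("MolWt   :", Descriptors.MolWt(mol))
print("NumO    :", sum(1 for a in mol.GetAtoms() if a.GetSymbol() == "O"))

# Hard to interpret: well-defined topological indices, but what does a change
# of 0.3 in BalabanJ mean chemically? That is exactly the interpretation problem.
print("BalabanJ:", Descriptors.BalabanJ(mol))
print("Chi1v   :", Descriptors.Chi1v(mol))
print("Kappa2  :", Descriptors.Kappa2(mol))
```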
To overcome these shortcomings, various methods of model interpretation are being studied, though model interpretation remains a challenge. XAI (eXplainable AI) focuses on interpreting a model to better understand its inferences. If XAI methods improve significantly, the reliability of QSAR models can improve as well. When QSAR is equipped with explainability, it can become a useful and powerful tool for many experts.
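One simple interpretation technique along these lines is permutation importance: shuffle one descriptor at a time and see how much the model's accuracy drops. The sketch below uses scikit-learn's permutation_importance on made-up data; the descriptor names are placeholders, and in a real QSAR study X would hold descriptor values and y the experimental values.

```python
# Permutation importance: how much does the model lean on each descriptor?
# A minimal sketch on synthetic data, not a real QSAR dataset.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))  # 4 hypothetical descriptors
y = 2.0 * X[:, 0] - 1.0 * X[:, 2] + rng.normal(scale=0.1, size=100)

model = RandomForestRegressor(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

descriptor_names = ["num_C", "num_O", "MolWt", "LogP"]  # placeholder names
for name, score in zip(descriptor_names, result.importances_mean):
    print(f"{name}: {score:.3f}")  # higher score = shuffling it hurts the model more
```

Methods like this tell you which descriptors the model relies on, but they still leave the harder question of what those descriptors mean chemically, which is why explainability in QSAR is an ongoing research topic rather than a solved problem.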