AlphaFold's developers shared the 2024 Nobel Prize in Chemistry. I'm often asked about it because I work in the field, but I don't know much about it because I rarely use it. I do understand why it's important research; I just don't know the details of the model. AlphaFold predicts the three-dimensional structure of a protein from its sequence. Proteins are made up of amino acids, which are the building blocks of proteins, and the order of these amino acids is called the protein sequence. Wikipedia has a good explanation of amino acids; see the link below.
https://en.wikipedia.org/wiki/Amino_acid
Cells in our body are constantly making proteins. Digestive enzymes are proteins, and they're made in our digestive system every day. Many hormones, such as insulin, are also proteins. Many of our body's functions are maintained by proteins.
How are proteins made? The information for making a protein is contained in DNA, and a protein is the result of interpreting that information. DNA is kept in the nucleus of the cell. It contains information that is crucial to maintaining our lives, so damage to DNA is a huge problem; that's why it is kept safe in the nucleus. DNA holds a huge amount of information, so only the small part that is needed is copied to make a protein. It's like going to a library and making a copy of the pages you need instead of taking the book out. The DNA information is copied into mRNA (messenger RNA), so the DNA never has to leave the nucleus. Outside the nucleus, the mRNA meets the ribosome, which reads the protein sequence information in the mRNA and translates it into a protein. So the information in DNA is transcribed into mRNA and then translated into protein. This process occurs in virtually every cell. It is called the Central Dogma because it is at the core of all life.
https://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology
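To make the DNA to mRNA to protein flow concrete, here is a minimal Python sketch. It assumes the input is the coding strand of DNA (so transcription is just replacing T with U) and uses the standard genetic code. The function names `transcribe` and `translate` and the example sequence are made up for illustration; real cells do far more than this.

```python
from itertools import product

# Standard genetic code, with codons ordered by base U, C, A, G ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(("".join(c) for c in product(BASES, repeat=3)), AMINO_ACIDS))


def transcribe(dna_coding_strand: str) -> str:
    """Copy DNA into mRNA. Assumes the coding strand is given, so T simply becomes U."""
    return dna_coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three letters at a time, from the first AUG until a stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, so nothing is translated in this toy model
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")  # hypothetical toy gene
print(mrna)                     # AUGGCCUAA
print(translate(mrna))          # MA
```

The output is a string of one-letter amino acid codes, which is exactly the kind of protein sequence that a structure predictor takes as input.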
This core principle means that if we have all the DNA information, we know all the protein sequences. The human protein sequences were identified through the Human Genome Project. The project was completed in the early 2000s, so the DNA information is already public, which means the protein sequences are already known. Then why is AlphaFold so important?
https://en.wikipedia.org/wiki/Human_Genome_Project
Knowing the protein sequence does not by itself tell us the 3D structure. If the components of the protein, the sequence of amino acids, are known, why can't we get the 3D structure? The key words here are “three-dimensional” and the “fold” in AlphaFold. A protein chain folds into a specific shape in three-dimensional space. Inside the cell, ribosomes translate mRNA into a protein chain, and the chain is then folded. How the protein is folded determines its three-dimensional structure, and the three-dimensional structure determines the function of the protein. It has therefore been difficult to understand the function of proteins, because the three-dimensional structure could not be reliably worked out from the sequence alone. With the development of AlphaFold, it is now possible to predict the three-dimensional structure from the sequence.
https://en.wikipedia.org/wiki/Protein_structure
Protein 3D structures contain recurring structural patterns. A tightly coiled structure is called an alpha helix, in which the chain of amino acids winds into a spiral held together by bonds between nearby amino acids. A broad, flat structure is called a beta sheet, in which stretches of the chain fold back and line up side by side. Combinations of these patterns result in a wide range of 3D structures. If you look at these structures, you'll notice amino acids that are close together regardless of their order in the sequence. They may be far apart in the sequence, but in three-dimensional space they're right next to each other because of the way the protein is folded. So even if you knew the sequence, you still needed to run a separate experiment to find the three-dimensional structure. For some proteins, this is very difficult. Some proteins are bound to the cell membrane, and the moment such a protein is removed from the membrane for the experiment, its structure can be disrupted.
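The idea of amino acids being far apart in sequence but close in space can be sketched in a few lines of Python. The coordinates below are invented for a toy hairpin of six residues, not taken from a real protein, and the function name `long_range_contacts`, the 8 Å distance cutoff, and the minimum sequence separation of 4 are all assumptions chosen for illustration.

```python
import math

# Hypothetical coordinates (in angstroms) for six residues forming a simple hairpin:
# residues 0-2 run left to right, then the chain turns and residues 3-5 run back.
TOY_COORDS = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 5.0, 0.0),
    (3.8, 5.0, 0.0),
    (0.0, 5.0, 0.0),
]


def long_range_contacts(coords, cutoff=8.0, min_separation=4):
    """Return residue pairs (i, j) that are at least `min_separation` apart
    in the sequence but no more than `cutoff` apart in 3D space."""
    contacts = []
    for i in range(len(coords)):
        for j in range(i + min_separation, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                contacts.append((i, j))
    return contacts


print(long_range_contacts(TOY_COORDS))  # [(0, 4), (0, 5), (1, 5)]
```

Residues 0 and 5 sit at opposite ends of the sequence, yet they come out as a contact because the chain folds back on itself. The sequence alone does not contain this information directly, which is why the structure had to be measured or predicted.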
Why is it important to understand the 3D structure of proteins? It plays an important role in the discovery of new drugs. Drug discovery research used to rely mostly on experiments, and various computational techniques were introduced to reduce its time and cost. This became a field called computer-aided drug design (CADD). Recently, AI has been applied to many of the techniques used in CADD, giving rise to the term AI drug design (AIDD). The tools at your disposal depend on whether the protein structure is known or not. When we don't have a protein structure, QSAR (quantitative structure-activity relationship) modeling is one of the options. If you have a protein structure, you can study in more detail where a substance binds and how it regulates the function of the protein. In drug discovery, the first step is to find the protein that causes the disease, and the next is to find chemicals that modulate the function of that protein. AlphaFold has made a great contribution to drug discovery by accurately predicting the structures of various proteins, some of which are potential targets for curing a disease.
Personally, I don't use AlphaFold much because I do QSAR-based research, mainly with animal data. Sometimes a chemical that binds incredibly well to a particular protein doesn't work well when tested in animals. This is because of absorption and metabolism in living organisms: the drug still has to be delivered to the site where it should act. So even if a drug works well in a cell assay, it needs to be validated in animals. In the case of toxicity seen in animal tests, it is often difficult to explain the toxicity of a substance by unintended binding to proteins. Therefore, QSAR models are developed to predict the symptoms observed in animals or humans rather than the binding to proteins, and I don't need AlphaFold for this task.
Since the advent of AlphaFold, there have been many studies on designing novel protein structures. In medicine, there are protein drugs (biopharmaceuticals); vaccines are a typical example. If I wanted to predict the toxicity of these protein drugs, I might need to use AlphaFold. However, there is not much data on biopharmaceuticals yet, so I don't work on it much. Still, I was curious, so I have accessed AlphaFold a few times. It's simple to use, so it would be interesting to try it out.
This is the website where you can try out AlphaFold 3.
https://deepmind.google/technologies/alphafold/alphafold-server/
However, generating protein structures is computationally intensive, so DeepMind has precomputed a large number of structures and distributed them as a database. You can browse the database first, and generate a structure yourself only if you don't find the protein you want.
https://alphafold.ebi.ac.uk/
AlphaFold 3 is the most recent version, and the code is now publicly available. If you have a server, you can set it up on your own machine.
https://github.com/google-deepmind/alphafold3
I'll have to do some more research and study the model configuration in the code and the paper below.
https://www.nature.com/articles/s41586-024-07487-w
DeepMind also released a playlist of AlphaFold videos on YouTube. I haven't watched them yet, but they're all short. It would be interesting to watch and summarize them next time.
https://youtube.com/playlist?list=PLqYmG7hTraZAhkAh72kzzLC4r2O4VoVgz&si=gRqlFgpPsJjA6-gX