Is AI AGI?

Shoeblack.AI 2024. 11. 14. 23:48

They are different, or at least they will be. I can't give a definitive answer because AGI doesn't exist yet. AGI stands for Artificial General Intelligence. I take it as a general term for truly human-like artificial intelligence. It is still a somewhat vague concept because nobody has built it yet. Since ChatGPT came out, I've come across more and more discussions about AGI. So what is the difference between AGI and AI?

 

Let's first think about AI. In Korea, many experts say that they started learning AI because of AlphaGo. AlphaGo beat Lee Sedol, and it was a shocking moment. I still remember the day the match was broadcast. I had expected Lee Sedol to beat the AI without much effort. However, the result was the opposite. Already in the first game I felt that something was off. It was very shocking. AlphaGo's performance was phenomenal. But the thing is, AlphaGo can only play Go. It is useless for any other task because it was never trained to do anything else. I don't know if Lee Sedol is good at painting, but he can at least draw something, whether it is good or bad. AlphaGo? It can't draw at all, because it never learned how. In other words, it's an AI that specializes in Go and doesn't know how to do anything else. So far, AI has been trained to be good at one specific task. Another example is Google Translate. Actually, I use DeepL more these days. In any case, both are AI-based translation services, and both translate well. But translation is the only task they can do. They don't do anything else.

Artificial intelligence has been in the spotlight since deep learning algorithms won image recognition competitions by an overwhelming margin. MNIST is a popular dataset for studying deep learning models. It consists of images of handwritten digits from 0 to 9. An AI model trained on MNIST can classify handwritten digits from 0 to 9. But what happens when you feed a model trained on MNIST an image of the letter “A”? Does it say “A”? No, it doesn't. It picks a digit between 0 and 9. This is because the model is set up during training to answer only with digits between 0 and 9. So traditional AI can't go beyond the answers given in the dataset.
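
Here is a minimal sketch of why the answer is always a digit. The scores below are made up for illustration and do not come from a real trained model; the point is only that the last step picks the largest of ten scores, so the letter “A” can never come out.

```python
import math

# The ten labels an MNIST classifier is trained on. There is no "A" here.
DIGIT_LABELS = [str(d) for d in range(10)]


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most likely label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return DIGIT_LABELS[best], probs[best]


# Hypothetical scores for an image of the letter "A" (made up, not real model output).
scores_for_letter_a = [0.2, 0.1, 0.3, 0.1, 1.4, 0.2, 0.1, 0.9, 0.3, 0.6]
label, prob = classify(scores_for_letter_a)
print(label)  # prints "4": the model is forced to answer with a digit
```

Even when no score is high, the model still has to choose one of the ten labels, which is exactly the limitation described above.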

 

AI in the past

 

Traditionally, the approach to training AI models has been like this: prepare a dataset for a specific task and train the model to be good at that single task. Thus, a model trained to translate can do one thing: translation. A model trained to analyze sentiment can tell you whether a given text is positive or negative. Each model was trained to be good at only one task.

 

But ChatGPT can actually do all of these tasks. If you ask it to translate a text, it will translate it; if you ask it to write something, it will write that type of text; if you ask it to summarize a text, it will do so; if you ask whether a given sentence is positive or negative, it will tell you. ChatGPT is a single model that can perform multiple tasks. This is a big difference from the AI models that came before it, and it doesn't just pick from a fixed set of answers. Because both the input and the output of the model are language, users can ask the model to do many types of tasks. One model can do a variety of tasks, not just one specific task! And it performs very well. ChatGPT's excellent performance and the flexibility of its answers were a great achievement, and they raised the hope that AGI will appear soon.

 

As deep learning technology advances, there are many attempts to train AI models to do diverse tasks. Earlier, I introduced the MNIST dataset. A model trained on it classifies handwritten digits and only outputs a value between 0 and 9, but can image models be trained to do more? Hugging Face hosts a model that can give more flexible answers. On DeepLearning.AI, a course titled 'Open Source Models with Hugging Face' introduces a model that answers flexibly about images. The model, called 'clip-vit-large-patch14', selects the text that best matches a given image. Whereas traditional image classification models only select answers from the set of labels defined in the data, this model compares an image with candidate sentences and selects the sentence that best matches the image. This gives the model a much wider range of possible answers. While this is a significant improvement, some might argue that the model still seems limited, since it can only choose among the sentences it is given.
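
The matching step can be sketched roughly as follows. This is not the actual CLIP code: the tiny three-number embeddings and the captions are hypothetical values I made up, whereas a real model would produce much longer vectors from its image and text encoders. The sketch only shows the idea of comparing one image against several sentences and picking the closest one.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two vectors, from -1 (opposite) to 1 (same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_caption(image_embedding, caption_embeddings):
    """Pick the caption whose embedding is closest to the image embedding."""
    return max(
        caption_embeddings,
        key=lambda caption: cosine_similarity(image_embedding, caption_embeddings[caption]),
    )


# Hypothetical embeddings (made up for illustration, not real encoder output).
image_embedding = [0.9, 0.1, 0.2]
caption_embeddings = {
    "a photo of a cat": [0.8, 0.2, 0.1],
    "a photo of a dog": [0.1, 0.9, 0.2],
    "a handwritten letter A": [0.2, 0.1, 0.9],
}

print(best_caption(image_embedding, caption_embeddings))  # prints "a photo of a cat"
```

The candidate sentences are just inputs, so the list can be changed at any time without retraining the model. That is the difference from a classifier whose labels are fixed at training time.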

 

Let's say you're using ChatGPT on the OpenAI website. ChatGPT is a good writer, but it can't draw. Some may argue that I'm wrong, because it gives you an image when you ask it to generate one. The truth is that the images you get from ChatGPT are actually generated by a model named DALL-E. When I ask ChatGPT for an image, ChatGPT passes the request to DALL-E and then shows me the image DALL-E created. So even though ChatGPT itself is a text model, it works with other AI models to provide various kinds of output. In this way, AI models can use other AI models as tools, or even talk to each other, to accomplish tasks. Instead of one model working alone, multiple AIs work as a team. This allows AI to handle a wider range of tasks. It's a big step forward, but each individual AI model may still seem limited.
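
The teamwork can be pictured with a small sketch. Everything in it is a hypothetical stand-in: `generate_text` and `generate_image` are placeholder functions, not the real ChatGPT or DALL-E APIs, and the keyword check is a simplification I assumed for illustration. In a real system the language model itself decides when to call a tool.

```python
def generate_text(prompt):
    """Stand-in for the text model (hypothetical placeholder)."""
    return f"[text reply to: {prompt}]"


def generate_image(prompt):
    """Stand-in for the image model used as a tool (hypothetical placeholder)."""
    return f"[image for: {prompt}]"


# Simplified routing rule assumed for this sketch only.
IMAGE_KEYWORDS = ("draw", "image", "picture", "logo")


def handle_request(prompt):
    """Pass image requests to the image tool; answer everything else as text."""
    wants_image = any(word in prompt.lower() for word in IMAGE_KEYWORDS)
    if wants_image:
        return generate_image(prompt)
    return generate_text(prompt)


print(handle_request("Draw a cat wearing a hat"))
print(handle_request("Summarize this article"))
```

From the user's side there is one chat window, but behind it the request may be answered by a different model.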

 

If AGI emerges, it is likely that a single model will be able to perform an almost unlimited range of tasks at a human-like level. A model that can take in and produce different types of information is called a multimodal model. If the model can receive information in various formats such as voice, text, image, and video, it takes multimodal inputs. Multimodal output is also feasible if the model answers in various formats such as text, image, voice, and video. These models can greatly expand the range of tasks. Tesla's Optimus is a humanoid robot, an example of AI operating in a physical environment. Advances in large multimodal models (LMMs) and robotics are connected, and together they may eventually bring AGI into this world.

 

I'm not sure what AGI really is, but I think the flexibility of ChatGPT's answers has sparked the discussion of AGI. I think the biggest difference between traditional AI models and AGI is the range of tasks the model can do. I have mostly worked on training models to be good at specific tasks, so for me this expanded range of capability is what draws the line between AI and AGI.
