Google is taking its generative artificial intelligence to the next level by giving it the ability to “see”. This is the key takeaway from the tech giant’s latest announcements. On December 11, Google introduced the second generation of its multimodal AI model, “Gemini 2.0”, capable of processing text, images, and audio.
For now, Google has unveiled only the smallest model in the Gemini 2.0 family, named “Gemini 2.0 Flash”, which is already said to run twice as fast as Gemini 1.5 Pro while outperforming it. This version is currently available to developers and will roll out to the general public starting in January.
A shift toward autonomous AI agents
Gemini 2.0 represents a significant step for Google as it marks its entry into the era of AI agents. This emerging trend goes beyond traditional chat-based AI tools like ChatGPT. Instead of merely responding to questions, these advanced models can now perform tasks, plan actions, and even operate autonomously by interacting with user interfaces.
This evolution signals a shift toward AI that doesn’t just assist but takes on a proactive role, proposing solutions and carrying out tasks on the user’s behalf. As Google continues to advance its AI capabilities, the possibilities for real-time interaction and automation are expanding rapidly.
