Google has announced the launch of Gemini 2.0, the latest iteration of its artificial intelligence (AI) model. Designed for what Google calls the "agentic era," Gemini 2.0 introduces advanced multimodal capabilities, enabling it to interact, reason, and take proactive actions across a range of tasks. Building on its predecessors, Gemini 1.0 (introduced last December) and Gemini 1.5, the new model further advances multimodality and long-context understanding to process information across text, video, images, audio, and code.
"Information is at the core of human progress. It's why we've focused for more than 26 years on our mission to organise the world's information and make it accessible and useful. And it's why we continue to push the frontiers of AI to organise that information across every input and make it accessible via any output, so that it can be truly useful for you," said Sundar Pichai, CEO of Google and Alphabet.
Available to Developers and Testers
Gemini 2.0 Flash is now available as an experimental model to developers through the Gemini API in Google AI Studio and Vertex AI. Google aims to quickly integrate it into products like Gemini and Search. Starting December 11, Gemini 2.0 Flash will be accessible to all Gemini users.
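For developers who want to try it, a first call can be a short script. The following is a minimal sketch, assuming the google-genai Python SDK and the experimental model ID gemini-2.0-flash-exp (both assumptions based on the experimental release), with an API key issued by Google AI Studio:

```python
# Minimal sketch: one-shot text generation with the experimental
# Gemini 2.0 Flash model via the google-genai SDK
# (pip install google-genai). Model ID is an assumption based on
# the experimental release.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")  # key from Google AI Studio

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",  # experimental Gemini 2.0 Flash
    contents="Summarise the key capabilities of an agentic AI model.",
)
print(response.text)
```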
Introducing Deep Research
Google also unveiled Deep Research, a feature leveraging advanced reasoning and long-context capabilities to act as a research assistant. It explores complex topics and compiles reports on behalf of users. This feature is available within Gemini Advanced.
Enhancements in AI Search
AI Overviews now reach over 1 billion users globally. Google plans to incorporate Gemini 2.0's advanced reasoning capabilities into these overviews to address complex topics and multi-step questions, including advanced math equations, multimodal queries, and coding challenges. Testing has begun, with a broader rollout expected early next year, and AI Overviews will expand to more countries and languages in 2025.
"2.0's advances are underpinned by decade-long investments in our differentiated full-stack approach to AI innovation. It's built on custom hardware like Trillium, our sixth-generation TPUs. TPUs powered 100 percent of Gemini 2.0 training and inference," Pichai noted. "If Gemini 1.0 was about organising and understanding information, Gemini 2.0 is about making it much more useful."
Gemini 2.0 Flash: A Workhorse Model
"We are releasing the first model in the Gemini 2.0 family of models: an experimental version of Gemini 2.0 Flash. It's our workhorse model with low latency and enhanced performance at the cutting edge of our technology, at scale," said Demis Hassabis, CEO of Google DeepMind and Koray Kavukcuoglu, CTO of Google DeepMind on behalf of the Gemini team.
The first model in the Gemini 2.0 family, Gemini 2.0 Flash, is optimised for low latency and enhanced performance at scale. According to Google, it outperforms Gemini 1.5 Pro on key benchmarks while operating at twice the speed. Notably, it supports multimodal outputs such as natively generated images combined with text and steerable multilingual text-to-speech (TTS) audio.
Google is also releasing a new Multimodal Live API that enables real-time audio and video streaming inputs, together with the use of multiple, combined tools.
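Unlike one-shot API requests, the Live API works over a persistent, bidirectional session. The sketch below is a hedged, text-only illustration assuming the google-genai SDK's async live interface; the session methods shown reflect the experimental release and may differ across SDK versions:

```python
# Hedged sketch of a Multimodal Live API session using the
# google-genai SDK's async interface; method names reflect the
# experimental release and may differ in later SDK versions.
import asyncio
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

async def main():
    config = {"response_modalities": ["TEXT"]}  # audio output is also supported
    # connect() opens a persistent, bidirectional session with the model.
    async with client.aio.live.connect(
        model="gemini-2.0-flash-exp", config=config
    ) as session:
        await session.send(input="Describe what the Live API enables.",
                           end_of_turn=True)
        # Responses stream back incrementally as the model generates them.
        async for message in session.receive():
            if message.text:
                print(message.text, end="")

asyncio.run(main())
```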
Agentic Experiences
According to Google, Gemini 2.0 Flash's native user interface action capabilities work in concert with other improvements, such as multimodal reasoning, long-context understanding, complex instruction following and planning, compositional function-calling, native tool use, and improved latency, to enable a new class of agentic experiences.
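To make the function-calling piece concrete, the sketch below shows how a developer might declare a tool for Gemini 2.0 Flash to call. It assumes the google-genai SDK's automatic function-calling support, and get_weather is a hypothetical stub invented for illustration, not a Google API:

```python
# Hedged sketch of function-calling with the google-genai SDK.
# get_weather is a hypothetical stub; the SDK can invoke declared
# Python functions automatically when the model decides a tool
# call is needed.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

def get_weather(city: str) -> str:
    """Return a short weather report for the given city (stub)."""
    return f"It is sunny and 22 degrees in {city}."

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Should I pack an umbrella for Paris?",
    config=types.GenerateContentConfig(tools=[get_weather]),
)
# The final answer incorporates the tool's result.
print(response.text)
```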
Google said it is exploring prototypes built on Gemini 2.0, including:
Project Astra: A personal AI assistant with enhanced memory, multilingual dialogue, and integration with Google tools like Search, Lens, and Maps. Astra now retains up to 10 minutes of in-session memory and can recall past conversations. Google plans to extend these capabilities to Gemini and AR glasses.
Project Mariner: A browser-based agent capable of completing tasks by interpreting web elements and user interactions.
"Project Mariner is an early research prototype built with Gemini 2.0 that explores the future of human-agent interaction, starting with your browser. As a research prototype, it’s able to understand and reason across information in your browser screen, including pixels and web elements like text, code, images and forms, and then uses that information via an experimental Chrome extension to complete tasks for you," Google explained.
Jules: An AI-powered coding assistant integrated with GitHub workflows to support software development. This effort is part of Google's long-term goal of building AI agents that are useful across all domains, including coding.
Gaming and Robotics
Genie 2 Launch
Google also highlighted Genie 2, the large-scale foundation world model it unveiled on December 4. Genie 2 can generate an endless variety of action-controllable, playable 3D environments for training and evaluating embodied agents. Building on this advancement, Google said it has built agents using Gemini 2.0 that can help users navigate the virtual worlds of video games.
Robotic Applications
Beyond virtual applications, Google is experimenting with agents that apply Gemini 2.0's spatial reasoning capabilities to robotics, enabling new possibilities in the physical world.
"Today's releases mark a new chapter for our Gemini model. With the release of Gemini 2.0 Flash, and the series of research prototypes exploring agentic possibilities, we have reached an exciting milestone in the Gemini era," Google said on December 11..
The company plans to integrate Gemini 2.0 across its suite of products, starting with Search and the Gemini app, while continuing to explore its capabilities in collaboration with developers, trusted testers, and experts.