Wednesday, March 11, 2026


Google launches Gemini Embedding 2 to unify text, images, audio and video understanding

A new development in artificial intelligence is set to simplify how machines process different types of data. Google has introduced Gemini Embedding 2, its first fully multimodal embedding model, designed to map text, images, audio and video into a single shared embedding space.

The company shared details of the model in a blog post, highlighting that Gemini Embedding 2 is the successor to its earlier text-only embedding model released last year. The new system is capable of understanding semantic meaning across more than 100 languages. It is currently available in public preview through the Gemini API and Vertex AI.

Typically, artificial intelligence systems embed different types of data, such as text, images, audio and video, in separate vector spaces, one per modality. When a user requests information, the model searches only within the relevant format. As a result, a concept like a “cat” mentioned in text and a “cat” shown in a video may be treated as two unrelated items.

Gemini Embedding 2 addresses this limitation with a single architecture that maps all forms of content into one shared embedding space. This allows the system to analyse mixed inputs, such as documents that contain both images and text, more naturally, closer to how humans interpret information.
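The idea of a shared embedding space can be sketched with toy vectors. The values below are illustrative only, not actual model outputs, and real embeddings have hundreds or thousands of dimensions; the point is that the same concept lands close together regardless of modality, which a similarity measure such as cosine similarity can detect:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional embeddings (hand-written illustrative values).
text_cat   = [0.9, 0.1, 0.2, 0.0]   # the word "cat" in a document
video_cat  = [0.8, 0.2, 0.1, 0.1]   # a cat appearing in a video frame
text_stock = [0.0, 0.1, 0.9, 0.7]   # an unrelated text snippet

# In a shared space, the cross-modal pair scores high and the
# unrelated pair scores low.
print(cosine_similarity(text_cat, video_cat))    # high (same concept)
print(cosine_similarity(text_cat, text_stock))   # low  (different concepts)
```

With separate per-modality spaces, this comparison would not even be well-defined; a unified space is what makes it a single arithmetic operation.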

According to Google, the new model simplifies “complex pipelines and enhances a wide variety of multimodal downstream tasks.” These include Retrieval-Augmented Generation (RAG), semantic search, sentiment analysis and data clustering.
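Semantic search, one of the downstream tasks Google names, reduces to ranking pre-computed embeddings by similarity to a query embedding. The sketch below uses hand-written toy vectors in place of real model outputs; in practice both the corpus and the query would be embedded by the model:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

# Hypothetical pre-computed embeddings for a tiny corpus.
corpus = {
    "How to care for a kitten":  [0.9, 0.1, 0.1],
    "Quarterly earnings report": [0.1, 0.9, 0.2],
    "Best cat food brands":      [0.8, 0.2, 0.1],
}

# Toy embedding standing in for the query "cat advice".
query_embedding = [0.85, 0.1, 0.1]

# Rank documents by similarity to the query; the top hit is the
# most semantically related, even without shared keywords.
ranked = sorted(corpus,
                key=lambda doc: cosine(query_embedding, corpus[doc]),
                reverse=True)
print(ranked[0])
```

In a RAG pipeline, the top-ranked documents would then be passed to a generative model as context, which is why retrieval quality depends directly on embedding quality.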

The model also offers several technical capabilities. It supports a text context window of up to 8,192 input tokens. It can process up to 6 images per request in PNG and JPEG formats, and handle video inputs of up to 120 seconds in MP4 and MOV formats. Additionally, the system can process audio data directly without requiring text transcriptions.

Gemini Embedding 2 can also embed PDF documents of up to 6 pages. Another key capability is its ability to understand interleaved inputs, meaning users can send multiple data types such as text and images in the same request. Google said this feature helps the model develop a more accurate understanding of complex real-world information.
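A client might pre-validate an interleaved request against the limits stated above (8,192 text tokens; 6 PNG/JPEG images; 120 seconds of MP4/MOV video; 6-page PDFs). The field names and request shape below are illustrative assumptions, not the actual Gemini API schema:

```python
# Limits as stated in Google's announcement; the dict layout is ours.
LIMITS = {
    "max_text_tokens": 8192,
    "max_images": 6,
    "image_formats": {"png", "jpeg"},
    "max_video_seconds": 120,
    "video_formats": {"mp4", "mov"},
    "max_pdf_pages": 6,
}

def validate_request(parts):
    """Return a list of limit violations for an interleaved request."""
    errors = []
    images = [p for p in parts if p["type"] == "image"]
    if len(images) > LIMITS["max_images"]:
        errors.append(f"too many images: {len(images)}")
    for p in parts:
        if p["type"] == "image" and p["format"] not in LIMITS["image_formats"]:
            errors.append(f"unsupported image format: {p['format']}")
        elif p["type"] == "video":
            if p["format"] not in LIMITS["video_formats"]:
                errors.append(f"unsupported video format: {p['format']}")
            if p["seconds"] > LIMITS["max_video_seconds"]:
                errors.append(f"video too long: {p['seconds']}s")
        elif p["type"] == "text" and p["tokens"] > LIMITS["max_text_tokens"]:
            errors.append(f"text too long: {p['tokens']} tokens")
        elif p["type"] == "pdf" and p["pages"] > LIMITS["max_pdf_pages"]:
            errors.append(f"PDF too long: {p['pages']} pages")
    return errors

# One interleaved request mixing text, an image and a video.
request = [
    {"type": "text", "tokens": 1200},
    {"type": "image", "format": "png"},
    {"type": "video", "format": "mp4", "seconds": 150},
]
print(validate_request(request))  # ['video too long: 150s']
```

The interleaving itself is just a single ordered list of mixed-type parts, which is what lets the model see text and images in context rather than as separate calls.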

