In a shift toward more active and reliable visual AI, Google on 28 January introduced a new capability called “Agentic Vision” for Gemini 3 Flash. The update changes how the model handles images, moving from simple visual recognition to step-by-step investigation.
According to a Google blog post, Agentic Vision blends visual reasoning with automated code execution to study images using a “Think, Act, Observe” loop. Google said this method helps reduce hallucinations and improves accuracy in visual tasks. “The model formulates plans to zoom in, inspect and manipulate images step-by-step, grounding answers in visual evidence,” the company said.
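The “Think, Act, Observe” loop can be sketched in miniature. This is a purely illustrative sketch, not Google’s implementation: the function names, the zoom action, and the stopping logic are all assumptions made for the example.

```python
# Hypothetical sketch of a "Think, Act, Observe" loop. All names and
# behaviours here are illustrative assumptions, not Google's actual code.

def think(question, observations):
    """Plan the next step; a real agent would query the model here."""
    if not observations:
        # No evidence yet: plan an action that gathers some.
        return ("zoom", {"region": (100, 100, 300, 300)})
    return ("answer", {"text": f"grounded in {len(observations)} observation(s)"})

def act(action, params, image):
    """Execute the planned action as deterministic code (e.g. crop a region)."""
    if action == "zoom":
        x1, y1, x2, y2 = params["region"]
        return {"cropped_size": (x2 - x1, y2 - y1)}  # stand-in for a real crop
    return None

def agentic_vision(question, image, max_steps=5):
    """Alternate plan -> execute -> observe until an answer is produced."""
    observations = []
    for _ in range(max_steps):
        action, params = think(question, observations)
        if action == "answer":
            return params["text"]
        observations.append(act(action, params, image))  # observe the result
    return "no answer within step budget"
```

The key point the article describes is that each answer is reached only after concrete actions whose results the model can observe, rather than in a single generative pass.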
With Agentic Vision, the model can annotate images in real time. Instead of only describing what it sees, Gemini 3 Flash acts as an agent that runs Python code to visualise and verify its findings. This approach replaces what Google described as “probabilistic guessing” with code-based execution, delivering a reported 5–10% quality improvement.
Google explained that visual tasks involving multiple steps often cause errors in standard large language models. “Standard LLMs often hallucinate during multi-step visual arithmetic. Gemini 3 Flash bypasses this by offloading computation to a deterministic Python environment,” the company said. With this update, Google says its models are shifting from systems that only “look” at images to agents that actively “investigate” them.
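The arithmetic offload Google describes can be illustrated with a toy example. The line items below are hard-coded stand-ins for values a model might read out of a receipt image, and the 8% tax rate is an assumption for the example; the point is that the sums are computed by ordinary Python rather than predicted token-by-token.

```python
# Illustrative sketch: values "extracted" from an image (hard-coded here)
# are totalled in a deterministic Python environment instead of being
# guessed by the language model.

extracted_line_items = [  # hypothetical receipt entries read from an image
    ("coffee", 3.50),
    ("sandwich", 7.25),
    ("juice", 2.75),
]

subtotal = sum(price for _, price in extracted_line_items)
tax = round(subtotal * 0.08, 2)   # assumed 8% tax rate for the example
total = round(subtotal + tax, 2)

print(subtotal, tax, total)
```

However the extraction step goes, the multi-step arithmetic itself cannot hallucinate: the interpreter either computes it correctly or raises an error.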
The company also shared real-world use cases. One example highlighted was “PlanCheckSolver.com, an AI-powered building plan validation platform,” which improved accuracy by 5% by using Gemini 3 Flash with code execution to repeatedly inspect high-resolution visuals.
In another demonstration, “the model is asked to count the digits on a hand in the Gemini app. To avoid counting errors, it uses Python to draw bounding boxes and numeric labels over each finger it identifies.”
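A minimal sketch of that counting step, with made-up detector output: the bounding-box coordinates below are invented for illustration, and a real run would also render the boxes and labels onto the image. What matters is that the final count is the length of an explicit, labelled list rather than a free-form guess.

```python
# Hypothetical sketch of the finger-counting demo. The detector output is
# fabricated for the example; a real agent would draw each box and label
# onto the image before counting.

detections = [  # (x1, y1, x2, y2) boxes, stand-ins for one box per finger
    (40, 10, 70, 120),
    (80, 5, 110, 125),
    (120, 8, 150, 122),
    (160, 12, 190, 118),
    (200, 40, 240, 110),
]

annotations = [
    {"label": i, "box": box, "anchor": (box[0], box[1])}  # label at top-left
    for i, box in enumerate(detections, start=1)
]

finger_count = len(annotations)
print(finger_count)  # the answer is read off the labelled boxes, not guessed
```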
Agentic Vision is currently available to developers through the Gemini API in Google AI Studio and Vertex AI, as well as to users in the Gemini app.
Google also outlined future plans for the feature. The company said it aims to let the model automatically decide when to rotate or zoom images, or perform visual calculations, without extra prompts. It also plans to add tools such as web search and reverse image search, and to expand Agentic Vision to larger and more powerful Gemini models beyond Flash.