In a shift toward more active and reliable visual AI, Google on 28 January introduced a new capability called “Agentic Vision” for Gemini 3 Flash. The update changes how the model handles images, moving from simple visual recognition to step-by-step investigation.
According to a Google blog post, Agentic Vision blends visual reasoning with automated code execution to study images using a “Think, Act, Observe” loop. Google said this method helps reduce hallucinations and improves accuracy in visual tasks. “The model formulates plans to zoom in, inspect and manipulate images step-by-step, grounding answers in visual evidence,” the company said.
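The “Think, Act, Observe” loop can be sketched in miniature. This is a purely illustrative sketch, not Google’s implementation: the function names, the zoom action, and the stopping logic are all assumptions made for the example.

```python
# Hypothetical sketch of a "Think, Act, Observe" loop. All names and
# behaviours here are illustrative assumptions, not Google's actual code.

def think(question, observations):
    """Plan the next step; a real agent would query the model here."""
    if not observations:
        # No evidence yet: plan an action that gathers some.
        return ("zoom", {"region": (100, 100, 300, 300)})
    return ("answer", {"text": f"grounded in {len(observations)} observation(s)"})

def act(action, params, image):
    """Execute the planned action as deterministic code (e.g. crop a region)."""
    if action == "zoom":
        x1, y1, x2, y2 = params["region"]
        return {"cropped_size": (x2 - x1, y2 - y1)}  # stand-in for a real crop
    return None

def agentic_vision(question, image, max_steps=5):
    """Alternate plan -> execute -> observe until an answer is produced."""
    observations = []
    for _ in range(max_steps):
        action, params = think(question, observations)
        if action == "answer":
            return params["text"]
        observations.append(act(action, params, image))  # observe the result
    return "no answer within step budget"
```

The key point the article describes is that each answer is reached only after concrete actions whose results the model can observe, rather than in a single generative pass.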
With Agentic Vision, the model can annotate images in real time. Instead of only describing what it sees, Gemini 3 Flash acts as an agent that runs Python code to visualise and verify its findings. This approach replaces what Google described as “probabilistic guessing” with code-based execution, delivering a reported 5–10% quality improvement.
Google explained that visual tasks involving multiple steps often cause errors in standard large language models. “Standard LLMs often hallucinate during multi-step visual arithmetic. Gemini 3 Flash bypasses this by offloading computation to a deterministic Python environment,” the company said. With this update, Google says its models are shifting from systems that only “look” at images to agents that actively “investigate” them.
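The arithmetic offload Google describes can be illustrated with a toy example. The line items below are hard-coded stand-ins for values a model might read out of a receipt image, and the 8% tax rate is an assumption for the example; the point is that the sums are computed by ordinary Python rather than predicted token-by-token.

```python
# Illustrative sketch: values "extracted" from an image (hard-coded here)
# are totalled in a deterministic Python environment instead of being
# guessed by the language model.

extracted_line_items = [  # hypothetical receipt entries read from an image
    ("coffee", 3.50),
    ("sandwich", 7.25),
    ("juice", 2.75),
]

subtotal = sum(price for _, price in extracted_line_items)
tax = round(subtotal * 0.08, 2)   # assumed 8% tax rate for the example
total = round(subtotal + tax, 2)

print(subtotal, tax, total)
```

However the extraction step goes, the multi-step arithmetic itself cannot hallucinate: the interpreter either computes it correctly or raises an error.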
The company also shared real-world use cases. One example highlighted was “PlanCheckSolver.com, an AI-powered building plan validation platform,” which improved accuracy by 5% by using Gemini 3 Flash with code execution to repeatedly inspect high-resolution visuals.
In another demonstration, “the model is asked to count the digits on a hand in the Gemini app. To avoid counting errors, it uses Python to draw bounding boxes and numeric labels over each finger it identifies.”
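A minimal sketch of that counting step, with made-up detector output: the bounding-box coordinates below are invented for illustration, and a real run would also render the boxes and labels onto the image. What matters is that the final count is the length of an explicit, labelled list rather than a free-form guess.

```python
# Hypothetical sketch of the finger-counting demo. The detector output is
# fabricated for the example; a real agent would draw each box and label
# onto the image before counting.

detections = [  # (x1, y1, x2, y2) boxes, stand-ins for one box per finger
    (40, 10, 70, 120),
    (80, 5, 110, 125),
    (120, 8, 150, 122),
    (160, 12, 190, 118),
    (200, 40, 240, 110),
]

annotations = [
    {"label": i, "box": box, "anchor": (box[0], box[1])}  # label at top-left
    for i, box in enumerate(detections, start=1)
]

finger_count = len(annotations)
print(finger_count)  # the answer is read off the labelled boxes, not guessed
```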
Agentic Vision is currently available to developers through the Gemini API in Google AI Studio and Vertex AI, as well as to users in the Gemini app.
Google also outlined future plans for the feature. The company said it aims to let the model automatically decide when to rotate or zoom images, or perform visual calculations, without extra prompts. It also plans to add tools such as web search and reverse image search, and to expand Agentic Vision to larger and more powerful Gemini models beyond Flash.