Amid growing global competition in artificial intelligence, an Indian startup is quietly drawing attention for its strong focus on local needs and execution.
Bengaluru-based Sarvam AI has announced that its latest models, Sarvam Vision and Bulbul V3, outperform leading global AI systems such as Google Gemini and OpenAI’s ChatGPT on India-specific tasks, according to the company. Pratyush Kumar shared the update on X, announcing the release of a 3-billion-parameter, state-space-based vision-language model built for high-accuracy digitisation of documents in English and Indian languages.
The new model extends Sarvam AI’s work beyond text and voice into visual understanding. Its primary focus is document intelligence, covering physical documents, archives, and manuscripts, with a strong emphasis on Indian languages. The model was trained on high-quality datasets spanning the 22 official Indian languages, drawn from sources such as financial records, literature, newspapers, and historical texts.
Sarvam AI has positioned its approach around practical use rather than hype. By combining local language expertise with global benchmarking, the startup aims to address long-standing gaps in India-focused AI capabilities.
To encourage adoption, the company is offering its Document Intelligence API free of charge for February 2026, allowing developers and enterprises to build and test Sarvam Vision applications at scale without paying for usage during that month.
Key features of Sarvam Vision
Sarvam AI highlights several core capabilities. Its multimodal vision-language system interprets images and text together, enabling tasks such as image captioning and chart or table interpretation. For document understanding, it offers high-accuracy OCR and knowledge extraction across 22 Indian languages, including from scanned and historical documents.
The model can also analyse charts, illustrations, and complex layouts, going beyond plain text extraction. Its multilingual visual understanding lets it interpret documents that mix several languages in a single file. The company has also introduced the Sarvam Indic OCR Bench, a benchmark for measuring OCR performance on Indian languages.
On accuracy, Sarvam Vision scored 84.3% on olmOCR-Bench, ahead of Gemini 3 Pro and DeepSeek OCR v2, and 93.28% on OmniDocBench v1.5, according to the official Sarvam blog.
Sarvam AI describes itself as a “sovereign” AI initiative that aims to build accessible, reliable AI systems developed and controlled within India. Whereas many global models prioritise English, Sarvam AI trains its models extensively on Indian languages, which it says gives them stronger performance on region-specific documents and use cases.
About us:
The Mainstream is a premier platform delivering the latest updates and informed perspectives across the technology, business, and cyber landscape. Built on research-driven thought leadership and original intellectual property, The Mainstream also curates summits and conferences that convene decision makers to explore how technology reshapes industries and leadership. With a growing presence in India and globally across the Middle East, Africa, ASEAN, the USA, the UK, and Australia, The Mainstream aims to bring the latest happenings and insights to 8.2 billion people and to place technology at the centre of conversation for leaders navigating the future.