Monday, February 16, 2026

Global AI systems face challenges understanding Indian languages and dialects

As voice interfaces become central to digital services, a new evaluation highlights a significant gap in how artificial intelligence systems understand real-world speech across India’s diverse linguistic landscape.

A sovereign AI benchmark titled Voice of India has found that several widely used global AI models struggle with Indian languages, accents and dialects. The benchmark, developed by Josh Talks in collaboration with AI4Bharat at IIT Madras, evaluates automatic speech recognition (ASR) systems across 15 Indian languages using speech samples from more than 35,000 speakers.

Released as the India AI Impact Summit begins in New Delhi, the study reveals a clear performance difference between India-focused AI models and many global systems, especially for regional languages and dialects. AI4Bharat is expected to share deeper insights and updates on sovereign AI models during the summit.

What is Voice of India
Voice of India is designed to measure how accurately AI systems transcribe speech as it is actually spoken across India. The dataset includes conversational and spontaneous audio, background noise, code-mixed language and regional variations. It covers 15 languages with nearly 2,000 speakers per language and evaluates both widely spoken languages such as Hindi and Bengali and regional ones like Odia and Assamese. Dialect testing also includes variants such as Bhojpuri and Chhattisgarhi.

Key findings
Results show that Sarvam Audio, developed by Indian startup Sarvam AI, ranked first or second across most languages and dialects. Google’s Gemini models came closest to the Indian systems, while several other global models recorded higher error rates. In some cases, Sarvam’s model outperformed OpenAI’s GPT-4o transcription systems by more than 50 percentage points in average accuracy.

Performance differences were also seen across language families. Systems performed better on Indo-Aryan languages such as Hindi and Bengali, while error rates increased sharply for Dravidian languages including Tamil, Telugu, Malayalam and Kannada. Dialect testing showed error rates rising to 20–30% for Bhojpuri compared to under 10% for standard Hindi.

Why the benchmark matters
With voice increasingly used in banking, healthcare, customer support and government services, transcription errors can directly affect service delivery. Many global models, largely trained on Western or standardised datasets, still struggle with Indian accents, multilingual speech and regional variation.

“This is one of the most rigorous large-scale evaluations of speech recognition for Indian languages, containing district-level cohorts with balanced representation across gender and age to truly reflect India’s diversity,” said Mitesh Khapra of AI4Bharat at IIT Madras. “Further, recognising that conventional word error rate can unfairly penalize code-mixed and multilingual speech, we manually curated multiple valid spelling variants for transcripts, ensuring models are judged for linguistic correctness rather than orthographic variation. This human-intensive effort sets a new benchmark for fair and representative ASR evaluation in India.”
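
To illustrate the idea behind variant-aware scoring, the following is a minimal sketch in Python. It is not the benchmark's actual scoring code, which the article does not describe. It assumes each reference word is stored as a set of acceptable spellings, and the function name `variant_aware_wer` and the sample sentence are hypothetical.

```python
from typing import List, Set


def variant_aware_wer(reference: List[Set[str]], hypothesis: List[str]) -> float:
    """Word error rate where each reference slot accepts several spellings.

    Assumption: `reference` holds one set of valid spellings per word.
    A hypothesis word that matches any spelling in the slot costs nothing.
    """
    n, m = len(reference), len(hypothesis)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference slots processed
    # so far and the first j hypothesis words.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            match = hypothesis[j - 1] in reference[i - 1]
            curr[j] = min(
                prev[j] + 1,                        # deletion
                curr[j - 1] + 1,                    # insertion
                prev[j - 1] + (0 if match else 1),  # match or substitution
            )
        prev = curr
    return prev[m] / n


# Hypothetical code-mixed Hindi-English sentence ("I am going to the office").
hypothesis = ["मैं", "ऑफिस", "जा", "रहा", "हूं"]

# Conventional scoring: exactly one accepted spelling per word.
strict = [{"मैं"}, {"office"}, {"जा"}, {"रहा"}, {"हूँ"}]

# Variant-aware scoring: manually listed alternative spellings per word.
variants = [{"मैं"}, {"office", "ऑफिस"}, {"जा"}, {"रहा"}, {"हूँ", "हूं"}]

print(variant_aware_wer(strict, hypothesis))    # two substitutions over five words
print(variant_aware_wer(variants, hypothesis))  # every word matches a listed variant
```

In this sketch the same transcript is charged two errors under strict matching and none once the alternative spellings are accepted, which is the kind of orthographic penalty the quote describes.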

AI4Bharat is a research initiative at IIT Madras focused on building open and inclusive AI systems, datasets and benchmarks for Indian languages.
