Monday, February 2, 2026

Sarvam AI rolls out new speech model for fast multilingual video dubbing

Aiming to remove language barriers at scale, Indian AI startup Sarvam AI has unveiled a new speech model that promises to turn multilingual video dubbing into a process that takes minutes instead of weeks. The company says the technology is built to help creators, educators and broadcasters reach wider audiences across Indian languages.

Sarvam AI introduced Sarvam Dub on February 1, positioning it as an advanced artificial intelligence system that can translate audio into multiple languages while preserving the original speaker’s voice. The model also includes built-in controls to closely match the timing of the original video, a key requirement for professional-quality dubbing.

“We’re introducing Sarvam Dub, a state-of-the-art AI dubbing model that helps creators extend the life and reach of a single piece of content, quickly,” the company said in its announcement.

The startup said traditional dubbing relies on translators, voice artists and studio time, a workflow that “worked, but couldn’t scale.” With Sarvam Dub, the company claims that “what used to take weeks of scripting, recording, studio time, and publishing effort can now be dubbed in minutes.”

The system uses zero-shot voice cloning and cross-lingual speech models to keep a speaker’s identity consistent across languages. This is a major challenge in India, where content often needs to move across many regional languages and accents. Sarvam said its model builds duration control directly into speech generation rather than adjusting the audio after it is produced, an approach that often makes voices sound unnatural.

“High-quality dubbing requires duration control that is intrinsic to speech generation, where timing is shaped as the voice is produced rather than adjusted afterwards,” Sarvam said.

To measure performance, the company evaluated more than 700 audio samples from 64 speakers across 10 Indian languages and English. It used speaker-similarity scores based on ECAPA-TDNN embeddings and said the model delivered state-of-the-art results, especially in cross-lingual voice preservation.
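As a rough illustration of how an embedding-based speaker-similarity score can be computed, the sketch below compares two speaker embeddings with cosine similarity and averages the result over several pairs. It assumes the embeddings have already been extracted by a speaker-verification model such as ECAPA-TDNN; the function names are hypothetical, and the article does not say which exact metric or tooling Sarvam used.

```python
import math
from typing import Sequence, Tuple

Embedding = Sequence[float]


def cosine_similarity(a: Embedding, b: Embedding) -> float:
    """Cosine similarity between two embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_speaker_similarity(pairs: Sequence[Tuple[Embedding, Embedding]]) -> float:
    """Average similarity over (original, dubbed) embedding pairs.

    Assumption: each pair holds the embedding of the source-language clip
    and the embedding of the dubbed clip for the same speaker.
    """
    if not pairs:
        raise ValueError("need at least one pair")
    scores = [cosine_similarity(a, b) for a, b in pairs]
    return sum(scores) / len(scores)
```

Under this convention a score close to 1 means the dubbed voice sits near the original speaker in embedding space, while a score near 0 means the two voices are unrelated.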

Sarvam Dub is already being tested in public communication, education and broadcast use cases. The company worked with the Indian Institute of Technology Madras to dub technical lectures into multiple languages. It also said India’s Union Budget 2026 became the first national budget to be dubbed live using AI, with Finance Minister Nirmala Sitharaman’s speech streamed in Kannada and Hindi.

Live dubbing puts added pressure on both speed and accuracy. Sarvam said its engineers achieved a 6.6-fold reduction in latency by optimising model tracing, using selective and post-training quantisation, and adding intelligent caching, making the system suitable for real-time broadcast.
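For readers unfamiliar with post-training quantisation, the following is a minimal sketch of the general idea: weights from an already-trained model are mapped to 8-bit integers plus a single scale factor, which cuts memory traffic and can speed up inference. This is a generic symmetric int8 scheme with hypothetical function names, shown only as an assumption-laden illustration; the article does not describe Sarvam's actual quantisation method.

```python
from typing import List, Sequence, Tuple


def quantize_int8(weights: Sequence[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantisation of floats to the range [-127, 127]."""
    if not weights:
        raise ValueError("weights must not be empty")
    max_abs = max(abs(w) for w in weights)
    # An all-zero tensor gets scale 1.0 to avoid dividing by zero.
    scale = max_abs / 127.0 if max_abs > 0.0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized: Sequence[int], scale: float) -> List[float]:
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]
```

With this scheme each weight is stored in one byte instead of four, and the rounding error on each recovered weight is at most about half the scale factor.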
