OpenAI unveils new real-time voice AI models for developers

OpenAI expands developer tools with next-generation real-time voice AI models

As voice-based interactions become a major part of digital experiences, OpenAI has launched a new suite of real-time voice intelligence models aimed at making AI conversations more natural, responsive, and action-oriented.

The new lineup includes GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper. These models are designed to help developers build AI systems that can listen, reason, translate, transcribe, and respond during live conversations.

According to the company, “Voice is becoming one of the most natural ways for people to use software. A voice agent needs to understand what someone means, keep track of context, recover when a request changes, use tools while the conversation continues, and respond in a way that feels appropriate to the moment. Together, the models we are launching move realtime audio from simple call-and-response toward voice interfaces that can actually do work: listen, reason, translate, transcribe, and take action as a conversation unfolds.”

The company said GPT-Realtime-2 offers GPT-5-level reasoning capabilities and supports a 128K context window for longer and more coherent conversations. GPT-Realtime-Translate enables live multilingual conversations, while GPT-Realtime-Whisper provides instant streaming speech-to-text transcription.

The models also introduce advanced voice-to-action capabilities, allowing users to describe tasks verbally while the system processes and completes requests in real time. OpenAI added that software platforms can now proactively deliver spoken updates, such as travel notifications or customer support responses.

The company revealed that businesses including Zillow, Deutsche Telekom, and Vimeo are already testing the technology for customer engagement and multilingual communication use cases.

OpenAI stated that GPT-Realtime-2 scored 15.2% higher on the Big Bench Audio benchmark and 13.8% higher on the Audio MultiChallenge test than earlier versions. Developers can also customise tone settings to produce empathetic, calm, or upbeat responses depending on the context.

To improve safety, the company has added active classifiers that can detect harmful content and stop sessions when required. The models also support enterprise-grade privacy controls and EU data residency.

The new models are now available through OpenAI’s Realtime API. GPT-Realtime-2 is priced at $32 per 1M audio input tokens and $64 per 1M audio output tokens, while GPT-Realtime-Translate costs $0.034 per minute and GPT-Realtime-Whisper costs $0.017 per minute.
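To put those prices in perspective, here is a rough cost estimate written as a short Python sketch. The rates are the ones quoted above; the token counts and call durations are hypothetical figures chosen for illustration, and the function names are our own rather than part of any OpenAI library.

```python
# Rates as quoted in the article, in US dollars.
REALTIME_2_INPUT_PER_1M = 32.0    # per 1M audio input tokens
REALTIME_2_OUTPUT_PER_1M = 64.0   # per 1M audio output tokens
TRANSLATE_PER_MINUTE = 0.034      # GPT-Realtime-Translate
WHISPER_PER_MINUTE = 0.017        # GPT-Realtime-Whisper


def realtime_2_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated GPT-Realtime-2 cost for the given audio token counts."""
    total = (input_tokens * REALTIME_2_INPUT_PER_1M
             + output_tokens * REALTIME_2_OUTPUT_PER_1M)
    return total / 1_000_000


def per_minute_cost(minutes: float, rate_per_minute: float) -> float:
    """Estimated cost for a model that is billed per minute of audio."""
    return minutes * rate_per_minute


# Hypothetical workload: 50,000 input tokens and 20,000 output tokens.
print(f"GPT-Realtime-2: ${realtime_2_cost(50_000, 20_000):.2f}")

# Hypothetical workload: one hour of audio on each per-minute model.
print(f"Translate, 60 min: ${per_minute_cost(60, TRANSLATE_PER_MINUTE):.2f}")
print(f"Whisper, 60 min: ${per_minute_cost(60, WHISPER_PER_MINUTE):.2f}")
```

Under these assumed figures, the GPT-Realtime-2 session would cost about $2.88, an hour of live translation about $2.04, and an hour of streaming transcription about $1.02. Actual bills depend on how much audio a conversation generates and on any other charges not covered in the announcement.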
