GPT-Realtime voice model is now available, supporting image input #6
CometAPI-Official
started this conversation in
Blog
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
OpenAI today announced that GPT-Realtime voice model is now available, supporting image input, marking the Realtime API’s move from beta to general availability for production voice agents. The release positions GPT-Realtime as a low-latency, speech-to-speech model that can run two-way voice conversations while also grounding responses in images supplied during a session.
OpenAI describes gpt-realtime as its most advanced speech-to-speech model to date: it processes audio end-to-end (rather than chaining separate speech-to-text and text-to-speech steps), produces more natural and expressive speech, and shows measurable gains in comprehension, instruction following, and function calling. The company highlights improvements on internal benchmarks and says the model captures subtleties such as laughter, mid-sentence language switching, and higher accuracy on alphanumeric content.
What’s new
Benchmarks
BigBench Audio (reasoning): 82.8% — up from 65.6% on OpenAI’s December 2024 realtime model. This is the headline reasoning benchmark reported for audio-capable reasoning tasks.
MultiChallenge (instruction following, audio): ~30.5% vs ~20.6% previously — shows improved adherence to multi-step or complex spoken instructions.
ComplexFuncBench (function-calling success): ~66.5% vs ~49.7% previously — better reliability when the model must call tools/functions during an audio session.
Cost & latency: OpenAI states the new model reduces per-token audio cost (≈20% lower than the prior realtime preview) and operates as a single end-to-end model (no separate STT → LM → TTS chain), which lowers end-to-end latency in real-time interactive flows.
OpenAI says the
gpt-realtime
model demonstrates material improvements in a range of objective benchmarks and real-world behaviors — higher scores on BigBench Audio and on instruction-following/function-calling evaluations — and better handling of alphanumerics, codewords and language switching in live audio. The company also introduced two new voices (Cedar and Marin) and reports a 20% price reduction compared with the earlier realtime preview model.The Realtime API and
gpt-realtime
model are now available to developers (GA),OpenAI also lowered the price of its Realtime API with this update, reducing audio input to $32 per million tokens and audio output to $64 per million tokens, a 20% reduction from the previous price, providing developers with a more economical solution.Getting Started
CometAPI is a unified API platform that aggregates over 500 AI models from leading providers—such as OpenAI’s GPT series, Google’s Gemini, Anthropic’s Claude, Midjourney, Suno, and more—into a single, developer-friendly interface. By offering consistent authentication, request formatting, and response handling, CometAPI dramatically simplifies the integration of AI capabilities into your applications. Whether you’re building chatbots, image generators, music composers, or data‐driven analytics pipelines, CometAPI lets you iterate faster, control costs, and remain vendor-agnostic—all while tapping into the latest breakthroughs across the AI ecosystem.
Developers can access [GPT-5](https://www.cometapi.com/gpt-5-api/) through CometAPI, the latest models version listed are as of the article’s publication date. To begin, explore the model’s capabilities in the [Playground](https://api.cometapi.com/chat) and consult the [API guide](https://api.cometapi.com/doc) for detailed instructions. Before accessing, please make sure you have logged in to CometAPI and obtained the API key. [CometAPI](https://www.cometapi.com/) offer a price far lower than the official price to help you integrate.
The latest integration *
gpt-realtime
* will soon appear on CometAPI, so stay tuned!Ready to Get Started editing images? → [Sign up for CometAPI today](https://www.chatbase.co/auth/signup) !Beta Was this translation helpful? Give feedback.
All reactions