Commit f7a143d: Update OpenAI Realtime model (#683)

1 file changed: fern/openai-realtime.mdx (+379, -7 lines)

---
title: OpenAI Realtime
subtitle: Build voice assistants with OpenAI's native speech-to-speech models for ultra-low latency conversations
slug: openai-realtime
---

## Overview

OpenAI’s Realtime API enables developers to use a native speech-to-speech model. Unlike other Vapi configurations, which orchestrate a transcriber, model, and voice API to simulate speech-to-speech, OpenAI’s Realtime API natively processes audio in and audio out.

**In this guide, you'll learn to:**
- Choose the right realtime model for your use case
- Configure voice assistants with realtime capabilities
- Implement best practices for production deployments
- Optimize prompts specifically for realtime models

## Available models

<Tip>
The `gpt-realtime-2025-08-28` model is production-ready.
</Tip>

OpenAI offers three realtime models, each with different capabilities and cost/performance trade-offs:

| Model | Status | Best For | Key Features |
|-------|--------|----------|--------------|
| `gpt-realtime-2025-08-28` | **Production** | Production workloads | Highest quality, structured outputs |
| `gpt-4o-realtime-preview-2024-12-17` | Preview | Development & testing | Balanced performance/cost |
| `gpt-4o-mini-realtime-preview-2024-12-17` | Preview | Cost-sensitive apps | Lower latency, reduced cost |

## Voice options

Realtime models support a specific set of OpenAI voices optimized for speech-to-speech:

<CardGroup cols={2}>
<Card title="Standard Voices" icon="microphone">
Available across all realtime models:
- `alloy` - Neutral and balanced
- `echo` - Warm and engaging
- `shimmer` - Energetic and expressive
</Card>
<Card title="Realtime-Exclusive Voices" icon="sparkles">
Only available with realtime models:
- `marin` - Professional and clear
- `cedar` - Natural and conversational
</Card>
</CardGroup>

<Warning>
The following voices are **NOT** supported by realtime models: ash, ballad, coral, fable, onyx, and nova.
</Warning>

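If you pick voices programmatically, it can help to guard against the unsupported IDs above before creating or updating an assistant. A minimal TypeScript sketch, assuming you maintain the supported list yourself; the `isRealtimeVoice` helper is illustrative and not part of the Vapi SDK:

```typescript
// Voices accepted by OpenAI realtime models (per the lists above).
// ash, ballad, coral, fable, onyx, and nova are NOT accepted.
const REALTIME_VOICES = ["alloy", "echo", "shimmer", "marin", "cedar"] as const;

type RealtimeVoice = (typeof REALTIME_VOICES)[number];

// Narrow an arbitrary voice ID to one that realtime models accept.
function isRealtimeVoice(voiceId: string): voiceId is RealtimeVoice {
  return (REALTIME_VOICES as readonly string[]).includes(voiceId);
}

// Example: fall back to a supported default instead of sending an invalid voice.
const requestedVoice: string = "nova";
const voiceId: RealtimeVoice = isRealtimeVoice(requestedVoice) ? requestedVoice : "alloy";
console.log(`Using voice: ${voiceId}`);
```
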
## Configuration

### Basic setup

Configure a realtime assistant with function calling:

<CodeBlocks>
```json title="Assistant Configuration"
{
  "model": {
    "provider": "openai",
    "model": "gpt-realtime-2025-08-28",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant. Be concise and friendly."
      }
    ],
    "temperature": 0.7,
    "maxTokens": 250,
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "getWeather",
          "description": "Get the current weather",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city name"
              }
            },
            "required": ["location"]
          }
        }
      }
    ]
  },
  "voice": {
    "provider": "openai",
    "voiceId": "alloy"
  }
}
```
```typescript title="TypeScript SDK"
import { Vapi } from '@vapi-ai/server-sdk';

const vapi = new Vapi({ token: process.env.VAPI_API_KEY });

const assistant = await vapi.assistants.create({
  model: {
    provider: "openai",
    model: "gpt-realtime-2025-08-28",
    messages: [{
      role: "system",
      content: "You are a helpful assistant. Be concise and friendly."
    }],
    temperature: 0.7,
    maxTokens: 250,
    tools: [{
      type: "function",
      function: {
        name: "getWeather",
        description: "Get the current weather",
        parameters: {
          type: "object",
          properties: {
            location: {
              type: "string",
              description: "The city name"
            }
          },
          required: ["location"]
        }
      }
    }]
  },
  voice: {
    provider: "openai",
    voiceId: "alloy"
  }
});
```
```python title="Python SDK"
import os

from vapi import Vapi

vapi = Vapi(token=os.getenv("VAPI_API_KEY"))

assistant = vapi.assistants.create(
    model={
        "provider": "openai",
        "model": "gpt-realtime-2025-08-28",
        "messages": [{
            "role": "system",
            "content": "You are a helpful assistant. Be concise and friendly."
        }],
        "temperature": 0.7,
        "maxTokens": 250,
        "tools": [{
            "type": "function",
            "function": {
                "name": "getWeather",
                "description": "Get the current weather",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city name"
                        }
                    },
                    "required": ["location"]
                }
            }
        }]
    },
    voice={
        "provider": "openai",
        "voiceId": "alloy"
    }
)
```
</CodeBlocks>

### Using realtime-exclusive voices

To use the enhanced voices only available with realtime models:

```json
{
  "voice": {
    "provider": "openai",
    "voiceId": "marin" // or "cedar"
  }
}
```

### Handling instructions

<Info>
Unlike traditional OpenAI models, realtime models receive instructions through the session configuration. Vapi automatically converts your system messages to session instructions during WebSocket initialization.
</Info>

The system message in your model configuration is automatically optimized for realtime processing:

1. System messages are converted to session instructions
2. Instructions are sent during WebSocket session initialization
3. The instructions field supports the same prompting strategies as system messages

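For intuition, here is a rough sketch of what that conversion amounts to: the system message content becomes the `instructions` field of a Realtime `session.update` event. This is illustrative only; Vapi sends the real event for you, and the exact payload it builds may differ:

```typescript
// Illustrative only: Vapi performs this conversion during WebSocket
// initialization; you never send this event yourself.
const systemMessage = {
  role: "system",
  content: "You are a helpful assistant. Be concise and friendly."
};

// Realtime models take instructions via a `session.update` client event
// instead of a messages array.
const sessionUpdate = {
  type: "session.update",
  session: {
    instructions: systemMessage.content
  }
};

console.log(JSON.stringify(sessionUpdate, null, 2));
```
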
## Prompting best practices

<Note>
Realtime models benefit from different prompting techniques than text-based models. These guidelines are based on [OpenAI's official prompting guide](https://cookbook.openai.com/examples/realtime_prompting_guide).
</Note>

### General tips

- **Iterate relentlessly**: Small wording changes can significantly impact behavior
- **Use bullet points over paragraphs**: Clear, short bullets outperform long text blocks
- **Guide with examples**: The model closely follows sample phrases you provide
- **Be precise**: Ambiguity or conflicting instructions degrade performance
- **Control language**: Pin output to a target language to prevent unwanted switching
- **Reduce repetition**: Add variety rules to avoid robotic phrasing
- **Capitalize for emphasis**: Use CAPS for key rules to make them stand out

### Prompt structure

Organize your prompts with clear sections for better model comprehension:

```
# Role & Objective
You are a customer service agent for Acme Corp. Your goal is to resolve issues quickly.

# Personality & Tone
- Friendly, professional, and empathetic
- Speak naturally at a moderate pace
- Keep responses to 2-3 sentences

# Instructions
- Greet callers warmly
- Ask clarifying questions before offering solutions
- Always confirm understanding before proceeding

# Tools
Use the available tools to look up account information and process requests.

# Safety
If a caller becomes aggressive or requests something outside your scope,
politely offer to transfer them to a specialist.
```

### Realtime-specific techniques

<Tabs>
<Tab title="Speaking Speed">
Control the model's speaking pace with explicit instructions:

```
## Pacing
- Deliver responses at a natural, conversational speed
- Do not rush through information
- Pause briefly between key points
```
</Tab>
<Tab title="Personality">
Realtime models excel at maintaining consistent personality:

```
## Personality
- Warm and approachable like a trusted advisor
- Professional but not robotic
- Show genuine interest in helping
```
</Tab>
<Tab title="Conversation Flow">
Guide natural conversation progression:

```
## Conversation Flow
1. Greeting: Welcome caller and ask how you can help
2. Discovery: Understand their specific needs
3. Solution: Offer the best available option
4. Confirmation: Ensure they're satisfied before ending
```
</Tab>
</Tabs>

## Migration guide

Transitioning from standard STT/TTS to realtime models:

<Steps>
<Step title="Update your model configuration">
Change your model to one of the realtime options:
```json
{
  "model": {
    "provider": "openai",
    "model": "gpt-realtime-2025-08-28" // Changed from gpt-4
  }
}
```
</Step>

<Step title="Verify voice compatibility">
Ensure your selected voice is supported (alloy, echo, shimmer, marin, or cedar)
</Step>

<Step title="Remove transcriber configuration">
Realtime models handle speech-to-speech natively, so transcriber settings are not needed; a before/after sketch follows these steps
</Step>

<Step title="Test function calling">
Your existing function configurations work unchanged with realtime models
</Step>

<Step title="Optimize your prompts">
Apply realtime-specific prompting techniques for best results
</Step>
</Steps>

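Put together, a typical migration swaps the model, drops the transcriber block, and picks a supported voice. A hedged before/after sketch of the relevant configuration fields; the `deepgram`/`nova-2` transcriber values and `gpt-4o` model stand in for whatever your current assistant uses:

```typescript
// Before: orchestrated pipeline with an explicit transcriber (placeholder values).
const before = {
  transcriber: { provider: "deepgram", model: "nova-2" },
  model: {
    provider: "openai",
    model: "gpt-4o",
    messages: [{ role: "system", content: "You are a helpful assistant." }]
  },
  voice: { provider: "openai", voiceId: "alloy" }
};

// After: realtime model, no transcriber block, and a voice from the supported set.
// Existing tools/function definitions can be carried over unchanged.
const after = {
  model: {
    provider: "openai",
    model: "gpt-realtime-2025-08-28",
    messages: [{ role: "system", content: "You are a helpful assistant." }]
  },
  voice: { provider: "openai", voiceId: "marin" }
};
```
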
## Best practices

### Model selection strategy

<AccordionGroup>
<Accordion title="When to use gpt-realtime-2025-08-28">
**Best for production workloads requiring:**
- Structured outputs for form filling or data collection
- Complex function orchestration
- Highest quality voice interactions
- Responses API integration
</Accordion>

<Accordion title="When to use gpt-4o-realtime-preview">
**Best for development and testing:**
- Prototyping voice applications
- Balanced cost/performance during development
- Testing conversation flows before production
</Accordion>

<Accordion title="When to use gpt-4o-mini-realtime-preview">
**Best for cost-sensitive applications:**
- High-volume voice interactions
- Simple Q&A or routing scenarios
- Applications where latency is critical
</Accordion>
</AccordionGroup>

### Performance optimization

- **Temperature settings**: Use 0.5-0.7 for consistent yet natural responses
- **Max tokens**: Set appropriate limits (200-300) for conversational responses
- **Voice selection**: Test different voices to match your brand personality
- **Function design**: Keep function schemas simple for faster execution

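As a rough starting point, those guidelines translate into the model block roughly like this; the numbers are illustrative midpoints of the ranges above, and `lookupOrder` is a hypothetical tool, not part of any Vapi or OpenAI API:

```typescript
// Illustrative tuning values based on the ranges above; adjust per use case.
const performanceTunedModel = {
  provider: "openai",
  model: "gpt-realtime-2025-08-28",
  temperature: 0.6,  // 0.5-0.7 keeps responses consistent but natural
  maxTokens: 250,    // 200-300 keeps spoken replies conversational in length
  tools: [
    {
      type: "function",
      function: {
        // Hypothetical tool: keep schemas flat and small so calls resolve quickly.
        name: "lookupOrder",
        description: "Look up an order by its ID",
        parameters: {
          type: "object",
          properties: { orderId: { type: "string", description: "The order ID" } },
          required: ["orderId"]
        }
      }
    }
  ]
};
```
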
### Error handling

Handle edge cases gracefully:

```json
{
  "messages": [{
    "role": "system",
    "content": "If you don't understand the user, politely ask them to repeat. Never make assumptions about unclear requests."
  }]
}
```

## Current limitations

<Warning>
Be aware of these limitations when implementing realtime models:
</Warning>

- **Knowledge Bases** are not currently supported with the Realtime API
- **Endpointing and Interruption** models are managed by Vapi's orchestration layer
- **Custom voice cloning** is not available for realtime models
- **Some OpenAI voices** (ash, ballad, coral, fable, onyx, nova) are incompatible
- **Transcripts** may have slight differences from traditional STT output

## Additional resources

- [OpenAI Realtime Documentation](https://platform.openai.com/docs/guides/realtime)
- [Realtime Prompting Guide](https://platform.openai.com/docs/guides/realtime-models-prompting)
- [Prompting Cookbook](https://cookbook.openai.com/examples/realtime_prompting_guide)
- [Vapi Discord Community](https://discord.com/invite/pUFNcf2WmH)

## Next steps

Now that you understand OpenAI Realtime models:
- **[Phone Calling Guide](/phone-calling):** Set up inbound and outbound calling
- **[Assistant Hooks](/assistants/assistant-hooks):** Add custom logic to your conversations
- **[Voice Providers](/providers/voice/openai):** Explore other voice options