articles/ai-services/openai/how-to/realtime-audio.md
7 additions & 7 deletions
@@ -22,7 +22,7 @@ Most users of the Realtime API need to deliver and receive audio from an end-use
## Supported models
-The GPT 4o realtime models are available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
+The GPT 4o real-time models are available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
- `gpt-4o-realtime-preview` (2024-12-17)
- `gpt-4o-realtime-preview` (2024-10-01)
@@ -167,14 +167,14 @@ In the same [`response.create`](../realtime-audio-reference.md#realtimeclienteve
}
```
-When the server responds with a [`response.done`](../realtime-audio-reference.md#realtimeservereventresponsecreated) event, the response will contain the metadata you provided. You can identify the corresponding response for the client-sent event via the `response.metadata` field.
+When the server responds with a [`response.done`](../realtime-audio-reference.md#realtimeservereventresponsedone) event, the response contains the metadata you provided. You can identify the corresponding response for the client-sent event via the `response.metadata` field.
> [!IMPORTANT]
> If you create any responses outside the default conversation, always check the `response.metadata` field to identify the corresponding response for the client-sent event. Check the field even for responses that are part of the default conversation, so you can be sure you're handling the correct response.
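For illustration, a `response.done` server event that echoes client-supplied metadata might look like the following minimal sketch. The `event_id`, `id`, and `topic` values are placeholder examples, and a real event carries more fields (such as the populated `output` items and usage statistics):

```json
{
  "type": "response.done",
  "event_id": "event_abc123",
  "response": {
    "id": "resp_001",
    "status": "completed",
    "metadata": {
      "topic": "world-capitals"
    },
    "output": []
  }
}
```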
### Custom context for out-of-band responses
-You can also construct a custom context that the model will use outside of the session's default conversation. To create a response with custom context, set the `conversation` field to `none` and provide the custom context in the `input` array. The `input` array can contain new inputs or references to existing conversation items.
+You can also construct a custom context that the model uses outside of the session's default conversation. To create a response with custom context, set the `conversation` field to `none` and provide the custom context in the `input` array. The `input` array can contain new inputs or references to existing conversation items.
```json
{
@@ -205,7 +205,7 @@ You can also construct a custom context that the model will use outside of the s
## Voice activity detection (VAD) and the audio buffer
-The server maintains an input audio buffer containing client-provided audio that has not yet been committed to the conversation state.
+The server maintains an input audio buffer containing client-provided audio that hasn't yet been committed to the conversation state.
One of the key [session-wide](#session-configuration) settings is `turn_detection`, which controls how data flow is handled between the caller and model. The `turn_detection` setting can be set to `none` or `server_vad` (to use [server-side voice activity detection](#server-decision-mode)).
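As a minimal sketch, server-side VAD can be enabled through a [`session.update`](../realtime-audio-reference.md#realtimeclienteventsessionupdate) event along these lines. The threshold and timing values shown are illustrative placeholders, not recommendations:

```json
{
  "type": "session.update",
  "session": {
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500
    }
  }
}
```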
@@ -266,9 +266,9 @@ sequenceDiagram
### VAD without automatic response generation
-You can use server-side voice activity detection (VAD) without automatic response generation. This can be useful when you want to implement some degree of moderation.
+You can use server-side voice activity detection (VAD) without automatic response generation. This approach can be useful when you want to implement some degree of moderation.
-Set [`turn_detection.create_response`](../realtime-audio-reference.md#realtimeturndetection) to `false` via the [session.update](../realtime-audio-reference.md#realtimeclienteventsessionupdate) event. VAD will detect the end of speech but the server won't generate a response until you send a [`response.create`](../realtime-audio-reference.md#realtimeclienteventresponsecreate) event.
+Set [`turn_detection.create_response`](../realtime-audio-reference.md#realtimeturndetection) to `false` via the [session.update](../realtime-audio-reference.md#realtimeclienteventsessionupdate) event. VAD detects the end of speech but the server doesn't generate a response until you send a [`response.create`](../realtime-audio-reference.md#realtimeclienteventresponsecreate) event.
```json
{
@@ -284,7 +284,7 @@ Set [`turn_detection.create_response`](../realtime-audio-reference.md#realtimetu
## Conversation and response generation
-The Realtime API is designed to handle real-time, low-latency conversational interactions. The API is built on a series of events that allow the client to send and receive messages, control the flow of the conversation, and manage the state of the session.
+The GPT-4o real-time audio models are designed for real-time, low-latency conversational interactions. The API is built on a series of events that allow the client to send and receive messages, control the flow of the conversation, and manage the state of the session.
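As a brief sketch of that event flow, a client might add a user message to the conversation with a `conversation.item.create` event (the text content here is an arbitrary example):

```json
{
  "type": "conversation.item.create",
  "item": {
    "type": "message",
    "role": "user",
    "content": [
      {
        "type": "input_text",
        "text": "Hello! Can you hear me?"
      }
    ]
  }
}
```

After the server acknowledges the item, sending a `response.create` event asks the model to generate its reply.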