articles/ai-foundry/openai/concepts/audio.md (2 additions, 2 deletions)
@@ -22,7 +22,7 @@ For information about the available audio models per region in Azure OpenAI, see
 
 ## GPT-4o audio Realtime API
 
-GPT-4o real-time audio is designed to handle real-time, low-latency conversational interactions, making it a great fit for support agents, assistants, translators, and other use cases that need highly responsive back-and-forth with a user. For more information on how to use GPT-4o real-time audio, see the [GPT-4o real-time audio quickstart](../realtime-audio-quickstart.md) and [how to use GPT-4o audio](../how-to/realtime-audio.md).
+GPT real-time audio is designed to handle real-time, low-latency conversational interactions, making it a great fit for support agents, assistants, translators, and other use cases that need highly responsive back-and-forth with a user. For more information on how to use GPT real-time audio, see the [GPT real-time audio quickstart](../realtime-audio-quickstart.md) and [how to use GPT-4o audio](../how-to/realtime-audio.md).
 
 ## GPT-4o audio completions
 
@@ -40,4 +40,4 @@ The audio models via the `/audio` API can be used for speech to text, translatio
articles/ai-foundry/openai/how-to/realtime-audio-webrtc.md (9 additions, 9 deletions)
@@ -1,7 +1,7 @@
 ---
-title: 'How to use the GPT-4o Realtime API via WebRTC'
+title: 'How to use the GPT Realtime API via WebRTC'
 titleSuffix: Azure OpenAI in Azure AI Foundry Models
-description: Learn how to use the GPT-4o Realtime API for speech and audio via WebRTC.
+description: Learn how to use the GPT Realtime API for speech and audio via WebRTC.
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
@@ -12,10 +12,10 @@ ms.custom: references_regions
 recommendations: false
 ---
 
-# How to use the GPT-4o Realtime API via WebRTC
+# How to use the GPT Realtime API via WebRTC
 
-Azure OpenAI GPT-4o Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions.
+Azure OpenAI GPT Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions.
 
 You can use the Realtime API via WebRTC or WebSocket to send audio input to the model and receive audio responses in real time. Follow the instructions in this article to get started with the Realtime API via WebRTC.
@@ -29,7 +29,7 @@ Use the [Realtime API via WebSockets](./realtime-audio-websockets.md) if you nee
 ## Supported models
 
-The GPT 4o real-time models are available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
+The GPT real-time models are available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
 
 - `gpt-4o-mini-realtime-preview` (2024-12-17)
 - `gpt-4o-realtime-preview` (2024-12-17)
 - `gpt-realtime` (version 2025-08-28)
@@ -40,7 +40,7 @@ For more information about supported models, see the [models and versions docume
 ## Prerequisites
 
-Before you can use GPT-4o real-time audio, you need:
+Before you can use GPT real-time audio, you need:
 
 - An Azure subscription - <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a>.
 - An Azure OpenAI resource created in a [supported region](#supported-models). For more information, see [Create a resource and deploy a model with Azure OpenAI](create-resource.md).
@@ -113,9 +113,9 @@ sequenceDiagram
 ## WebRTC example via HTML and JavaScript
 
-The following code sample demonstrates how to use the GPT-4o Realtime API via WebRTC. The sample uses the [WebRTC API](https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API) to establish a real-time audio connection with the model.
+The following code sample demonstrates how to use the GPT Realtime API via WebRTC. The sample uses the [WebRTC API](https://developer.mozilla.org/en-US/docs/Web/API/WebRTC_API) to establish a real-time audio connection with the model.
 
-The sample code is an HTML page that allows you to start a session with the GPT-4o Realtime API and send audio input to the model. The model's responses are played back in real-time.
+The sample code is an HTML page that allows you to start a session with the GPT Realtime API and send audio input to the model. The model's responses are played back in real-time.
 
 > [!WARNING]
 > The sample code includes the API key hardcoded in the JavaScript. This code isn't recommended for production use. In a production environment, you should use a secure backend service to generate an ephemeral key and return it to the client.
@@ -299,7 +299,7 @@ The sample code is an HTML page that allows you to start a session with the GPT-
 </html>
 ```
 
-1. Select **Start Session** to start a session with the GPT-4o Realtime API. The session ID and ephemeral key are displayed in the log container.
+1. Select **Start Session** to start a session with the GPT Realtime API. The session ID and ephemeral key are displayed in the log container.
 1. Allow the browser to access your microphone when prompted.
 1. Confirmation messages are displayed in the log container as the session progresses. Here's an example of the log messages:
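A note on the warning in this file about hardcoded API keys: the docs say to mint the ephemeral key from a secure backend. Here's a minimal Node 18+ Express sketch of such a backend, assuming the `/openai/realtimeapi/sessions` endpoint path, the `api-version` value, and the `client_secret.value` response field (all modeled on the quickstart pattern, not confirmed by this diff; verify against the current reference):

```javascript
// Sketch only: endpoint path, api-version, and response shape are assumptions.
import express from "express";

const app = express();
const resource = process.env.AZURE_OPENAI_RESOURCE; // e.g., "my-aoai-resource"
const apiKey = process.env.AZURE_OPENAI_API_KEY;    // account key; never ship to the client

app.get("/ephemeral-key", async (_req, res) => {
  const response = await fetch(
    `https://${resource}.openai.azure.com/openai/realtimeapi/sessions?api-version=2025-04-01-preview`,
    {
      method: "POST",
      headers: { "api-key": apiKey, "Content-Type": "application/json" },
      body: JSON.stringify({ model: "gpt-realtime", voice: "verse" }),
    }
  );
  const session = await response.json();
  // Return only the short-lived client secret for the browser's WebRTC handshake.
  res.json({ ephemeralKey: session.client_secret?.value });
});

app.listen(3000);
```

The browser then uses that short-lived key in the WebRTC SDP exchange instead of the account API key.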
articles/ai-foundry/openai/how-to/realtime-audio-websockets.md (6 additions, 6 deletions)
@@ -1,7 +1,7 @@
 ---
-title: 'How to use the GPT-4o Realtime API via WebSockets'
+title: 'How to use the GPT Realtime API via WebSockets'
 titleSuffix: Azure OpenAI in Azure AI Foundry Models
-description: Learn how to use the GPT-4o Realtime API for speech and audio via WebSockets.
+description: Learn how to use the GPT Realtime API for speech and audio via WebSockets.
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
@@ -12,10 +12,10 @@ ms.custom: references_regions
 recommendations: false
 ---
 
-# How to use the GPT-4o Realtime API via WebSockets
+# How to use the GPT Realtime API via WebSockets
 
-Azure OpenAI GPT-4o Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions.
+Azure OpenAI GPT Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions.
 
 You can use the Realtime API via WebRTC or WebSocket to send audio input to the model and receive audio responses in real time.
@@ -26,7 +26,7 @@ Follow the instructions in this article to get started with the Realtime API via
 ## Supported models
 
-The GPT-4o real-time models are available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
+The GPT real-time models are available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
 
 - `gpt-4o-mini-realtime-preview` (2024-12-17)
 - `gpt-4o-realtime-preview` (2024-12-17)
 - `gpt-realtime` (version 2025-08-28)
@@ -37,7 +37,7 @@ For more information about supported models, see the [models and versions docume
 ## Prerequisites
 
-Before you can use GPT-4o real-time audio, you need:
+Before you can use GPT real-time audio, you need:
 
 - An Azure subscription - <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a>.
 - An Azure OpenAI resource created in a [supported region](#supported-models). For more information, see [Create a resource and deploy a model with Azure OpenAI](create-resource.md).
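For orientation, the WebSocket connection this article builds toward looks roughly like the following Node sketch. The `/openai/realtime` path, the `deployment` query parameter, header-based `api-key` auth, and the `api-version` value are assumptions modeled on the quickstart, not details confirmed by this diff:

```javascript
// Sketch only: URI shape, query parameters, and auth header are assumptions.
import WebSocket from "ws";

const resource = process.env.AZURE_OPENAI_RESOURCE;
const apiKey = process.env.AZURE_OPENAI_API_KEY;

const ws = new WebSocket(
  `wss://${resource}.openai.azure.com/openai/realtime?api-version=2025-04-01-preview&deployment=gpt-realtime`,
  { headers: { "api-key": apiKey } }
);

ws.on("open", () => {
  // Configure the session first; audio and text events then flow both ways.
  ws.send(
    JSON.stringify({
      type: "session.update",
      session: { instructions: "You are a helpful assistant." },
    })
  );
});

ws.on("message", (data) => {
  const event = JSON.parse(data.toString());
  console.log("server event:", event.type);
});
```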
articles/ai-foundry/openai/how-to/realtime-audio.md (61 additions, 10 deletions)
@@ -1,7 +1,7 @@
 ---
-title: 'How to use the GPT-4o Realtime API for speech and audio with Azure OpenAI'
+title: 'How to use the GPT Realtime API for speech and audio with Azure OpenAI'
 titleSuffix: Azure OpenAI in Azure AI Foundry Models
-description: Learn how to use the GPT-4o Realtime API for speech and audio with Azure OpenAI.
+description: Learn how to use the GPT Realtime API for speech and audio with Azure OpenAI.
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
@@ -12,9 +12,9 @@ ms.custom: references_regions
 recommendations: false
 ---
 
-# How to use the GPT-4o Realtime API for speech and audio
+# How to use the GPT Realtime API for speech and audio
 
-Azure OpenAI GPT-4o Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions. The GPT-4o Realtime API is designed to handle real-time, low-latency conversational interactions. Realtime API is a great fit for use cases involving live interactions between a user and a model, such as customer support agents, voice assistants, and real-time translators.
+Azure OpenAI GPT Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions. The GPT Realtime API is designed to handle real-time, low-latency conversational interactions. Realtime API is a great fit for use cases involving live interactions between a user and a model, such as customer support agents, voice assistants, and real-time translators.
 
 Most users of the Realtime API need to deliver and receive audio from an end-user in real time, including applications that use WebRTC or a telephony system. The Realtime API isn't designed to connect directly to end user devices and relies on client integrations to terminate end user audio streams.
@@ -24,7 +24,7 @@ You can use the Realtime API via WebRTC or WebSocket to send audio input to the
 ## Supported models
 
-The GPT 4o real-time models are available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
+The GPT real-time models are available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
 
 - `gpt-4o-mini-realtime-preview` (2024-12-17)
 - `gpt-4o-realtime-preview` (2024-12-17)
 - `gpt-realtime` (version 2025-08-28)
@@ -35,16 +35,16 @@ See the [models and versions documentation](../concepts/models.md#audio-models)
 ## Get started
 
-Before you can use GPT-4o real-time audio, you need:
+Before you can use GPT real-time audio, you need:
 
 - An Azure subscription - <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a>.
 - An Azure OpenAI resource created in a [supported region](#supported-models). For more information, see [Create a resource and deploy a model with Azure OpenAI](create-resource.md).
 - You need a deployment of the `gpt-4o-realtime-preview`, `gpt-4o-mini-realtime-preview`, or `gpt-realtime` model in a supported region as described in the [supported models](#supported-models) section. You can deploy the model from the [Azure AI Foundry portal model catalog](../../../ai-foundry/how-to/model-catalog-overview.md) or from your project in Azure AI Foundry portal.
 
-Here are some of the ways you can get started with the GPT-4o Realtime API for speech and audio:
+Here are some of the ways you can get started with the GPT Realtime API for speech and audio:
 
 - For steps to deploy and use the `gpt-4o-realtime-preview`, `gpt-4o-mini-realtime-preview`, or `gpt-realtime` model, see [the real-time audio quickstart](../realtime-audio-quickstart.md).
 - Try the [WebRTC via HTML and JavaScript example](./realtime-audio-webrtc.md#webrtc-example-via-html-and-javascript) to get started with the Realtime API via WebRTC.
-- [The Azure-Samples/aisearch-openai-rag-audio repo](https://github.com/Azure-Samples/aisearch-openai-rag-audio) contains an example of how to implement RAG support in applications that use voice as their user interface, powered by the GPT-4o realtime API for audio.
+- [The Azure-Samples/aisearch-openai-rag-audio repo](https://github.com/Azure-Samples/aisearch-openai-rag-audio) contains an example of how to implement RAG support in applications that use voice as their user interface, powered by the GPT realtime API for audio.
 
 ## Session configuration
@@ -229,7 +229,7 @@ Set [`turn_detection.create_response`](../realtime-audio-reference.md#realtimetu
 ## Conversation and response generation
 
-The GPT-4o real-time audio models are designed for real-time, low-latency conversational interactions. The API is built on a series of events that allow the client to send and receive messages, control the flow of the conversation, and manage the state of the session.
+The GPT real-time audio models are designed for real-time, low-latency conversational interactions. The API is built on a series of events that allow the client to send and receive messages, control the flow of the conversation, and manage the state of the session.
 
 ### Conversation sequence and items
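The hunk header above references the `turn_detection.create_response` setting. For context, a `session.update` body that turns on server-side voice activity detection could look like this sketch; the field names follow the Realtime API reference, but the values are illustrative rather than recommendations:

```json
{
  "type": "session.update",
  "session": {
    "turn_detection": {
      "type": "server_vad",
      "threshold": 0.5,
      "prefix_padding_ms": 300,
      "silence_duration_ms": 500,
      "create_response": true
    }
  }
}
```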
@@ -278,7 +278,58 @@ A user might want to interrupt the assistant's response or ask the assistant to
 - Truncating audio deletes the server-side text transcript to ensure there isn't text in the context that the user doesn't know about.
 - The server responds with a [`conversation.item.truncated`](../realtime-audio-reference.md#realtimeservereventconversationitemtruncated) event.
 
-## Text in audio out example
+## Image input
+
+The `gpt-realtime` model supports image input as part of the conversation. The model can ground responses in what the user is currently seeing. You can send images to the model as part of a conversation item. The model can then generate responses that reference the images.
+
+The following example json body adds an image to the conversation:
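The JSON body itself didn't survive in this diff view. A minimal sketch of such an item, assuming a `conversation.item.create` event with an `input_image` content part carrying a data URL (field names inferred from the Realtime API event schema, not shown in this diff):

```json
{
  "type": "conversation.item.create",
  "item": {
    "type": "message",
    "role": "user",
    "content": [
      {
        "type": "input_image",
        "image_url": "data:image/png;base64,{base64_encoded_image}"
      }
    ]
  }
}
```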
+To enable MCP support in a Realtime API session, provide the URL of a remote MCP server in your session configuration. After connecting, the API will automatically manage tool calls on your behalf.
+
+You can easily enhance your agent's functionality by specifying a different MCP server in the session configuration—any tools available on that server will be accessible immediately.
+
+The following example json body sets up an MCP server:
+
+```json
+{
+    "session": {
+        "type": "realtime",
+        "tools": [
+            {
+                "type": "mcp",
+                "server_label": "stripe",
+                "server_url": "https://mcp.stripe.com",
+                "authorization": "{access_token}",
+                "require_approval": "never"
+            }
+        ]
+    }
+}
+```
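In the body above, `{access_token}` is a placeholder for whatever credential the MCP server expects, and `"require_approval": "never"` lets the session invoke the server's tools without pausing for an approval round trip; a stricter setting is presumably advisable for tools with side effects.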
+
+## Text-in, audio-out example
 
 Here's an example of the event sequence for a simple text-in, audio-out conversation:
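The sequence itself is elided from this extract. As a rough client-side sketch (event names per the Realtime API reference; payload details illustrative), the client might send these three events, after which the server streams audio delta events until the response completes:

```json
[
  { "type": "session.update", "session": { "instructions": "You are a helpful assistant." } },
  {
    "type": "conversation.item.create",
    "item": {
      "type": "message",
      "role": "user",
      "content": [{ "type": "input_text", "text": "Hello!" }]
    }
  },
  { "type": "response.create" }
]
```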
articles/ai-foundry/openai/realtime-audio-quickstart.md

 To chat with your deployed `gpt-realtime` model in the [Azure AI Foundry](https://ai.azure.com/?cid=learnDocs) **Real-time audio** playground, follow these steps:
 
-Azure OpenAI GPT-4o Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions.
+Azure OpenAI GPT Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions.
 
 You can use the Realtime API via WebRTC or WebSocket to send audio input to the model and receive audio responses in real time.
@@ -27,7 +27,7 @@ Follow the instructions in this article to get started with the Realtime API via
 ## Supported models
 
-The GPT 4o real-time models are available for global deployments.
+The GPT real-time models are available for global deployments.
articles/ai-foundry/openai/whats-new.md (3 additions, 3 deletions)
@@ -22,7 +22,7 @@ This article provides a summary of the latest releases and major documentation u
 ### Realtime API audio model GA
 
-OpenAI's GPT-4o RealTime and Audio models are now generally available on Azure AI Foundry Direct Models.
+OpenAI's GPT RealTime and Audio models are now generally available on Azure AI Foundry Direct Models.
 
 Model improvements:
 - Improved instruction following: Enhanced capabilities to follow tone, pacing, and escalation instructions more accurately and reliably. Can also switch languages.
@@ -210,15 +210,15 @@ The `gpt-4o-audio-preview` model introduces the audio modality into the existing
 > [!NOTE]
 > The [Realtime API](./realtime-audio-quickstart.md) uses the same underlying GPT-4o audio model as the completions API, but is optimized for low-latency, real-time audio interactions.
 
-### GPT-4o Realtime API 2024-12-17
+### GPT Realtime API 2024-12-17
 
 The `gpt-4o-realtime-preview` model version 2024-12-17 is available for global deployments in [East US 2 and Sweden Central regions](./concepts/models.md#global-standard-model-availability). Use the `gpt-4o-realtime-preview` version 2024-12-17 model instead of the `gpt-4o-realtime-preview` version 2024-10-01-preview model for real-time audio interactions.
 
 - Added support for [prompt caching](./how-to/prompt-caching.md) with the `gpt-4o-realtime-preview` model.
 - Added support for new voices. The `gpt-4o-realtime-preview` models now support the following voices: `alloy`, `ash`, `ballad`, `coral`, `echo`, `sage`, `shimmer`, `verse`.
 - Rate limits are no longer based on connections per minute. Rate limiting is now based on RPM (requests per minute) and TPM (tokens per minute) for the `gpt-4o-realtime-preview` model. The rate limits for each `gpt-4o-realtime-preview` model deployment are 100 K TPM and 1 K RPM. During the preview, [Azure AI Foundry portal](https://ai.azure.com/?cid=learnDocs) and APIs might inaccurately show different rate limits. Even if you try to set a different rate limit, the actual rate limit is 100 K TPM and 1 K RPM.
 
-For more information, see the [GPT-4o real-time audio quickstart](realtime-audio-quickstart.md) and the [how-to guide](./how-to/realtime-audio.md).
+For more information, see the [GPT real-time audio quickstart](realtime-audio-quickstart.md) and the [how-to guide](./how-to/realtime-audio.md).