
Commit 35b42f4

Merge pull request #5428 from MicrosoftDocs/main
Publish to live, Sunday 4 AM PST, 6/8
2 parents bc28a6c + a766ebc

4 files changed: +22 -19 lines changed

articles/ai-services/openai/how-to/realtime-audio-webrtc.md

Lines changed: 9 additions & 11 deletions
@@ -5,7 +5,7 @@ description: Learn how to use the GPT-4o Realtime API for speech and audio via W
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
-ms.date: 4/28/2025
+ms.date: 6/7/2025
 author: eric-urban
 ms.author: eur
 ms.custom: references_regions
@@ -44,16 +44,16 @@ Before you can use GPT-4o real-time audio, you need:

 - An Azure subscription - <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a>.
 - An Azure OpenAI resource created in a [supported region](#supported-models). For more information, see [Create a resource and deploy a model with Azure OpenAI](create-resource.md).
-- You need a deployment of the `gpt-4o-realtime-preview` or `gpt-4o-mini-realtime-preview` model in a supported region as described in the [supported models](#supported-models) section. You can deploy the model from the [Azure AI Foundry model catalog](../../../ai-foundry/how-to/model-catalog-overview.md) or from your project in Azure AI Foundry portal.
+- You need a deployment of the `gpt-4o-realtime-preview` or `gpt-4o-mini-realtime-preview` model in a supported region as described in the [supported models](#supported-models) section in this article. You can deploy the model from the [Azure AI Foundry model catalog](../../../ai-foundry/how-to/model-catalog-overview.md) or from your project in Azure AI Foundry portal.

 ## Connection and authentication

 You use different URLs to get an ephemeral API key and connect to the Realtime API via WebRTC. The URLs are constructed as follows:

 | URL | Description |
 |---|---|
-| Sessions URL | The `/realtime/sessions` URL is used to get an ephemeral API key. The sessions URL includes the Azure OpenAI resource URL, deployment name, the `/realtime/sessions` path, and the API version.<br/><br/>You should use API version `2025-04-01-preview` in the URL.<br/><br/>For an example and more information, see the [Sessions URL](#sessions-url) section below.|
-| WebRTC URL | The WebRTC URL is used to establish a WebRTC peer connection with the Realtime API. The WebRTC URL includes the region and the `realtimeapi-preview.ai.azure.com/v1/realtimertc` path.<br/><br/>The supported regions are `eastus2` and `swedencentral`.<br/><br/>For an example and more information, see the [Sessions URL](#webrtc-url) section below.|
+| Sessions URL | The `/realtime/sessions` URL is used to get an ephemeral API key. The sessions URL includes the Azure OpenAI resource URL, deployment name, the `/realtime/sessions` path, and the API version.<br/><br/>You should use API version `2025-04-01-preview` in the URL.<br/><br/>For an example and more information, see the [Sessions URL](#sessions-url) section in this article.|
+| WebRTC URL | The WebRTC URL is used to establish a WebRTC peer connection with the Realtime API. The WebRTC URL includes the region and the `realtimeapi-preview.ai.azure.com/v1/realtimertc` path.<br/><br/>The supported regions are `eastus2` and `swedencentral`.<br/><br/>For an example and more information, see the [Sessions URL](#webrtc-url) section in this article.|

 ### Sessions URL
 Here's an example of a well-constructed `realtime/sessions` URL that you use to get an ephemeral API key:
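For illustration, the URL pieces described in the table might be put together as follows. This is a hedged sketch only: the resource name, deployment name, and the exact composition of the sessions URL are placeholders and assumptions, not values taken from this commit; the article's own example URL is authoritative.

```javascript
// Hedged sketch: placeholder names, and the composition of the sessions URL is an
// assumption based on the pieces listed in the table (resource URL, deployment name,
// /realtime/sessions path, API version). Check the article's example for the exact form.
const RESOURCE_URL = "https://my-aoai-resource.openai.azure.com"; // placeholder resource URL
const DEPLOYMENT = "gpt-4o-mini-realtime-preview";                // deployment name, not necessarily the model name
const API_VERSION = "2025-04-01-preview";

const SESSIONS_URL =
  `${RESOURCE_URL}/openai/realtime/sessions?api-version=${API_VERSION}&deployment=${DEPLOYMENT}`;

// The WebRTC URL is fully described by the table: region plus the
// realtimeapi-preview.ai.azure.com/v1/realtimertc path. Supported regions are
// eastus2 and swedencentral.
const REGION = "eastus2";
const WEBRTC_URL = `https://${REGION}.realtimeapi-preview.ai.azure.com/v1/realtimertc`;
```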
@@ -156,7 +156,7 @@ The sample code is an HTML page that allows you to start a session with the GPT-

 // The deployment name might not be the same as the model name.
 const DEPLOYMENT = "gpt-4o-mini-realtime-preview"
-const VOICE = "verse"
+const VOICE = "verse"

 async function StartSession() {
 try {
@@ -170,8 +170,6 @@ The sample code is an HTML page that allows you to start a session with the GPT-
 const response = await fetch(SESSIONS_URL, {
 method: "POST",
 headers: {
-// The Authorization header is commented out because
-// currently it isn't supported with the sessions API.
 //"Authorization": `Bearer ${ACCESS_TOKEN}`,
 "api-key": API_KEY,
 "Content-Type": "application/json"
@@ -188,13 +186,13 @@ The sample code is an HTML page that allows you to start a session with the GPT-

 const data = await response.json();

-const sessionId = data.id;
-const ephemeralKey = data.client_secret?.value;
-console.error("Ephemeral key:", ephemeralKey);
+const sessionId = data.id;
+const ephemeralKey = data.client_secret?.value;
+console.error("Ephemeral key:", ephemeralKey);

 // Mask the ephemeral key in the log message.
 logMessage("Ephemeral Key Received: " + "***");
-logMessage("WebRTC Session Id = " + sessionId );
+logMessage("WebRTC Session Id = " + sessionId );

 // Set up the WebRTC connection using the ephemeral key.
 init(ephemeralKey);
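Pulling the pieces of this sample together, a minimal sketch of the ephemeral-key exchange might look like the following. The request body is an assumption based on the `DEPLOYMENT` and `VOICE` constants, the placeholder values are not from the commit, and `init()` stands for the sample's own WebRTC setup function.

```javascript
// Hedged sketch of the ephemeral-key exchange shown in this sample.
const SESSIONS_URL = "https://my-aoai-resource.openai.azure.com/..."; // placeholder; see the earlier URL sketch
const API_KEY = "YOUR_API_KEY";                    // placeholder; don't ship a real key in client code
const DEPLOYMENT = "gpt-4o-mini-realtime-preview"; // the deployment name might not be the model name
const VOICE = "verse";

async function startSession() {
  // Request an ephemeral API key from the sessions endpoint.
  const response = await fetch(SESSIONS_URL, {
    method: "POST",
    headers: {
      "api-key": API_KEY,
      "Content-Type": "application/json"
    },
    // Assumed body shape: the deployment/model and voice for the session.
    body: JSON.stringify({ model: DEPLOYMENT, voice: VOICE })
  });
  if (!response.ok) {
    throw new Error(`Sessions request failed: ${response.status}`);
  }

  const data = await response.json();
  const sessionId = data.id;
  const ephemeralKey = data.client_secret?.value;

  // Mask the ephemeral key in log output, as the sample does.
  console.log("Ephemeral key received: ***");
  console.log("WebRTC session id =", sessionId);

  // Hand the ephemeral key to the sample's WebRTC setup function.
  init(ephemeralKey);
}
```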

articles/ai-services/openai/how-to/realtime-audio-websockets.md

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,7 @@ description: Learn how to use the GPT-4o Realtime API for speech and audio via W
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
-ms.date: 4/28/2025
+ms.date: 6/7/2025
 author: eric-urban
 ms.author: eur
 ms.custom: references_regions
@@ -18,9 +18,9 @@ recommendations: false

 Azure OpenAI GPT-4o Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions.

-You can use the Realtime API via WebRTC or WebSocket to send audio input to the model and receive audio responses in real time. Follow the instructions in this article to get started with the Realtime API via WebSockets.
+You can use the Realtime API via WebRTC or WebSocket to send audio input to the model and receive audio responses in real time.

-Use the Realtime API via WebSockets in server-to-server scenarios where low latency isn't a requirement.
+Follow the instructions in this article to get started with the Realtime API via WebSockets. Use the Realtime API via WebSockets in server-to-server scenarios where low latency isn't a requirement.

 > [!TIP]
 > In most cases, we recommend using the [Realtime API via WebRTC](./realtime-audio-webrtc.md) for real-time audio streaming in client-side applications such as a web application or mobile app. WebRTC is designed for low-latency, real-time audio streaming and is the best choice for most use cases.
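For orientation, a server-to-server WebSocket connection of the kind this article covers might be sketched as follows. This is a hedged illustration using Node.js and the `ws` package; the endpoint path, API version, deployment name, and event shapes are assumptions rather than values taken from this commit.

```javascript
import WebSocket from "ws";

// Hedged sketch: resource name, deployment, and api-version are placeholders.
const RESOURCE = "my-aoai-resource";
const DEPLOYMENT = "gpt-4o-mini-realtime-preview";
const API_VERSION = "2025-04-01-preview";

const url =
  `wss://${RESOURCE}.openai.azure.com/openai/realtime` +
  `?api-version=${API_VERSION}&deployment=${DEPLOYMENT}`;

// Server-to-server: the API key stays on the server, never in a browser.
const ws = new WebSocket(url, {
  headers: { "api-key": process.env.AZURE_OPENAI_API_KEY }
});

ws.on("open", () => {
  // Configure the session once the socket is open (event shape assumed).
  ws.send(JSON.stringify({
    type: "session.update",
    session: { voice: "verse", instructions: "You are a helpful assistant." }
  }));
});

ws.on("message", (data) => {
  // Realtime API events arrive as JSON messages.
  const event = JSON.parse(data.toString());
  console.log("event:", event.type);
});
```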

articles/ai-services/openai/realtime-audio-quickstart.md

Lines changed: 8 additions & 3 deletions
@@ -5,7 +5,7 @@ description: Learn how to use GPT-4o Realtime API for speech and audio with Azur
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
-ms.date: 5/23/2025
+ms.date: 6/7/2025
 author: eric-urban
 ms.author: eur
 ms.custom: references_regions, ignite-2024
@@ -17,9 +17,14 @@ recommendations: false

 [!INCLUDE [Feature preview](includes/preview-feature.md)]

-Azure OpenAI GPT-4o Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions. The GPT-4o audio `realtime` API is designed to handle real-time, low-latency conversational interactions, making it a great fit for use cases involving live interactions between a user and a model, such as customer support agents, voice assistants, and real-time translators.
+Azure OpenAI GPT-4o Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions.

-Most users of the Realtime API need to deliver and receive audio from an end-user in real time, including applications that use WebRTC or a telephony system. The Realtime API isn't designed to connect directly to end user devices and relies on client integrations to terminate end user audio streams.
+You can use the Realtime API via WebRTC or WebSocket to send audio input to the model and receive audio responses in real time.
+
+Follow the instructions in this article to get started with the Realtime API via WebSockets. Use the Realtime API via WebSockets in server-to-server scenarios where low latency isn't a requirement.
+
+> [!TIP]
+> In most cases, we recommend using the [Realtime API via WebRTC](./how-to/realtime-audio-webrtc.md) for real-time audio streaming in client-side applications such as a web application or mobile app. WebRTC is designed for low-latency, real-time audio streaming and is the best choice for most use cases.

 ## Supported models

articles/ai-services/openai/realtime-audio-reference.md

Lines changed: 2 additions & 2 deletions
@@ -1390,7 +1390,7 @@ You use the `RealtimeRequestSession` object when you want to update the session
 | voice | [RealtimeVoice](#realtimevoice) | The voice used for the model response for the session.<br><br>Once the voice is used in the session for the model's audio response, it can't be changed. |
 | input_audio_format | [RealtimeAudioFormat](#realtimeaudioformat) | The format for the input audio. |
 | output_audio_format | [RealtimeAudioFormat](#realtimeaudioformat) | The format for the output audio. |
-| input_audio_noise_reduction | boolean | Configuration for input audio noise reduction. This can be set to null to turn off. Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model. Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.<br><br>This property is nullable.|
+| input_audio_noise_reduction | [RealtimeAudioInputAudioNoiseReductionSettings](#realtimeaudioinputaudionoisereductionsettings) | Configuration for input audio noise reduction. This can be set to null to turn off. Noise reduction filters audio added to the input audio buffer before it is sent to VAD and the model. Filtering the audio can improve VAD and turn detection accuracy (reducing false positives) and model performance by improving perception of the input audio.<br><br>This property is nullable.|
 | input_audio_transcription | [RealtimeAudioInputTranscriptionSettings](#realtimeaudioinputtranscriptionsettings) | The configuration for input audio transcription. The configuration is null (off) by default. Input audio transcription isn't native to the model, since the model consumes audio directly. Transcription runs asynchronously through the `/audio/transcriptions` endpoint and should be treated as guidance of input audio content rather than precisely what the model heard. For additional guidance to the transcription service, the client can optionally set the language and prompt for transcription.<br><br>This property is nullable. |
 | turn_detection | [RealtimeTurnDetection](#realtimeturndetection) | The turn detection settings for the session.<br><br>This property is nullable. |
 | tools | array of [RealtimeTool](#realtimetool) | The tools available to the model for the session. |
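The corrected row makes it clearer that `input_audio_noise_reduction` takes a settings object rather than a boolean flag. As a hedged illustration only, a session configuration exercising these fields might look like the following; the shapes of the noise reduction and transcription objects and the format values are assumptions, not taken from this commit.

```javascript
// Hedged sketch of a session configuration using the fields in the table above.
// The exact shapes of input_audio_noise_reduction and input_audio_transcription
// are assumptions; check the linked reference sections for the real definitions.
const sessionUpdate = {
  type: "session.update",
  session: {
    voice: "verse",
    input_audio_format: "pcm16",                          // assumed RealtimeAudioFormat value
    output_audio_format: "pcm16",                         // assumed RealtimeAudioFormat value
    input_audio_noise_reduction: { type: "near_field" },  // a settings object, not a boolean; null turns it off
    input_audio_transcription: { model: "whisper-1" }     // null (off) by default; guidance only
  }
};
```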
@@ -1662,7 +1662,7 @@ Currently, only 'function' tools are supported.
 | silence_duration_ms | string | The duration of silence (in milliseconds) to detect the end of speech. You want to detect the end of speech as soon as possible, but not too soon to avoid cutting off the last part of the speech.<br><br>The model will respond more quickly if you set this value to a lower number, but it might cut off the last part of the speech. If you set this value to a higher number, the model will wait longer to detect the end of speech, but it might take longer to respond.<br><br>Defaults to `500` milliseconds.<br/><br>This property is only applicable for `server_vad` turn detection. |
 | create_response | boolean | Indicates whether the server will automatically create a response when VAD is enabled and speech stops.<br><br>Defaults to `true`. |
 | interrupt_response | boolean | Indicates whether the server will automatically interrupt any ongoing response with output to the default (`auto`) conversation when a VAD start event occurs.<br><br>Defaults to `true`. |
-| eagerness | boolean | The eagerness of the model to respond and interrupt the user. Specify `low` to wait longer for the user to continue speaking. Specify `high` to chunk the audio as soon as possible for quicker responses. The default value is `auto` that's equivalent to medium.<br/><br>This property is only applicable for `server_vad` turn detection. |
+| eagerness | string | The eagerness of the model to respond and interrupt the user. Specify `low` to wait longer for the user to continue speaking. Specify `high` to chunk the audio as soon as possible for quicker responses. The default value is `auto` that's equivalent to medium.<br/><br>This property is only applicable for `server_vad` turn detection. |

 ### RealtimeTurnDetectionType

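Likewise, a hedged sketch of server VAD turn detection settings built from the fields in this table; the values are illustrative, and following the table's wording, `eagerness` is shown as a string alongside the other `server_vad` options.

```javascript
// Hedged sketch: server VAD turn detection using the fields described in the table.
const turnDetection = {
  type: "server_vad",
  // Milliseconds of silence that mark the end of speech; lower responds faster but
  // risks cutting the speaker off. The table lists a default of 500 milliseconds.
  silence_duration_ms: 500,
  // Automatically create a response when speech stops (defaults to true).
  create_response: true,
  // Automatically interrupt an ongoing response when speech starts (defaults to true).
  interrupt_response: true,
  // Per the table, a string: "low", "high", or the default "auto" (equivalent to medium).
  eagerness: "auto"
};
```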