articles/ai-services/openai/how-to/realtime-audio-webrtc.md (9 additions, 8 deletions)
@@ -21,13 +21,12 @@ Azure OpenAI GPT-4o Realtime API for speech and audio is part of the GPT-4o mode
You can use the Realtime API via WebRTC or WebSocket to send audio input to the model and receive audio responses in real time. Follow the instructions in this article to get started with the Realtime API via WebRTC.
In most cases, we recommend using the WebRTC API for real-time audio streaming. The WebRTC API is a web standard that enables real-time communication (RTC) between browsers and mobile applications. Here are some reasons why WebRTC is preferred for real-time audio streaming:
- **Lower Latency**: WebRTC is designed to minimize delay, making it more suitable for audio and video communication where low latency is critical for maintaining quality and synchronization.
- **Media Handling**: WebRTC has built-in support for audio and video codecs, providing optimized handling of media streams.
- **Error Correction**: WebRTC includes mechanisms for handling packet loss and jitter, which are essential for maintaining the quality of audio streams over unpredictable networks.
- **Peer-to-Peer Communication**: WebRTC allows direct communication between clients, reducing the need for a central server to relay audio data, which can further reduce latency.
- **Network Traversal**: WebRTC includes built-in support for NAT traversal, which helps establish connections between clients behind firewalls or NATs.
Use the [Realtime API via WebSockets](./realtime-audio-websockets.md) if you need to stream audio data from a server to a client, or if you need to send and receive data in real time between a client and server. WebSockets aren't recommended for real-time audio streaming because they have higher latency than WebRTC.
## Supported models
@@ -36,7 +35,7 @@ The GPT 4o real-time models are available for global deployments in [East US 2 a
- `gpt-4o-realtime-preview` (2024-12-17)
- `gpt-4o-realtime-preview` (2024-10-01)
For more information about supported models, see the [models and versions documentation](../concepts/models.md#audio-models).
## Prerequisites
@@ -74,7 +73,7 @@ The sessions URL includes the Azure OpenAI resource URL, deployment name, the `/
You can use the ephemeral API key to authenticate a WebRTC session with the Realtime API. The ephemeral key is valid for one minute and is used to establish a secure WebRTC connection between the client and the Realtime API.
Here's how the ephemeral API key is used in the Realtime API:
1. Your client requests an ephemeral API key from your server.
1. Your server mints the ephemeral API key using the standard API key.
@@ -84,7 +83,9 @@ The sequence diagram below illustrates the process of minting an ephemeral API k
1. Your server returns the ephemeral API key to your client.
1. Your client uses the ephemeral API key to authenticate a session with the Realtime API via WebRTC.
1. You send and receive audio data in real time using the WebRTC peer connection.
The following sequence diagram illustrates the process of minting an ephemeral API key and using it to authenticate a WebRTC session with the Realtime API.
:::image type="content" source="../media/how-to/real-time/ephemeral-key-webrtc.png" alt-text="Diagram of the ephemeral API key to WebRTC peer connection sequence." lightbox="../media/how-to/real-time/ephemeral-key-webrtc.png":::
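The flow in the diagram can be sketched in client-side JavaScript. This is a minimal illustration, not the article's full sample: the `/api/ephemeral-key` endpoint, the `client_secret.value` response shape, and the `REALTIME_WEBRTC_URL` placeholder are all assumptions for illustration; only your server, holding the standard API key, should mint the ephemeral key.

```javascript
// Assumed shape of the sessions response: { id, client_secret: { value } }.
// Adjust to match the actual Realtime API sessions response.
function extractEphemeralKey(sessionResponse) {
  if (!sessionResponse || !sessionResponse.client_secret) {
    throw new Error("unexpected sessions response shape");
  }
  return sessionResponse.client_secret.value;
}

// Placeholder for the deployment-specific WebRTC endpoint URL.
const REALTIME_WEBRTC_URL = "https://YOUR-REALTIME-WEBRTC-ENDPOINT";

async function startRealtimeSession() {
  // 1. Client asks *your* server for an ephemeral key; the server mints it
  //    with the standard API key, which never reaches the browser.
  //    (/api/ephemeral-key is a hypothetical endpoint name.)
  const resp = await fetch("/api/ephemeral-key");
  const ephemeralKey = extractEphemeralKey(await resp.json());

  // 2. Use the short-lived key (valid for about one minute) to set up WebRTC.
  const pc = new RTCPeerConnection();
  const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
  mic.getTracks().forEach((track) => pc.addTrack(track, mic));
  pc.ontrack = (event) => {
    // Play the model's audio response as it arrives.
    const audio = new Audio();
    audio.srcObject = event.streams[0];
    audio.play();
  };

  // 3. Exchange SDP with the Realtime API, authenticating with the key.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  const sdpResp = await fetch(REALTIME_WEBRTC_URL, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${ephemeralKey}`,
      "Content-Type": "application/sdp",
    },
    body: offer.sdp,
  });
  await pc.setRemoteDescription({ type: "answer", sdp: await sdpResp.text() });
  return pc;
}
```

The key design point is the split of responsibilities: the standard API key stays server-side, and the browser only ever holds the one-minute ephemeral key.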
@@ -109,7 +110,7 @@ The following code sample demonstrates how to use the GPT-4o Realtime API via We
The sample code is an HTML page that allows you to start a session with the GPT-4o Realtime API and send audio input to the model. The model's responses are played back in real time.
> [!WARNING]
> The sample code includes the API key hardcoded in the JavaScript. This code isn't recommended for production use. In a production environment, you should use a secure backend service to generate an ephemeral key and return it to the client.
1. Copy the following code into an HTML file and open it in a web browser:
@@ -294,7 +295,7 @@ The sample code is an HTML page that allows you to start a session with the GPT-
1. Select **Start Session** to start a session with the GPT-4o Realtime API. The session ID and ephemeral key are displayed in the log container.
1. Allow the browser to access your microphone when prompted.
1. Confirmation messages are displayed in the log container as the session progresses. Here's an example of the log messages:
articles/ai-services/openai/how-to/realtime-audio-websockets.md (5 additions, 4 deletions)
@@ -20,18 +20,19 @@ Azure OpenAI GPT-4o Realtime API for speech and audio is part of the GPT-4o mode
You can use the Realtime API via WebRTC or WebSocket to send audio input to the model and receive audio responses in real time. Follow the instructions in this article to get started with the Realtime API via WebSockets.
Use the Realtime API via WebSockets in server-to-server scenarios where low latency isn't a requirement.
> [!TIP]
> In most cases, we recommend using the [Realtime API via WebRTC](./realtime-audio-webrtc.md) for real-time audio streaming in client-side applications such as a web application or mobile app. WebRTC is designed for low-latency, real-time audio streaming and is the best choice for most use cases.
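A server-to-server WebSocket session can be sketched as follows. This is a minimal sketch, not the article's sample: the URL shape built by `buildRealtimeUrl`, its query parameter names, and the `ws` package usage are assumptions for illustration; check the Realtime API reference for the exact endpoint, parameters, and authentication headers.

```javascript
// Build a WebSocket URL for the Realtime API from its parts.
// The /openai/realtime path and the api-version/deployment query
// parameter names are assumptions for this sketch.
function buildRealtimeUrl(endpoint, deployment, apiVersion) {
  const host = endpoint.replace(/^https:/, "wss:").replace(/\/$/, "");
  return `${host}/openai/realtime?api-version=${apiVersion}&deployment=${deployment}`;
}

// Example usage (not run here): connect from your server and log
// server event types. Assumes the "ws" npm package and an API key
// in an environment variable.
//
// const WebSocket = require("ws");
// const ws = new WebSocket(
//   buildRealtimeUrl(
//     "https://YOUR-RESOURCE.openai.azure.com",
//     "gpt-4o-realtime-preview",
//     "2024-10-01-preview"
//   ),
//   { headers: { "api-key": process.env.AZURE_OPENAI_API_KEY } }
// );
// ws.on("message", (data) => console.log(JSON.parse(data.toString()).type));
```

Because both ends of the connection run in your own infrastructure here, the standard API key can be used directly; no ephemeral key is needed as in the browser WebRTC case.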
## Supported models
The GPT-4o real-time models are available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
- `gpt-4o-mini-realtime-preview` (2024-12-17)
- `gpt-4o-realtime-preview` (2024-12-17)
- `gpt-4o-realtime-preview` (2024-10-01)
For more information about supported models, see the [models and versions documentation](../concepts/models.md#audio-models).