articles/ai-services/openai/how-to/realtime-audio.md
Lines changed: 19 additions & 17 deletions
@@ -48,15 +48,21 @@ For steps to deploy and use the `gpt-4o-realtime-preview` model, see [the real-t
48
48
For more information about the API and architecture, see the remaining sections in this guide.
49
49
50
50
51
-
## Connection and authentication with the Realtime API
51
+
## Architecture
52
+
53
+
The Realtime API (via `/realtime`) is built on [the WebSockets API](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API) to facilitate fully asynchronous streaming communication between the end user and model. Device details like capturing and rendering audio data are outside the scope of the Realtime API. It should be used in the context of a trusted, intermediate service that manages both connections to end users and model endpoint connections. Don't use it directly from untrusted end user devices.
54
+
55
+
### Connection and authentication with the Realtime API
56
+
57
+
The Realtime API requires an existing Azure OpenAI resource endpoint in a supported region. The API is accessed via a secure WebSocket connection to the `/realtime` endpoint of your Azure OpenAI resource.
52
58
53
-
The `/realtime` API requires an existing Azure OpenAI resource endpoint in a supported region. A full request URI can be constructed by concatenating:
59
+
A full request URI can be constructed by concatenating:
54
60
55
-
1. The secure WebSocket (`wss://`) protocol
56
-
2. Your Azure OpenAI resource endpoint hostname, e.g. `my-aoai-resource.openai.azure.com`
57
-
3. The `openai/realtime` API path
58
-
4. An `api-version` query string parameter for a supported API version -- initially, `2024-10-01-preview`
59
-
5. A `deployment` query string parameter with the name of your `gpt-4o-realtime-preview` model deployment
61
+
- The secure WebSocket (`wss://`) protocol
62
+
- Your Azure OpenAI resource endpoint hostname, e.g. `my-aoai-resource.openai.azure.com`
63
+
- The `openai/realtime` API path
64
+
- An `api-version` query string parameter for a supported API version -- initially, `2024-10-01-preview`
65
+
- A `deployment` query string parameter with the name of your `gpt-4o-realtime-preview` model deployment
60
66
61
67
Combining into a full example, the following could be a well-constructed `/realtime` request URI:
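As a minimal sketch, the components listed above could be assembled as follows. The helper name `build_realtime_uri` is hypothetical, and the deployment name is assumed to match the model name; substitute your own resource hostname and deployment.

```python
from urllib.parse import urlencode


def build_realtime_uri(
    hostname: str,
    deployment: str,
    api_version: str = "2024-10-01-preview",
) -> str:
    """Concatenate the /realtime request URI from its documented parts.

    hostname:    Azure OpenAI resource endpoint hostname
    deployment:  name of your gpt-4o-realtime-preview model deployment
    api_version: a supported API version
    """
    # Query string parameters: api-version and deployment
    query = urlencode({"api-version": api_version, "deployment": deployment})
    # Secure WebSocket protocol + hostname + openai/realtime path + query
    return f"wss://{hostname}/openai/realtime?{query}"


# Hypothetical resource and deployment names, for illustration only
uri = build_realtime_uri(
    "my-aoai-resource.openai.azure.com",
    "gpt-4o-realtime-preview",
)
print(uri)
```

Using `urlencode` rather than manual string concatenation keeps the query string valid if a deployment name contains characters that need escaping.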
- **Using Microsoft Entra**: `/realtime` supports token-based authentication with against an appropriately configured Azure OpenAI Service resource that has managed identity enabled. Use a `Bearer` token with the `Authorization` header to apply a retrieved authentication token.
69
-
- **Using an API key**: An `api-key` can be provided in one of two ways:
70
-
1. Using an `api-key` connection header on the pre-handshake connection (note: not available in a browser environment)
71
-
2. Using an `api-key` query string parameter on the request URI (note: query string parameters are encrypted when using https/wss)
72
-
73
-
74
-
## Architecture
74
+
- **Microsoft Entra** (recommended): Use token-based authentication with the `/realtime` API for an Azure OpenAI Service resource that has managed identity enabled. Apply a retrieved authentication token using a `Bearer` token with the `Authorization` header.
75
+
- **API key**: An `api-key` can be provided in one of two ways:
76
+
1. Using an `api-key` connection header on the pre-handshake connection. This option isn't available in a browser environment.
77
+
2. Using an `api-key` query string parameter on the request URI. Query string parameters are encrypted when using https/wss.
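The authentication options above could be applied to a connection request roughly as in the following sketch. The function `apply_auth` and its parameters are hypothetical; how you retrieve a Microsoft Entra token and how you pass headers depend on the identity and WebSocket libraries you use, and neither is shown here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_auth(uri, *, entra_token=None, api_key=None, headers_allowed=True):
    """Return (uri, headers) with one authentication option applied.

    entra_token:     an already-retrieved Microsoft Entra token (assumption:
                     token retrieval happens elsewhere)
    api_key:         an Azure OpenAI resource key
    headers_allowed: False where custom handshake headers can't be set,
                     such as a browser environment
    """
    if entra_token is not None:
        # Recommended option: Bearer token in the Authorization header
        return uri, {"Authorization": f"Bearer {entra_token}"}
    if api_key is None:
        raise ValueError("Provide either entra_token or api_key")
    if headers_allowed:
        # API key option 1: api-key connection header
        return uri, {"api-key": api_key}
    # API key option 2: api-key query string parameter on the request URI
    parts = urlsplit(uri)
    query = parse_qsl(parts.query) + [("api-key", api_key)]
    return urlunsplit(parts._replace(query=urlencode(query))), {}


base = (
    "wss://my-aoai-resource.openai.azure.com/openai/realtime"
    "?api-version=2024-10-01-preview&deployment=gpt-4o-realtime-preview"
)
print(apply_auth(base, entra_token="<token>"))
print(apply_auth(base, api_key="<key>", headers_allowed=False))
```

The sketch prefers the token whenever one is supplied, which mirrors the recommendation above, and falls back to the query string only when a header can't be sent.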
75
78
76
-
The Realtime API (via `/realtime`) is built on [the WebSockets API](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API) to facilitate fully asynchronous streaming communication between the end user and model. It should be used in the context of a trusted, intermediate service that manages both connections to end users and model endpoint connections. Don't use it directly from untrusted end user devices, and device details like capturing and rendering audio data are outside the scope of the Realtime API.
77
79
78
-
## API concepts
80
+
### API concepts
79
81
80
82
- A caller establishes a connection to `/realtime`, which starts a new `session`.
81
83
- A `session` automatically creates a default `conversation`. Multiple concurrent conversations aren't supported.
@@ -110,7 +112,7 @@ The following table describes commands sent from the caller to the `/realtime` e
110
112
|`type`| Description |
111
113
|---|---|
112
114
|**Session Configuration**||
113
-
|`session.update`| Configures the connection-wide behavior of the conversation session such as shared audio input handling and common response generation characteristics. This is typically sent immediately after connecting but can also be sent at any point during a session to reconfigure behavior after the current response (if in progress) is complete. |
115
+
|`session.update`| Configures the connection-wide behavior of the conversation session such as shared audio input handling and common response generation characteristics. This is typically sent immediately after connecting, but can also be sent at any point during a session to reconfigure behavior after the current response (if in progress) is complete. |
114
116
|**Input Audio**||
115
117
|`input_audio_buffer_append`| Appends audio data to the shared user input buffer. This audio won't be processed until an end of speech is detected in the `server_vad` `turn_detection` mode or until a manual `response.create` is sent (in either `turn_detection` configuration). |
116
118
|`input_audio_buffer_clear`| Clears the current audio input buffer. This doesn't affect responses already in progress. |
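
As an illustration of the `session.update` command described in the table, the following sketch builds a JSON message that selects the `server_vad` turn detection mode. Only the `type` value and the `turn_detection` and `server_vad` names come from this article; the nesting under a `session` object and the helper name `make_session_update` are assumptions, so check the API reference for the exact payload shape.

```python
import json


def make_session_update(turn_detection_type: str = "server_vad") -> str:
    """Serialize a session.update command as a JSON text message.

    Assumed payload shape: session settings nested under "session",
    with the turn detection mode given as {"type": ...}.
    """
    message = {
        "type": "session.update",
        "session": {
            "turn_detection": {"type": turn_detection_type},
        },
    }
    return json.dumps(message)


# Typically sent immediately after connecting
print(make_session_update())
```

A caller would send the returned string over the open WebSocket connection. It can also be sent later in the session, in which case the new configuration takes effect after any in-progress response completes.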