Commit e88b6f8

real-time how to
1 parent 2589dd4 commit e88b6f8

3 files changed

+25
-18
lines changed

articles/ai-services/openai/how-to/realtime-audio.md

Lines changed: 23 additions & 16 deletions
@@ -40,12 +40,17 @@ Before you can use GPT-4o real-time audio, you need:
4040

4141
- An Azure subscription - <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a>.
4242
- An Azure OpenAI resource created in a [supported region](#supported-models). For more information, see [Create a resource and deploy a model with Azure OpenAI](create-resource.md).
43-
- You need a deployment of the `gpt-4o-realtime-preview` model in a supported region as described in the [supported models](#supported-models) section. You can deploy the model from the [Azure AI Studio model catalog](../../ai-studio/how-to/model-catalog-overview.md) or from your project in AI Studio.
43+
- You need a deployment of the `gpt-4o-realtime-preview` model in a supported region as described in the [supported models](#supported-models) section. You can deploy the model from the [Azure AI Foundry portal model catalog](../../../ai-studio/how-to/model-catalog-overview.md) or from your project in AI Foundry portal.
4444

4545
For steps to deploy and use the `gpt-4o-realtime-preview` model, see [the real-time audio quickstart](../realtime-audio-quickstart.md).
4646

4747
For more information about the API and architecture, see the remaining sections in this guide.
4848

49+
## Sample code
50+
51+
Right now, the fastest way to get started with development on the GPT-4o Realtime API is to download the sample code from the [Azure OpenAI GPT-4o real-time audio repository on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk).
52+
53+
[The Azure-Samples/aisearch-openai-rag-audio repo](https://github.com/Azure-Samples/aisearch-openai-rag-audio) contains an example of how to implement RAG support in applications that use voice as their user interface, powered by the GPT-4o realtime API for audio.
4954

5055
## Architecture
5156

@@ -55,25 +60,25 @@ The Realtime API (via `/realtime`) is built on [the WebSockets API](https://deve
5560

5661
The Realtime API requires an existing Azure OpenAI resource endpoint in a supported region. The API is accessed via a secure WebSocket connection to the `/realtime` endpoint of your Azure OpenAI resource.
5762

58-
A full request URI can be constructed by concatenating:
63+
You can construct a full request URI by concatenating:
5964

6065
- The secure WebSocket (`wss://`) protocol
61-
- Your Azure OpenAI resource endpoint hostname, e.g. `my-aoai-resource.openai.azure.com`
66+
- Your Azure OpenAI resource endpoint hostname, for example, `my-aoai-resource.openai.azure.com`
6267
- The `openai/realtime` API path
63-
- An `api-version` query string parameter for a supported API version -- initially, `2024-10-01-preview`
68+
- An `api-version` query string parameter for a supported API version such as `2024-10-01-preview`
6469
- A `deployment` query string parameter with the name of your `gpt-4o-realtime-preview` model deployment
6570

66-
Combining into a full example, the following could be a well-constructed `/realtime` request URI:
71+
The following example is a well-constructed `/realtime` request URI:
6772

6873
```http
6974
wss://my-eastus2-openai-resource.openai.azure.com/openai/realtime?api-version=2024-10-01-preview&deployment=gpt-4o-realtime-preview-1001
7075
```
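As a minimal sketch of the concatenation described above, the following Python helper assembles a request URI from its parts. The function name `build_realtime_uri` and its parameter names are illustrative assumptions, not part of any SDK; the hostname, API version, and deployment name are the example values used in this article.

```python
from urllib.parse import urlencode


def build_realtime_uri(hostname: str, deployment: str,
                       api_version: str = "2024-10-01-preview") -> str:
    """Assemble a wss:// request URI for the /realtime endpoint.

    Hypothetical helper for illustration only.
    """
    # urlencode escapes any characters that aren't safe in a query string.
    query = urlencode({"api-version": api_version, "deployment": deployment})
    return f"wss://{hostname}/openai/realtime?{query}"


uri = build_realtime_uri(
    "my-eastus2-openai-resource.openai.azure.com",
    "gpt-4o-realtime-preview-1001",
)
print(uri)
```

Building the query string with `urlencode` rather than by hand keeps the URI valid if a deployment name ever contains characters that need escaping.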
7176

7277
To authenticate:
73-
- **Microsoft Entra** (recommended): Use token-based authentication with the `/realtime` API for an Azure OpenAI Service resource that has managed identity enabled. Apply a retrieved authentication token using a `Bearer` token with the `Authorization` header.
78+
- **Microsoft Entra** (recommended): Use token-based authentication with the `/realtime` API for an Azure OpenAI Service resource with managed identity enabled. Apply a retrieved authentication token using a `Bearer` token with the `Authorization` header.
7479
- **API key**: An `api-key` can be provided in one of two ways:
75-
1. Using an `api-key` connection header on the pre-handshake connection. This option isn't available in a browser environment.
76-
2. Using an `api-key` query string parameter on the request URI. Query string parameters are encrypted when using https/wss.
80+
- Using an `api-key` connection header on the prehandshake connection. This option isn't available in a browser environment.
81+
- Using an `api-key` query string parameter on the request URI. Query string parameters are encrypted when using https/wss.
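The header-based options above might be prepared as follows. This is only a sketch: `build_auth_headers` is a hypothetical helper, the credential values are placeholders, and how you obtain a Microsoft Entra token or hand the headers to a WebSocket client depends on the libraries you use, so both are left out.

```python
from typing import Dict, Optional


def build_auth_headers(entra_token: Optional[str] = None,
                       api_key: Optional[str] = None) -> Dict[str, str]:
    """Return connection headers for the /realtime handshake.

    Hypothetical helper: prefers the Entra bearer token (the recommended
    option) and falls back to the `api-key` header.
    """
    if entra_token:
        return {"Authorization": f"Bearer {entra_token}"}
    if api_key:
        return {"api-key": api_key}
    raise ValueError("Provide either a Microsoft Entra token or an API key.")


# Placeholder credentials, for illustration only.
print(build_auth_headers(entra_token="<token>"))
print(build_auth_headers(api_key="<key>"))
```

In a browser environment, where custom connection headers aren't available, the `api-key` query string parameter is the remaining key-based option.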
7782

7883

7984
### API concepts
@@ -89,11 +94,11 @@ To authenticate:
8994

9095
## API details
9196

92-
Once the WebSocket connection session to `/realtime` is established and authenticated, the functional interaction takes place via sending and receiving WebSocket messages, herein referred to as "commands" to avoid ambiguity with the content-bearing "message" concept already present for inference. These commands each take the form of a JSON object. Commands can be sent and received in parallel and applications should generally handle them both concurrently and asynchronously.
97+
Once the WebSocket connection session to `/realtime` is established and authenticated, the functional interaction takes place via sending and receiving WebSocket messages, which we refer to as "commands" to avoid ambiguity with the content-bearing "message" concept already present for inference. These commands each take the form of a JSON object. Commands can be sent and received in parallel, and applications should generally handle them both concurrently and asynchronously.
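One way to picture the concurrent handling is a pair of `asyncio` tasks, one sending and one receiving. The sketch below is an assumption-heavy illustration: an in-memory `asyncio.Queue` stands in for the WebSocket connection (so the receiver simply sees what the sender wrote), the function names are hypothetical, and the `type` field used to label each JSON command is assumed rather than taken from this article.

```python
import asyncio
import json
from typing import List


async def send_commands(channel: asyncio.Queue, commands: List[dict]) -> None:
    """Serialize each command to JSON and push it onto the channel."""
    for command in commands:
        await channel.put(json.dumps(command))
    await channel.put(None)  # Sentinel: no more commands.


async def receive_commands(channel: asyncio.Queue, handled: List[str]) -> None:
    """Read JSON commands until the sentinel arrives, recording each type."""
    while True:
        raw = await channel.get()
        if raw is None:
            break
        command = json.loads(raw)
        handled.append(command.get("type", "<unknown>"))


async def run_session() -> List[str]:
    channel: asyncio.Queue = asyncio.Queue()  # Stand-in for the WebSocket.
    handled: List[str] = []
    outgoing = [{"type": "session.update"}, {"type": "response.create"}]
    # Both loops run concurrently; neither blocks the other.
    await asyncio.gather(
        send_commands(channel, outgoing),
        receive_commands(channel, handled),
    )
    return handled


print(asyncio.run(run_session()))
```

With a real connection, the two loops would read from and write to the socket instead of a shared queue, but the structure of two independent tasks stays the same.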
9398

9499
### Session configuration and turn handling mode
95100

96-
Often, the first command sent by the caller on a newly established `/realtime` session will be a `session.update` payload. This command controls a wide set of input and output behavior, with output and response generation portions then later overridable via `update_conversation_config` or other properties in `response.create`.
101+
Often, the first command sent by the caller on a newly established `/realtime` session is a `session.update` payload. This command controls a wide set of input and output behavior, with output and response generation portions then later overridable via `update_conversation_config` or other properties in `response.create`.
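A first `session.update` command might be built along these lines. The payload layout is an assumption made for illustration: the top-level `type` field, the nested `session` object, and the shape of the `turn_detection` value aren't specified in this article, and `make_session_update` is a hypothetical helper. Only the command name, the `turn_detection` setting, and the `server_vad` mode come from the text.

```python
import json


def make_session_update(turn_detection_mode: str = "server_vad") -> str:
    """Serialize a session.update command as a JSON string.

    Field layout is assumed for illustration; check the API reference
    for the authoritative schema.
    """
    command = {
        "type": "session.update",
        "session": {
            "turn_detection": {"type": turn_detection_mode},
        },
    }
    return json.dumps(command)


print(make_session_update())
```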
97102

98103
One of the key session-wide settings is `turn_detection`, which controls how data flow is handled between the caller and model:
99104

@@ -104,6 +109,8 @@ Transcription of user input audio is opted into via the `input_audio_transcripti
104109

105110
## Summary of commands
106111

112+
Here's a summary of the commands that can be [sent](#requests) and [received](#responses) via the `/realtime` endpoint.
113+
107114
### Requests
108115

109116
The following table describes commands sent from the caller to the `/realtime` endpoint.
@@ -113,11 +120,11 @@ The following table describes commands sent from the caller to the `/realtime` e
113120
| **Session Configuration** | |
114121
| `session.update` | Configures the connection-wide behavior of the conversation session such as shared audio input handling and common response generation characteristics. This is typically sent immediately after connecting, but can also be sent at any point during a session to reconfigure behavior after the current response (if in progress) is complete. |
115122
| **Input Audio** | |
116-
| `input_audio_buffer_append` | Appends audio data to the shared user input buffer. This audio won't be processed until an end of speech is detected in the `server_vad` `turn_detection` mode or until a manual `response.create` is sent (in either `turn_detection` configuration). |
123+
| `input_audio_buffer_append` | Appends audio data to the shared user input buffer. This audio isn't processed until an end of speech is detected in the `server_vad` `turn_detection` mode or until a manual `response.create` is sent (in either `turn_detection` configuration). |
117124
| `input_audio_buffer_clear` | Clears the current audio input buffer. This doesn't affect responses already in progress. |
118125
| `input_audio_buffer_commit` | Commits the current state of the user input buffer to subscribed conversations, including it as information for the next response. |
119-
| **Item Management** | For establishing history or including non-audio item information. |
120-
| `item_create` | Inserts a new item into the conversation, optionally positioned according to `previous_item_id`. This property can provide new, non-audio input from the user (such as a text message), tool responses, or historical information from another interaction to form a conversation history before generation. |
126+
| **Item Management** | For establishing history or including nonaudio item information. |
127+
| `item_create` | Inserts a new item into the conversation, optionally positioned according to `previous_item_id`. This property can provide new, nonaudio input from the user (such as a text message), tool responses, or historical information from another interaction to form a conversation history before generation. |
121128
| `item_delete` | Removes an item from an existing conversation. |
122129
| `item_truncate` | Manually shortens text and audio content in a message. This property can be useful in situations where faster-than-realtime model generation produced more data that's later skipped by an interruption. |
123130
| **Response Management** | |
@@ -143,7 +150,7 @@ The following table describes commands sent by the `/realtime` endpoint to the c
143150
| `response_cancelled` | Confirms that a response was canceled in response to a caller-initiated or internal signal. |
144151
| `rate_limits_updated` | Sent immediately after `response.done`. This command provides the current rate limit information, reflecting the updated status after the consumption of the just-finished response. |
145152
| **Item Flow in a Response** | |
146-
| `response_output_item_added` | Notifies that a new, server-generated conversation item *is being created*; content will then be populated via incremental `add_content` messages with a final `response_output_item_done` command signifying the item creation has completed. |
153+
| `response_output_item_added` | Notifies that a new, server-generated conversation item *is being created*; content is then populated via incremental `add_content` messages, with a final `response_output_item_done` command signifying that the item creation completed. |
147154
| `response_output_item_done` | Notifies that a new conversation item is added to a conversation. For model-generated messages, this property is preceded by `response_output_item_added` and `delta` commands which begin and populate the new item, respectively. |
148155
| **Content Flow within Response Items** | |
149156
| `response_content_part_added` | Notifies that a new content part is being created within a conversation item in an ongoing response. Until `response_content_part_done` arrives, content is then incrementally provided via the appropriate `delta` commands. |
@@ -168,5 +175,5 @@ The following table describes commands sent by the `/realtime` endpoint to the c
168175

169176
## Related content
170177

171-
* Learn more about Azure OpenAI [deployment types](deployment-types.md)
172-
* Learn more about Azure OpenAI [quotas and limits](quotas-limits.md)
178+
* Try the [real-time audio quickstart](../realtime-audio-quickstart.md)
179+
* Learn more about Azure OpenAI [quotas and limits](../quotas-limits.md)

articles/ai-services/openai/realtime-audio-quickstart.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -125,5 +125,5 @@ You can run the sample code locally on your machine by following these steps. Re
125125
126126
## Related content
127127
128-
* Learn more about Azure OpenAI [deployment types](./how-to/deployment-types.md)
128+
* Learn more about [How to use the Realtime API](./how-to/realtime-audio.md)
129129
* Learn more about Azure OpenAI [quotas and limits](quotas-limits.md)

articles/ai-services/openai/toc.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -184,7 +184,7 @@ items:
184184
- name: Troubleshooting and best practices
185185
href: ./how-to/on-your-data-best-practices.md
186186
- name: Use the Realtime API (preview)
187-
href: realtime-audio.md
187+
href: ./how-to/realtime-audio.md
188188
- name: Migrate to OpenAI Python v1.x
189189
href: ./how-to/migration.md
190190
- name: Migrate to OpenAI JavaScript v4.x
