MicrosoftDocs
diff --git a/‎articles/ai-services/openai/concepts/models.md
Lines changed: 1 addition & 1 deletion b/‎articles/ai-services/openai/concepts/models.md
Lines changed: 1 addition & 1 deletion
diff --git a/‎articles/ai-services/openai/how-to/realtime-audio-webrtc.md
Lines changed: 328 additions & 0 deletions b/‎articles/ai-services/openai/how-to/realtime-audio-webrtc.md
Lines changed: 328 additions & 0 deletions
diff --git a/‎articles/ai-services/openai/how-to/realtime-audio-websockets.md
Lines changed: 120 additions & 0 deletions b/‎articles/ai-services/openai/how-to/realtime-audio-websockets.md
Lines changed: 120 additions & 0 deletions
diff --git a/‎articles/ai-services/openai/how-to/realtime-audio.md
Lines changed: 11 additions & 75 deletions b/‎articles/ai-services/openai/how-to/realtime-audio.md
Lines changed: 11 additions & 75 deletions
@@ -258,7 +258,7 @@ Details about maximum request tokens and training data are available in the foll
 |`gpt-4o-mini-realtime-preview` (2024-12-17) <br> **GPT-4o audio** | **Audio model** for real-time audio processing. |Input: 128,000  <br> Output: 4,096 | Oct 2023 |
 |`gpt-4o-audio-preview` (2024-12-17) <br> **GPT-4o audio** | **Audio model** for audio and text generation. |Input: 128,000  <br> Output: 4,096 | Oct 2023 |
 |`gpt-4o-realtime-preview` (2024-12-17) <br> **GPT-4o audio** | **Audio model** for real-time audio processing. |Input: 128,000  <br> Output: 4,096 | Oct 2023 |
-|`gpt-4o-realtime-preview` (2024-10-01) <br> **GPT-4o audio** | **Audio model** for real-time audio processing. |Input: 128,000  <br> Output: 4,096 | Oct 2023 |
+|`gpt-4o-mini-realtime-preview` (2024-12-17) <br> **GPT-4o audio** | **Audio model** for real-time audio processing. |Input: 128,000  <br> Output: 4,096 | Oct 2023 |
 
 To compare the availability of GPT-4o audio models across all regions, see the [models table](#global-standard-model-availability).
 
 
@@ -0,0 +1,120 @@
+---
+title: 'How to use the GPT-4o Realtime API via WebSockets (Preview)'
+titleSuffix: Azure OpenAI Service
+description: Learn how to use the GPT-4o Realtime API for speech and audio via WebSockets.
+manager: nitinme
+ms.service: azure-ai-openai
+ms.topic: how-to
+ms.date: 4/28/2025
+author: eric-urban
+ms.author: eur
+ms.custom: references_regions
+recommendations: false
+---
+
+# How to use the GPT-4o Realtime API via WebSockets (Preview)
+
+[!INCLUDE [Feature preview](../includes/preview-feature.md)]
+
+Azure OpenAI GPT-4o Realtime API for speech and audio is part of the GPT-4o model family that supports low-latency, "speech in, speech out" conversational interactions. 
+
+You can use the Realtime API via WebRTC or WebSocket to send audio input to the model and receive audio responses in real time. Follow the instructions in this article to get started with the Realtime API via WebSockets.
+
+Use the Realtime API via WebSockets in server-to-server scenarios where low latency isn't a requirement.
+
+> [!TIP] 
+> In most cases, we recommend using the [Realtime API via WebRTC](./realtime-audio-webrtc.md) for real-time audio streaming in client-side applications such as a web application or mobile app. WebRTC is designed for low-latency, real-time audio streaming and is the best choice for most use cases.
+
+## Supported models
+
+The GPT-4o real-time models are available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
+- `gpt-4o-mini-realtime-preview` (2024-12-17)
+- `gpt-4o-realtime-preview` (2024-12-17)
+
+You should use API version `2025-04-01-preview` in the URL for the Realtime API. 
+
+For more information about supported models, see the [models and versions documentation](../concepts/models.md#audio-models).
+
+## Prerequisites
+
+Before you can use GPT-4o real-time audio, you need:
+
+- An Azure subscription - <a href="https://azure.microsoft.com/free/cognitive-services" target="_blank">Create one for free</a>.
+- An Azure OpenAI resource created in a [supported region](#supported-models). For more information, see [Create a resource and deploy a model with Azure OpenAI](create-resource.md).
+- You need a deployment of the `gpt-4o-realtime-preview` or `gpt-4o-mini-realtime-preview` model in a supported region as described in the [supported models](#supported-models) section. You can deploy the model from the [Azure AI Foundry portal model catalog](../../../ai-foundry/how-to/model-catalog-overview.md) or from your project in Azure AI Foundry portal. 
+
+## Connection and authentication
+
+The Realtime API (via `/realtime`) is built on [the WebSockets API](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API) to facilitate fully asynchronous streaming communication between the end user and model. 
+
+The Realtime API is accessed via a secure WebSocket connection to the `/realtime` endpoint of your Azure OpenAI resource.
+
+You can construct a full request URI by concatenating:
+
+- The secure WebSocket (`wss://`) protocol.
+- Your Azure OpenAI resource endpoint hostname, for example, `my-aoai-resource.openai.azure.com`
+- The `openai/realtime` API path.
+- An `api-version` query string parameter for a supported API version such as `2024-12-17`
+- A `deployment` query string parameter with the name of your `gpt-4o-realtime-preview` or `gpt-4o-mini-realtime-preview` model deployment.
+
+The following example is a well-constructed `/realtime` request URI:
+
+```http
+wss://my-eastus2-openai-resource.openai.azure.com/openai/realtime?api-version=2024-12-17&deployment=gpt-4o-mini-realtime-preview-deployment-name
+```
+
+To authenticate:
+- **Microsoft Entra** (recommended): Use token-based authentication with the `/realtime` API for an Azure OpenAI resource with managed identity enabled. Apply a retrieved authentication token using a `Bearer` token with the `Authorization` header.
+- **API key**: An `api-key` can be provided in one of two ways:
+  - Using an `api-key` connection header on the prehandshake connection. This option isn't available in a browser environment.
+  - Using an `api-key` query string parameter on the request URI. Query string parameters are encrypted when using https/wss.
+
+## Realtime API via WebSockets architecture
+
+Once the WebSocket connection session to `/realtime` is established and authenticated, the functional interaction takes place via events for sending and receiving WebSocket messages. These events each take the form of a JSON object. 
+
+:::image type="content" source="../media/how-to/real-time/realtime-api-sequence.png" alt-text="Diagram of the Realtime API authentication and connection sequence." lightbox="../media/how-to/real-time/realtime-api-sequence.png":::
+
+<!--
+sequenceDiagram
+  actor User as End User
+  participant MiddleTier as /realtime host
+  participant AOAI as Azure OpenAI
+  User->>MiddleTier: Begin interaction
+  MiddleTier->>MiddleTier: Authenticate/Validate User
+  MiddleTier--)User: audio information
+  User--)MiddleTier: 
+  MiddleTier--)User: text information
+  User--)MiddleTier: 
+  MiddleTier--)User: control information
+  User--)MiddleTier: 
+  MiddleTier->>AOAI: connect to /realtime
+  MiddleTier->>AOAI: configure session
+  AOAI->>MiddleTier: session start
+  MiddleTier--)AOAI: send/receive WS commands
+  AOAI--)MiddleTier: 
+  AOAI--)MiddleTier: create/start conversation responses
+  AOAI--)MiddleTier: (within responses) create/start/add/finish items
+  AOAI--)MiddleTier: (within items) create/stream/finish content parts
+-->
+
+Events can be sent and received in parallel and applications should generally handle them both concurrently and asynchronously.
+
+- A client-side caller establishes a connection to `/realtime`, which starts a new [`session`](../realtime-audio-reference.md#realtimerequestsession).
+- A `session` automatically creates a default `conversation`. Multiple concurrent conversations aren't supported.
+- The `conversation` accumulates input signals until a `response` is started, either via a direct event by the caller or automatically by voice activity detection (VAD).
+- Each `response` consists of one or more `items`, which can encapsulate messages, function calls, and other information.
+- Each message `item` has `content_part`, allowing multiple modalities (text and audio) to be represented across a single item.
+- The `session` manages configuration of caller input handling (for example, user audio) and common output generation handling.
+- Each caller-initiated [`response.create`](../realtime-audio-reference.md#realtimeclienteventresponsecreate) can override some of the output [`response`](../realtime-audio-reference.md#realtimeresponse) behavior, if desired.
+- Server-created `item` and the `content_part` in messages can be populated asynchronously and in parallel. For example, receiving audio, text, and function information concurrently in a round robin fashion.
+
+## Try the quickstart
+
+Now that you have the prerequisites, you can follow the instructions in the [Realtime API quickstart](../realtime-audio-quickstart.md) to get started with the Realtime API via WebSockets.
+
+## Related content
+
+* Try the [real-time audio quickstart](../realtime-audio-quickstart.md)
+* See the [Realtime API reference](../realtime-audio-reference.md)
+* Learn more about Azure OpenAI [quotas and limits](../quotas-limits.md)
@@ -1,11 +1,11 @@
 ---
-title: 'How to use the GPT-4o Realtime API for speech and audio with Azure OpenAI Service'
-titleSuffix: Azure OpenAI
-description: Learn how to use the GPT-4o Realtime API for speech and audio with Azure OpenAI Service.
+title: 'How to use the GPT-4o Realtime API for speech and audio with Azure OpenAI'
+titleSuffix: Azure OpenAI Service
+description: Learn how to use the GPT-4o Realtime API for speech and audio with Azure OpenAI.
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
-ms.date: 3/20/2025
+ms.date: 4/28/2025
 author: eric-urban
 ms.author: eur
 ms.custom: references_regions
@@ -20,12 +20,17 @@ Azure OpenAI GPT-4o Realtime API for speech and audio is part of the GPT-4o mode
 
 Most users of the Realtime API need to deliver and receive audio from an end-user in real time, including applications that use WebRTC or a telephony system. The Realtime API isn't designed to connect directly to end user devices and relies on client integrations to terminate end user audio streams. 
 
+You can use the Realtime API via WebRTC or WebSocket to send audio input to the model and receive audio responses in real time. In most cases, we recommend using the WebRTC API for low-latency real-time audio streaming. For more information, see:
+- [Realtime API via WebRTC](./realtime-audio-webrtc.md)
+- [Realtime API via WebSockets](./realtime-audio-websockets.md)
+
 ## Supported models
 
 The GPT 4o real-time models are available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
 - `gpt-4o-mini-realtime-preview` (2024-12-17)
 - `gpt-4o-realtime-preview` (2024-12-17)
-- `gpt-4o-realtime-preview` (2024-10-01)
+
+You should use API version `2025-04-01-preview` in the URL for the Realtime API. 
 
 See the [models and versions documentation](../concepts/models.md#audio-models) for more information.
 
@@ -39,78 +44,9 @@ Before you can use GPT-4o real-time audio, you need:
 
 Here are some of the ways you can get started with the GPT-4o Realtime API for speech and audio:
 - For steps to deploy and use the `gpt-4o-realtime-preview` or `gpt-4o-mini-realtime-preview` model, see [the real-time audio quickstart](../realtime-audio-quickstart.md).
-- Download the sample code from the [Azure OpenAI GPT-4o real-time audio repository on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk).
+- Try the [WebRTC via HTML and JavaScript example](./realtime-audio-webrtc.md#webrtc-example-via-html-and-javascript) to get started with the Realtime API via WebRTC.
 - [The Azure-Samples/aisearch-openai-rag-audio repo](https://github.com/Azure-Samples/aisearch-openai-rag-audio) contains an example of how to implement RAG support in applications that use voice as their user interface, powered by the GPT-4o realtime API for audio.
 
-## Connection and authentication
-
-The Realtime API (via `/realtime`) is built on [the WebSockets API](https://developer.mozilla.org/en-US/docs/Web/API/WebSockets_API) to facilitate fully asynchronous streaming communication between the end user and model. 
-
-> [!IMPORTANT]
-> Device details like capturing and rendering audio data are outside the scope of the Realtime API. It should be used in the context of a trusted, intermediate service that manages both connections to end users and model endpoint connections. Don't use it directly from untrusted end user devices.
-
-The Realtime API is accessed via a secure WebSocket connection to the `/realtime` endpoint of your Azure OpenAI resource.
-
-You can construct a full request URI by concatenating:
-
-- The secure WebSocket (`wss://`) protocol.
-- Your Azure OpenAI resource endpoint hostname, for example, `my-aoai-resource.openai.azure.com`
-- The `openai/realtime` API path.
-- An `api-version` query string parameter for a supported API version such as `2024-12-17`
-- A `deployment` query string parameter with the name of your `gpt-4o-realtime-preview` or `gpt-4o-mini-realtime-preview` model deployment.
-
-The following example is a well-constructed `/realtime` request URI:
-
-```http
-wss://my-eastus2-openai-resource.openai.azure.com/openai/realtime?api-version=2024-12-17&deployment=gpt-4o-mini-realtime-preview-deployment-name
-```
-
-To authenticate:
-- **Microsoft Entra** (recommended): Use token-based authentication with the `/realtime` API for an Azure OpenAI Service resource with managed identity enabled. Apply a retrieved authentication token using a `Bearer` token with the `Authorization` header.
-- **API key**: An `api-key` can be provided in one of two ways:
-  - Using an `api-key` connection header on the prehandshake connection. This option isn't available in a browser environment.
-  - Using an `api-key` query string parameter on the request URI. Query string parameters are encrypted when using https/wss.
-
-## Realtime API architecture
-
-Once the WebSocket connection session to `/realtime` is established and authenticated, the functional interaction takes place via events for sending and receiving WebSocket messages. These events each take the form of a JSON object. 
-
-:::image type="content" source="../media/how-to/real-time/realtime-api-sequence.png" alt-text="Diagram of the Realtime API authentication and connection sequence." lightbox="../media/how-to/real-time/realtime-api-sequence.png":::
-
-<!--
-sequenceDiagram
-  actor User as End User
-  participant MiddleTier as /realtime host
-  participant AOAI as Azure OpenAI
-  User->>MiddleTier: Begin interaction
-  MiddleTier->>MiddleTier: Authenticate/Validate User
-  MiddleTier--)User: audio information
-  User--)MiddleTier: 
-  MiddleTier--)User: text information
-  User--)MiddleTier: 
-  MiddleTier--)User: control information
-  User--)MiddleTier: 
-  MiddleTier->>AOAI: connect to /realtime
-  MiddleTier->>AOAI: configure session
-  AOAI->>MiddleTier: session start
-  MiddleTier--)AOAI: send/receive WS commands
-  AOAI--)MiddleTier: 
-  AOAI--)MiddleTier: create/start conversation responses
-  AOAI--)MiddleTier: (within responses) create/start/add/finish items
-  AOAI--)MiddleTier: (within items) create/stream/finish content parts
--->
-
-Events can be sent and received in parallel and applications should generally handle them both concurrently and asynchronously.
-
-- A client-side caller establishes a connection to `/realtime`, which starts a new [`session`](#session-configuration).
-- A `session` automatically creates a default `conversation`. Multiple concurrent conversations aren't supported.
-- The `conversation` accumulates input signals until a `response` is started, either via a direct event by the caller or automatically by voice activity detection (VAD).
-- Each `response` consists of one or more `items`, which can encapsulate messages, function calls, and other information.
-- Each message `item` has `content_part`, allowing multiple modalities (text and audio) to be represented across a single item.
-- The `session` manages configuration of caller input handling (for example, user audio) and common output generation handling.
-- Each caller-initiated [`response.create`](../realtime-audio-reference.md#realtimeclienteventresponsecreate) can override some of the output [`response`](../realtime-audio-reference.md#realtimeresponse) behavior, if desired.
-- Server-created `item` and the `content_part` in messages can be populated asynchronously and in parallel. For example, receiving audio, text, and function information concurrently in a round robin fashion.
-
 ## Session configuration
 
 Often, the first event sent by the caller on a newly established `/realtime` session is a [`session.update`](../realtime-audio-reference.md#realtimeclienteventsessionupdate) payload. This event controls a wide set of input and output behavior, with output and response generation properties then later overridable using the [`response.create`](../realtime-audio-reference.md#realtimeclienteventresponsecreate) event.