`articles/ai-services/openai/how-to/audio-real-time.md` — 29 additions, 5 deletions

```diff
@@ -5,10 +5,11 @@ description: Learn how to use GPT-4o Realtime API for speech and audio with Azur
 manager: nitinme
 ms.service: azure-ai-openai
 ms.topic: how-to
-ms.date: 10/1/2024
+ms.date: 10/3/2024
 author: eric-urban
 ms.author: eur
 ms.custom: references_regions
+zone_pivot_groups: openai-studio-js
 recommendations: false
 ---
 
@@ -54,14 +55,35 @@ You can deploy the model from the [Azure AI Studio model catalog](../../../ai-st
 1. Modify other default settings depending on your requirements.
 1. Select **Deploy**. You land on the deployment details page.
 
-Now that you have a deployment of the `gpt-4o-realtime-preview` model, you can use the Realtime API to interact with it in real time.
+Now that you have a deployment of the `gpt-4o-realtime-preview` model, you can use the AI Studio **Real-time audio** playground or the Realtime API to interact with it in real time.
 
-## Use the GPT-4o Realtime API
+## Use GPT-4o real-time audio
 
 > [!TIP]
-> A playground for GPT-4o real-time audio is coming soon to [Azure AI Studio](https://ai.azure.com). You can already use the API directly in your application.
+> Right now, the fastest way to get started developing with the GPT-4o Realtime API is to download the sample code from the [Azure OpenAI GPT-4o real-time audio repository on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk).
 
-Right now, the fastest way to get started with the GPT-4o Realtime API is to download the sample code from the [Azure OpenAI GPT-4o real-time audio repository on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk).
+::: zone pivot="programming-language-ai-studio"
+
+To chat with your deployed `gpt-4o-realtime-preview` model in the [Azure AI Studio](https://ai.azure.com) **Real-time audio** playground, follow these steps:
+
+1. Go to your project in [Azure AI Studio](https://ai.azure.com).
+1. Select **Playgrounds** > **Real-time audio** from the left pane.
+1. Select your deployed `gpt-4o-realtime-preview` model from the **Deployment** dropdown.
+1. Select **Enable microphone** to allow the browser to access your microphone. If you already granted permission, you can skip this step.
+
+   :::image type="content" source="../media/how-to/real-time/real-time-playground.png" alt-text="Screenshot of the real-time audio playground with the deployed model selected." lightbox="../media/how-to/real-time/real-time-playground.png":::
+
+1. Optionally, edit the contents of the **Give the model instructions and context** text box. Give the model instructions about how it should behave and any context it should reference when generating a response. You can describe the assistant's personality, tell it what it should and shouldn't answer, and tell it how to format responses.
+1. Optionally, change settings such as threshold, prefix padding, and silence duration.
+1. Select **Start listening** to start the session. You can speak into the microphone to start a chat.
+
+   :::image type="content" source="../media/how-to/real-time/real-time-playground-start-listening.png" alt-text="Screenshot of the real-time audio playground with the start listening button and microphone access enabled." lightbox="../media/how-to/real-time/real-time-playground-start-listening.png":::
+
+1. You can interrupt the chat at any time by speaking. You can end the chat by selecting the **Stop listening** button.
+
+::: zone-end
+
+::: zone pivot="programming-language-javascript"
 
 The JavaScript web sample demonstrates how to use the GPT-4o Realtime API to interact with the model in real time. The sample code includes a simple web interface that captures audio from the user's microphone and sends it to the model for processing. The model responds with text and audio, which the sample code renders in the web interface.
 
@@ -103,6 +125,8 @@ You can run the sample code locally on your machine by following these steps. Re
 1. You should see a `<< Session Started >>` message in the main output. Then you can speak into the microphone to start a chat.
 1. You can interrupt the chat at any time by speaking. You can end the chat by selecting the **Stop** button.
 
+::: zone-end
+
 ## Related content
 
 * Learn more about Azure OpenAI [deployment types](./deployment-types.md)
```
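
Since the updated article points JavaScript developers at the sample repository rather than inline code, a minimal sketch of what a Realtime API session looks like from Node.js may help for orientation. This is not the official sample: the endpoint shape, the `2024-10-01-preview` API version, and the event names (`session.update`, `response.create`, `response.text.delta`) are assumptions based on the Realtime protocol — verify them against the GitHub repository linked above.

```javascript
// Minimal sketch, not the official sample: open a Realtime session against a
// gpt-4o-realtime-preview deployment from Node.js. The endpoint shape, API
// version, and event names are assumptions -- confirm against the sample repo.
import WebSocket from "ws"; // npm install ws

const endpoint = process.env.AZURE_OPENAI_ENDPOINT; // e.g. https://YOUR-RESOURCE.openai.azure.com
const apiKey = process.env.AZURE_OPENAI_API_KEY;
const deployment = "gpt-4o-realtime-preview";

// The Realtime API is WebSocket-based rather than plain HTTPS request/response.
const url = `${endpoint.replace("https://", "wss://")}/openai/realtime` +
  `?api-version=2024-10-01-preview&deployment=${deployment}`;
const ws = new WebSocket(url, { headers: { "api-key": apiKey } });

ws.on("open", () => {
  // Configure the session, then request a response with both text and audio.
  ws.send(JSON.stringify({
    type: "session.update",
    session: { instructions: "You are a helpful, concise voice assistant." },
  }));
  ws.send(JSON.stringify({
    type: "response.create",
    response: { modalities: ["text", "audio"] },
  }));
});

// Server events arrive as JSON; text deltas stream in as the model responds.
ws.on("message", (data) => {
  const event = JSON.parse(data.toString());
  if (event.type === "response.text.delta") process.stdout.write(event.delta);
});
```

A full client would also stream microphone audio to the model (via `input_audio_buffer.append` events in the Realtime protocol) and play back the audio deltas it receives, which is what the sample's web interface does.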
`articles/ai-services/openai/includes/api-surface.md` — 2 additions, 2 deletions

```diff
@@ -22,8 +22,8 @@ Each API surface/specification encapsulates a different set of Azure OpenAI capa
 | API | Latest preview release | Latest GA release | Specifications | Description |
 |:---|:----|:----|:----|:---|
 | **Control plane** | [`2024-06-01-preview`](/rest/api/aiservices/accountmanagement/operation-groups?view=rest-aiservices-accountmanagement-2024-06-01-preview&preserve-view=true) | [`2023-05-01`](/rest/api/aiservices/accountmanagement/deployments/create-or-update?view=rest-aiservices-accountmanagement-2023-05-01&tabs=HTTP&preserve-view=true) | [Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/resource-manager/Microsoft.CognitiveServices) | Azure OpenAI shares a common control plane with all other Azure AI Services. The control plane API is used for things like [creating Azure OpenAI resources](/rest/api/aiservices/accountmanagement/accounts/create?view=rest-aiservices-accountmanagement-2023-05-01&tabs=HTTP&preserve-view=true), [model deployment](/rest/api/aiservices/accountmanagement/deployments/create-or-update?view=rest-aiservices-accountmanagement-2023-05-01&tabs=HTTP&preserve-view=true), and other higher level resource management tasks. The control plane also governs what is possible to do with capabilities like Azure Resource Manager, Bicep, Terraform, and Azure CLI.|
-| **Data plane - authoring** | `2024-08-01-preview` | [`2024-06-01`](/rest/api/azureopenai/operation-groups?view=rest-azureopenai-2024-06-01&preserve-view=true) | [Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/authoring) | The data plane authoring API controls [fine-tuning](/rest/api/azureopenai/fine-tuning?view=rest-azureopenai-2024-08-01-preview&preserve-view=true), [file-upload](/rest/api/azureopenai/files/upload?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [ingestion jobs](/rest/api/azureopenai/ingestion-jobs/create?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [batch](/rest/api/azureopenai/batch?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true) and certain [model level queries](/rest/api/azureopenai/models/get?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true)
-|**Data plane - inference**|[`2024-08-01-preview`](/azure/ai-services/openai/reference-preview#data-plane-inference)|[`2024-06-01`](/azure/ai-services/openai/reference#data-plane-inference)|[Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference)| The data plane inference API provides the inference capabilities/endpoints for features like completions, chat completions, embeddings, speech/whisper, on your data, Dall-e, assistants, etc. |
+| **Data plane - authoring** | [`2024-08-01-preview`](/rest/api/azureopenai/operation-groups?view=rest-azureopenai-2024-08-01-preview&preserve-view=true) | [`2024-06-01`](/rest/api/azureopenai/operation-groups?view=rest-azureopenai-2024-06-01&preserve-view=true) | [Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/authoring) | The data plane authoring API controls [fine-tuning](/rest/api/azureopenai/fine-tuning?view=rest-azureopenai-2024-08-01-preview&preserve-view=true), [file-upload](/rest/api/azureopenai/files/upload?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [ingestion jobs](/rest/api/azureopenai/ingestion-jobs/create?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true), [batch](/rest/api/azureopenai/batch?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true) and certain [model level queries](/rest/api/azureopenai/models/get?view=rest-azureopenai-2024-08-01-preview&tabs=HTTP&preserve-view=true)
+|**Data plane - inference**|[`2024-09-01-preview`](/azure/ai-services/openai/reference-preview#data-plane-inference)|[`2024-06-01`](/azure/ai-services/openai/reference#data-plane-inference)|[Spec files](https://github.com/Azure/azure-rest-api-specs/tree/main/specification/cognitiveservices/data-plane/AzureOpenAI/inference)| The data plane inference API provides the inference capabilities/endpoints for features like completions, chat completions, embeddings, speech/whisper, on your data, Dall-e, assistants, etc. |
```
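
The table distinguishes releases purely by `api-version`, so a short sketch may make the mechanics concrete: the same data plane inference endpoint serves the GA or preview surface depending on that query parameter. The resource and deployment names below are placeholders; the path follows the data plane inference spec linked in the table.

```javascript
// Minimal sketch: the api-version query parameter pins a request to one of the
// data plane inference releases in the table. Placeholders, not real values.
async function main() {
  const endpoint = "https://YOUR-RESOURCE.openai.azure.com";
  const deployment = "YOUR-DEPLOYMENT";
  // Latest GA; swap in 2024-09-01-preview to reach preview-only features.
  const apiVersion = "2024-06-01";

  const res = await fetch(
    `${endpoint}/openai/deployments/${deployment}/chat/completions?api-version=${apiVersion}`,
    {
      method: "POST",
      headers: {
        "api-key": process.env.AZURE_OPENAI_API_KEY,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ messages: [{ role: "user", content: "Hello!" }] }),
    }
  );
  const body = await res.json();
  console.log(body.choices[0].message.content);
}

main().catch(console.error);
```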
0 commit comments