If you want to clean up and remove an Azure OpenAI resource, you can delete the resource. Before deleting the resource, you must first delete any deployed models.
articles/ai-services/openai/concepts/models.md (+11 −8)
@@ -20,7 +20,7 @@ Azure OpenAI Service is powered by a diverse set of models with different capabi
 |--|--|
 |[o1 & o1-mini](#o1-and-o1-mini-models-limited-access)| Limited access models, specifically designed to tackle reasoning and problem-solving tasks with increased focus and capability. |
 |[GPT-4o & GPT-4o mini & GPT-4 Turbo](#gpt-4o-and-gpt-4-turbo)| The latest most capable Azure OpenAI models with multimodal versions, which can accept both text and images as input. |
-|[GPT-4o-Realtime-Preview](#gpt-4o-realtime-preview)|A GPT-4o model that supports low-latency, "speech in, speech out" conversational interactions. |
+|[GPT-4o audio](#gpt-4o-audio)| GPT-4o audio models that support either low-latency, "speech in, speech out" conversational interactions or audio generation. |
 |[GPT-4](#gpt-4)| A set of models that improve on GPT-3.5 and can understand and generate natural language and code. |
 |[GPT-3.5](#gpt-35)| A set of models that improve on GPT-3 and can understand and generate natural language and code. |
 |[Embeddings](#embeddings-models)| A set of models that can convert text into numerical vector form to facilitate text similarity. |
@@ -56,20 +56,23 @@ To learn more about the advanced `o1` series models see, [getting started with o
 |`o1-preview`| See the [models table](#global-standard-model-availability). |
 |`o1-mini`| See the [models table](#global-provisioned-managed-model-availability). |
 
-## GPT-4o-Realtime-Preview
+## GPT-4o audio
 
-The GPT 4o audio models are part of the GPT-4o model family and support low-latency, "speech in, speech out" conversational interactions. GPT-4o audio is designed to handle real-time, low-latency conversational interactions, making it a great fit for support agents, assistants, translators, and other use cases that need highly responsive back-and-forth with a user.
+The GPT-4o audio models are part of the GPT-4o model family and support either low-latency, "speech in, speech out" conversational interactions or audio generation.
+
+- GPT-4o real-time audio is designed to handle real-time, low-latency conversational interactions, making it a great fit for support agents, assistants, translators, and other use cases that need highly responsive back-and-forth with a user. For more information on how to use GPT-4o real-time audio, see the [GPT-4o real-time audio quickstart](../realtime-audio-quickstart.md) and [how to use GPT-4o audio](../how-to/realtime-audio.md).
+- GPT-4o audio completion is designed to generate audio from audio or text prompts, making it a great fit for generating audio books, audio content, and other use cases that require audio generation. The GPT-4o audio completions model introduces the audio modality into the existing `/chat/completions` API. For more information on how to use GPT-4o audio completions, see the [audio generation quickstart](../audio-completions-quickstart.md).
 
-GPT-4o audio is available in the East US 2 (`eastus2`) and Sweden Central (`swedencentral`) regions. To use GPT-4o audio, you need to [create](../how-to/create-resource.md) or use an existing resource in one of the supported regions.
+GPT-4o audio is available in the East US 2 (`eastus2`) and Sweden Central (`swedencentral`) regions. To use GPT-4o real-time audio, you need [an Azure OpenAI resource](../how-to/create-resource.md) in one of the supported regions.
 
-When your resource is created, you can [deploy](../how-to/create-resource.md#deploy-a-model) the GPT-4o audio model. For more information on how to use GPT-4o audio, see the [GPT-4o audio quickstart](../realtime-audio-quickstart.md) and [how to use GPT-4o audio](../how-to/realtime-audio.md).
+When your resource is created, you can [deploy](../how-to/create-resource.md#deploy-a-model) the GPT-4o audio model.
 
 Details about maximum request tokens and training data are available in the following table.
 
 | Model ID | Description | Max Request (tokens) | Training Data (up to) |
 |---|---|---|---|
-|`gpt-4o-realtime-preview` (2024-10-01) <br> **GPT-4o audio**|**Audio model** for real-time audio processing |Input: 128,000 <br> Output: 4,096 | Oct 2023 |
-|`gpt-4o-realtime-preview` (2024-12-17) <br> **GPT-4o audio**|**Audio model** for real-time audio processing |Input: 128,000 <br> Output: 4,096 | Oct 2023 |
+|`gpt-4o-audio-preview` (2024-12-17) <br> **GPT-4o audio**|**Audio model** for audio and text generation. |Input: 128,000 <br> Output: 4,096 | Oct 2023 |
+|`gpt-4o-realtime-preview` (2024-12-17) <br> **GPT-4o audio**|**Audio model** for real-time audio processing. |Input: 128,000 <br> Output: 4,096 | Oct 2023 |
+|`gpt-4o-realtime-preview` (2024-10-01) <br> **GPT-4o audio**|**Audio model** for real-time audio processing. |Input: 128,000 <br> Output: 4,096 | Oct 2023 |
 
 ## GPT-4o and GPT-4 Turbo
 
@@ -126,7 +129,7 @@ See [model versions](../concepts/model-versions.md) to learn about how Azure Ope
-> We don't recommend using preview models in production. We will upgrade all deployments of preview models to either future preview versions or to the latest stable GA version. Models designated preview do not follow the standard Azure OpenAI model lifecycle.
+> We don't recommend using preview models in production. We will upgrade all deployments of preview models to either future preview versions or to the latest stable GA version. Models that are designated preview don't follow the standard Azure OpenAI model lifecycle.
 
 - GPT-4 version 0125-preview is an updated version of the GPT-4 Turbo preview previously released as version 1106-preview.
 - GPT-4 version 0125-preview completes tasks such as code generation more completely compared to gpt-4-1106-preview. Because of this, depending on the task, customers may find that GPT-4-0125-preview generates more output compared to the gpt-4-1106-preview. We recommend customers compare the outputs of the new model. GPT-4-0125-preview also addresses bugs in gpt-4-1106-preview with UTF-8 handling for non-English languages.
articles/ai-services/openai/how-to/realtime-audio.md (+5 −5)
@@ -26,7 +26,7 @@ The GPT 4o real-time models are available for global deployments in [East US 2 a
 - `gpt-4o-realtime-preview` (2024-12-17)
 - `gpt-4o-realtime-preview` (2024-10-01)
 
-See the [models and versions documentation](../concepts/models.md#gpt-4o-realtime-preview) for more information.
+See the [models and versions documentation](../concepts/models.md#gpt-4o-audio) for more information.
 
 ## Get started
 
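
The hunks that follow document the Realtime API's WebSocket event flow. For orientation, here's a minimal sketch (not part of the docs being diffed) of opening a session against an Azure OpenAI resource, assuming the documented `wss://<resource>.openai.azure.com/openai/realtime` endpoint shape, a deployment named `gpt-4o-realtime-preview`, and an `AZURE_OPENAI_API_KEY` environment variable; the resource name is a placeholder.

```python
import asyncio
import json
import os

import websockets  # pip install websockets


async def main() -> None:
    # Placeholder resource and deployment names; adjust to your resource.
    url = (
        "wss://YOUR-RESOURCE.openai.azure.com/openai/realtime"
        "?api-version=2024-10-01-preview&deployment=gpt-4o-realtime-preview"
    )
    headers = {"api-key": os.environ["AZURE_OPENAI_API_KEY"]}
    # Note: websockets >= 14 renamed extra_headers to additional_headers.
    async with websockets.connect(url, additional_headers=headers) as ws:
        # The server greets the client with a session.created event.
        print(json.loads(await ws.recv())["type"])
        # Enable server-side voice activity detection. Audio is then streamed
        # with input_audio_buffer.append events, and the server commits the
        # buffer itself (input_audio_buffer.committed), as described below.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"turn_detection": {"type": "server_vad"}},
        }))


asyncio.run(main())
```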
@@ -248,7 +248,7 @@ In this case, the server evaluates user audio from the client (as sent via [`inp
 - The server commits the input audio buffer by sending the [`input_audio_buffer.committed`](../realtime-audio-reference.md#realtimeservereventinputaudiobuffercommitted) event.
 - The server sends the [`conversation.item.created`](../realtime-audio-reference.md#realtimeservereventconversationitemcreated) event with the user message item created from the audio buffer.
 
-:::image type="content" source="../media/how-to/real-time/input-audio-buffer-server-vad.png" alt-text="Diagram of the Realtime API input audio sequence with server decision mode." lightbox="../media/how-to/real-time/input-audio-buffer-server-vad.png":::
+:::image type="content" source="../media/how-to/real-time/input-audio-buffer-server-vad.png" alt-text="Diagram of the real-time API input audio sequence with server decision mode." lightbox="../media/how-to/real-time/input-audio-buffer-server-vad.png":::
 
 
 <!--
@@ -300,7 +300,7 @@ Optionally, the client can truncate or delete items in the conversation:
 - The client deletes an item in the conversation with a [`conversation.item.delete`](../realtime-audio-reference.md#realtimeclienteventconversationitemdelete) event.
 - The server [`conversation.item.deleted`](../realtime-audio-reference.md#realtimeservereventconversationitemdeleted) event is returned to sync the client and server state.
 
-:::image type="content" source="../media/how-to/real-time/conversation-item-sequence.png" alt-text="Diagram of the Realtime API conversation item sequence." lightbox="../media/how-to/real-time/conversation-item-sequence.png":::
+:::image type="content" source="../media/how-to/real-time/conversation-item-sequence.png" alt-text="Diagram of the real-time API conversation item sequence." lightbox="../media/how-to/real-time/conversation-item-sequence.png":::
 
 <!--
 sequenceDiagram
@@ -324,11 +324,11 @@ To get a response from the model:
 - The client sends a [`response.create`](../realtime-audio-reference.md#realtimeclienteventresponsecreate) event. The server responds with a [`response.created`](../realtime-audio-reference.md#realtimeservereventresponsecreated) event. The response can contain one or more items, each of which can contain one or more content parts.
 - Or, when using server-side voice activity detection (VAD), the server automatically generates a response when it detects the end of speech in the input audio buffer. The server sends a [`response.created`](../realtime-audio-reference.md#realtimeservereventresponsecreated) event with the generated response.
 
-### Response interuption
+### Response interruption
 
 The client [`response.cancel`](../realtime-audio-reference.md#realtimeclienteventresponsecancel) event is used to cancel an in-progress response.
 
-A user might want to interrupt the assistant's response or ask the assistant to stop talking. The server produces audio faster than realtime. The client can send a [`conversation.item.truncate`](../realtime-audio-reference.md#realtimeclienteventconversationitemtruncate) event to truncate the audio before it's played.
+A user might want to interrupt the assistant's response or ask the assistant to stop talking. The server produces audio faster than real-time. The client can send a [`conversation.item.truncate`](../realtime-audio-reference.md#realtimeclienteventconversationitemtruncate) event to truncate the audio before it's played.
 - The server's understanding of the audio with the client's playback is synchronized.
 - Truncating audio deletes the server-side text transcript to ensure there isn't text in the context that the user doesn't know about.
 - The server responds with a [`conversation.item.truncated`](../realtime-audio-reference.md#realtimeservereventconversationitemtruncated) event.
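
To make that interruption flow concrete, here's a minimal sketch of a client-side handler, assuming `ws` is an open Realtime session (as in the earlier sketch), `item_id` identifies the assistant audio item currently playing, and `played_ms` is how much of its audio the client has actually played; the field names follow the Realtime API reference linked above.

```python
import json


async def interrupt(ws, item_id: str, played_ms: int) -> None:
    # Stop the in-progress response; the server produces audio faster than
    # real-time, so there's usually unplayed audio left to discard.
    await ws.send(json.dumps({"type": "response.cancel"}))
    # Trim the assistant item to what the user actually heard, so the
    # server-side transcript matches the client playback position.
    await ws.send(json.dumps({
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": 0,
        "audio_end_ms": played_ms,
    }))
    # The server confirms with a conversation.item.truncated event.
```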
To chat with your deployed `gpt-4o-audio-preview` model in the **Chat** playground of [Azure AI Foundry portal](https://ai.azure.com), follow these steps:

1. Go to the [Azure OpenAI Service page](https://ai.azure.com/resource/overview) in Azure AI Foundry portal. Make sure you're signed in with the Azure subscription that has your Azure OpenAI Service resource and the deployed `gpt-4o-audio-preview` model.
1. Select the **Chat** playground from under **Resource playground** in the left pane.
1. Select your deployed `gpt-4o-audio-preview` model from the **Deployment** dropdown.
1. Start chatting with the model and listen to the audio responses.

:::image type="content" source="../media/quickstarts/audio-completions-chat-playground.png" alt-text="Screenshot of the Chat playground page." lightbox="../media/quickstarts/audio-completions-chat-playground.png":::
To deploy the `gpt-4o-audio-preview` model in the Azure AI Foundry portal:

1. Go to the [Azure OpenAI Service page](https://ai.azure.com/resource/overview) in Azure AI Foundry portal. Make sure you're signed in with the Azure subscription that has your Azure OpenAI Service resource.
1. Select the **Chat** playground from under **Playgrounds** in the left pane.
1. Select **+ Create new deployment** > **From base models** to open the deployment window.
1. Search for and select the `gpt-4o-audio-preview` model and then select **Deploy to selected resource**.
1. In the deployment wizard, select the `2024-12-17` model version.
1. Follow the wizard to finish deploying the model.

Now that you have a deployment of the `gpt-4o-audio-preview` model, you can interact with it in the Azure AI Foundry portal **Chat** playground or the chat completions API.
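
For the API route, here's a minimal sketch, assuming the `openai` Python package, `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY` environment variables, and a deployment that keeps the default `gpt-4o-audio-preview` name:

```python
import os

from openai import AzureOpenAI  # pip install openai

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2025-01-01-preview",
)

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",  # your deployment name
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{"role": "user", "content": "Is a golden retriever a good family dog?"}],
)

# The spoken answer arrives as base64 audio plus a text transcript.
print(completion.choices[0].message.audio.transcript)
```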
The `gpt-4o-audio-preview` model introduces the audio modality into the existing `/chat/completions` API. The audio model expands the potential for AI applications in text and voice-based interactions and audio analysis. Modalities supported in the `gpt-4o-audio-preview` model include: text, audio, and text + audio.

Here's a table of the supported modalities with example use cases:

| Modality input | Modality output | Example use case |
| --- | --- | --- |
| Text | Text + audio | Text to speech, audio book generation |
| Audio | Text + audio | Audio transcription, audio book generation |
| Audio | Text | Audio transcription |
| Text + audio | Text + audio | Audio book generation |
| Text + audio | Text | Audio transcription |

By using audio generation capabilities, you can achieve more dynamic and interactive AI applications. Models that support audio inputs and outputs allow you to generate spoken audio responses to prompts and use audio inputs to prompt the model.
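
To illustrate the audio-input rows of the table, here's a hedged sketch that sends a local WAV file for text-only transcription output; `client` is the `AzureOpenAI` client from the earlier sketch, and `speech.wav` is a placeholder file name:

```python
import base64

# Read and base64-encode the input audio (subject to the file size limit
# noted below).
with open("speech.wav", "rb") as f:
    wav_b64 = base64.b64encode(f.read()).decode("ascii")

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text"],  # audio in, text out
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe this recording."},
            {"type": "input_audio", "input_audio": {"data": wav_b64, "format": "wav"}},
        ],
    }],
)
print(completion.choices[0].message.content)
```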
## Supported models

Currently only `gpt-4o-audio-preview` version `2024-12-17` supports audio generation.

The `gpt-4o-audio-preview` model is available for global deployments in the [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).

Currently, the following voices are supported for audio out: Alloy, Echo, and Shimmer.

The maximum audio file size is 20 MB.

> [!NOTE]
> The [Realtime API](../realtime-audio-quickstart.md) uses the same underlying GPT-4o audio model as the completions API, but is optimized for low-latency, real-time audio interactions.

## API support

Support for audio completions was first added in API version `2025-01-01-preview`.
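
When audio output is requested, the response carries base64-encoded audio. Here's a short sketch, reusing `completion` from the earlier audio-output example, that writes it to a playable file:

```python
import base64

# message.audio.data holds the base64-encoded audio when
# modalities=["text", "audio"] and audio={"format": "wav"} were requested.
wav_bytes = base64.b64decode(completion.choices[0].message.audio.data)
with open("answer.wav", "wb") as f:
    f.write(wav_bytes)
```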