Skip to content

Commit 18ea9b1

Browse files
committed
audio models for AOAI
1 parent 3996a6e commit 18ea9b1

File tree

5 files changed

+20
-64
lines changed

5 files changed

+20
-64
lines changed

articles/ai-services/openai/concepts/audio.md

Lines changed: 14 additions & 50 deletions
Original file line numberDiff line numberDiff line change
@@ -16,64 +16,28 @@ manager: nitinme
1616
> [!IMPORTANT]
1717
> The content filtering system isn't applied to prompts and completions processed by the audio models such as Whisper in Azure OpenAI Service. Learn more about the [Audio API in Azure OpenAI](models.md?tabs=standard-audio#standard-models-by-endpoint).
1818
19+
Audio models in Azure OpenAI are available via the `realtime`, `completions`, and `audio` APIs. The audio models are designed to handle a variety of tasks, including speech recognition, translation, and text to speech.
1920

20-
### GPT-4o audio models
21+
For information about the available audio models per region in Azure OpenAI Service, see the [audio models](models.md?tabs=standard-audio#standard-models-by-endpoint), [standard models by endpoint](models.md?tabs=standard-audio#standard-models-by-endpoint), and [global standard model availability](models.md?tabs=standard-audio#global-standard-model-availability) documentation.
2122

22-
The GPT 4o audio models are part of the GPT-4o model family and support either low-latency, "speech in, speech out" conversational interactions or audio generation.
23-
- GPT-4o real-time audio is designed to handle real-time, low-latency conversational interactions, making it a great fit for support agents, assistants, translators, and other use cases that need highly responsive back-and-forth with a user. For more information on how to use GPT-4o real-time audio, see the [GPT-4o real-time audio quickstart](../realtime-audio-quickstart.md) and [how to use GPT-4o audio](../how-to/realtime-audio.md).
24-
- GPT-4o audio completion is designed to generate audio from audio or text prompts, making it a great fit for generating audio books, audio content, and other use cases that require audio generation. The GPT-4o audio completions model introduces the audio modality into the existing `/chat/completions` API. For more information on how to use GPT-4o audio completions, see the [audio generation quickstart](../audio-completions-quickstart.md).
23+
### GPT-4o audio Realtime API
2524

26-
> [!CAUTION]
27-
> We don't recommend using preview models in production. We will upgrade all deployments of preview models to either future preview versions or to the latest stable GA version. Models that are designated preview don't follow the standard Azure OpenAI model lifecycle.
25+
GPT-4o real-time audio is designed to handle real-time, low-latency conversational interactions, making it a great fit for support agents, assistants, translators, and other use cases that need highly responsive back-and-forth with a user. For more information on how to use GPT-4o real-time audio, see the [GPT-4o real-time audio quickstart](../realtime-audio-quickstart.md) and [how to use GPT-4o audio](../how-to/realtime-audio.md).
2826

29-
To use GPT-4o audio, you need [an Azure OpenAI resource](../how-to/create-resource.md) in one of the [supported regions](#global-standard-model-availability).
27+
## GPT-4o audio completions
3028

31-
When your resource is created, you can [deploy](../how-to/create-resource.md#deploy-a-model) the GPT-4o audio model.
29+
GPT-4o audio completion is designed to generate audio from audio or text prompts, making it a great fit for generating audio books, audio content, and other use cases that require audio generation. The GPT-4o audio completions model introduces the audio modality into the existing `/chat/completions` API. For more information on how to use GPT-4o audio completions, see the [audio generation quickstart](../audio-completions-quickstart.md).
3230

33-
Details about maximum request tokens and training data are available in the following table.
31+
## Audio API
3432

35-
| Model ID | Description | Max Request (tokens) | Training Data (up to) |
36-
|---|---|---|---|
37-
|`gpt-4o-mini-audio-preview` (2024-12-17) <br> **GPT-4o audio** | **Audio model** for audio and text generation. |Input: 128,000 <br> Output: 4,096 | Oct 2023 |
38-
|`gpt-4o-mini-realtime-preview` (2024-12-17) <br> **GPT-4o audio** | **Audio model** for real-time audio processing. |Input: 128,000 <br> Output: 4,096 | Oct 2023 |
39-
|`gpt-4o-audio-preview` (2024-12-17) <br> **GPT-4o audio** | **Audio model** for audio and text generation. |Input: 128,000 <br> Output: 4,096 | Oct 2023 |
40-
|`gpt-4o-realtime-preview` (2024-12-17) <br> **GPT-4o audio** | **Audio model** for real-time audio processing. |Input: 128,000 <br> Output: 4,096 | Oct 2023 |
41-
|`gpt-4o-realtime-preview` (2024-10-01) <br> **GPT-4o audio** | **Audio model** for real-time audio processing. |Input: 128,000 <br> Output: 4,096 | Oct 2023 |
42-
43-
To compare the availability of GPT-4o audio models across all regions, see the [models table](#global-standard-model-availability).
44-
45-
### Audio API
46-
47-
The audio models via the `/audio` API can be used for speech to text, translation, and text to speech.
48-
49-
#### Speech to text models
50-
51-
| Model ID | Description | Max Request (audio file size) |
52-
| ----- | ----- | ----- |
53-
| `whisper` | General-purpose speech recognition model. | 25 MB |
54-
| `gpt-4o-transcribe` | Speech to text powered by GPT-4o. | 25 MB|
55-
| `gpt-4o-mini-transcribe` | Speech to text powered by GPT-4o mini. | 25 MB|
56-
57-
You can also use the Whisper model via Azure AI Speech [batch transcription](../../speech-service/batch-transcription-create.md) API. Check out [What is the Whisper model?](../../speech-service/whisper-overview.md) to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.
58-
59-
#### Speech translation models
60-
61-
| Model ID | Description | Max Request (audio file size) |
62-
| ----- | ----- | ----- |
63-
| `whisper` | General-purpose speech recognition model. | 25 MB |
64-
65-
#### Text to speech models (Preview)
66-
67-
| Model ID | Description |
68-
| --- | :--- |
69-
| `tts` | Text to speech optimized for speed. |
70-
| `tts-hd` | Text to speech optimized for quality.|
71-
| `gpt-4o-mini-tts` | Text to speech model powered by GPT-4o mini. |
72-
73-
You can also use the OpenAI text to speech voices via Azure AI Speech. To learn more, see [OpenAI text to speech voices via Azure OpenAI Service or via Azure AI Speech](../../speech-service/openai-voices.md#openai-text-to-speech-voices-via-azure-openai-service-or-via-azure-ai-speech) guide.
74-
75-
For more information see [Audio models region availability](?tabs=standard-audio#standard-models-by-endpoint) in this article.
33+
The audio models via the `/audio` API can be used for speech to text, translation, and text to speech. To get started with the audio API, see the [Whisper quickstart](../whisper-quickstart.md) for speech to text.
7634

35+
> [!NOTE]
36+
> To help you decide whether to use Azure AI Speech or Azure OpenAI Service, see the [Azure AI Speech batch transcription](../../speech-service/batch-transcription-create.md), [What is the Whisper model?](../../speech-service/whisper-overview.md), and [OpenAI text to speech voices](../../speech-service/openai-voices.md#openai-text-to-speech-voices-via-azure-openai-service-or-via-azure-ai-speech) guides.
7737
7838
## Related content
7939

40+
- [Audio models](models.md#audio-models)
41+
- [Whisper quickstart](../whisper-quickstart.md)
42+
- [Audio generation quickstart](../audio-completions-quickstart.md)
43+
- [GPT-4o real-time audio quickstart](../realtime-audio-quickstart.md)

articles/ai-services/openai/concepts/models.md

Lines changed: 3 additions & 11 deletions
Original file line numberDiff line numberDiff line change
@@ -202,19 +202,15 @@ The DALL-E models generate images from text prompts that the user provides. DALL
202202

203203
## Audio models
204204

205+
Audio models in Azure OpenAI are available via the `realtime`, `completions`, and `audio` APIs.
206+
205207
### GPT-4o audio models
206208

207209
The GPT 4o audio models are part of the GPT-4o model family and support either low-latency, "speech in, speech out" conversational interactions or audio generation.
208-
- GPT-4o real-time audio is designed to handle real-time, low-latency conversational interactions, making it a great fit for support agents, assistants, translators, and other use cases that need highly responsive back-and-forth with a user. For more information on how to use GPT-4o real-time audio, see the [GPT-4o real-time audio quickstart](../realtime-audio-quickstart.md) and [how to use GPT-4o audio](../how-to/realtime-audio.md).
209-
- GPT-4o audio completion is designed to generate audio from audio or text prompts, making it a great fit for generating audio books, audio content, and other use cases that require audio generation. The GPT-4o audio completions model introduces the audio modality into the existing `/chat/completions` API. For more information on how to use GPT-4o audio completions, see the [audio generation quickstart](../audio-completions-quickstart.md).
210210

211211
> [!CAUTION]
212212
> We don't recommend using preview models in production. We will upgrade all deployments of preview models to either future preview versions or to the latest stable GA version. Models that are designated preview don't follow the standard Azure OpenAI model lifecycle.
213213
214-
To use GPT-4o audio, you need [an Azure OpenAI resource](../how-to/create-resource.md) in one of the [supported regions](#global-standard-model-availability).
215-
216-
When your resource is created, you can [deploy](../how-to/create-resource.md#deploy-a-model) the GPT-4o audio model.
217-
218214
Details about maximum request tokens and training data are available in the following table.
219215

220216
| Model ID | Description | Max Request (tokens) | Training Data (up to) |
@@ -239,8 +235,6 @@ The audio models via the `/audio` API can be used for speech to text, translatio
239235
| `gpt-4o-transcribe` | Speech to text powered by GPT-4o. | 25 MB|
240236
| `gpt-4o-mini-transcribe` | Speech to text powered by GPT-4o mini. | 25 MB|
241237

242-
You can also use the Whisper model via Azure AI Speech [batch transcription](../../speech-service/batch-transcription-create.md) API. Check out [What is the Whisper model?](../../speech-service/whisper-overview.md) to learn more about when to use Azure AI Speech vs. Azure OpenAI Service.
243-
244238
#### Speech translation models
245239

246240
| Model ID | Description | Max Request (audio file size) |
@@ -253,9 +247,7 @@ You can also use the Whisper model via Azure AI Speech [batch transcription](../
253247
| --- | :--- |
254248
| `tts` | Text to speech optimized for speed. |
255249
| `tts-hd` | Text to speech optimized for quality.|
256-
| `gpt-4o-mini-tts` | Text to speech model powered by GPT-4o mini. |
257-
258-
You can also use the OpenAI text to speech voices via Azure AI Speech. To learn more, see [OpenAI text to speech voices via Azure OpenAI Service or via Azure AI Speech](../../speech-service/openai-voices.md#openai-text-to-speech-voices-via-azure-openai-service-or-via-azure-ai-speech) guide.
250+
| `gpt-4o-mini-tts` | Text to speech model powered by GPT-4o mini.<br/><br/>You can guide the voice to speak in a style or tone. |
259251

260252
For more information see [Audio models region availability](?tabs=standard-audio#standard-models-by-endpoint) in this article.
261253

articles/ai-services/openai/how-to/realtime-audio.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@ The GPT 4o real-time models are available for global deployments in [East US 2 a
2727
- `gpt-4o-realtime-preview` (2024-12-17)
2828
- `gpt-4o-realtime-preview` (2024-10-01)
2929

30-
See the [models and versions documentation](../concepts/models.md#gpt-4o-audio) for more information.
30+
See the [models and versions documentation](../concepts/models.md#audio-models) for more information.
3131

3232
## Get started
3333

articles/ai-services/openai/realtime-audio-quickstart.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -28,7 +28,7 @@ The GPT 4o real-time models are available for global deployments.
2828
- `gpt-4o-mini-realtime-preview` (version `2024-12-17`)
2929
- `gpt-4o-realtime-preview` (version `2024-10-01`)
3030

31-
See the [models and versions documentation](./concepts/models.md#gpt-4o-audio) for more information.
31+
See the [models and versions documentation](./concepts/models.md#audio-models) for more information.
3232

3333
## API support
3434

articles/ai-services/openai/whats-new.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -79,7 +79,7 @@ The `gpt-4o-mini-audio-preview` (2024-12-17) model is the latest audio completio
7979

8080
The `gpt-4o-mini-realtime-preview` (2024-12-17) model is the latest real-time audio model. The real-time models use the same underlying GPT-4o audio model as the completions API, but is optimized for low-latency, real-time audio interactions. For more information, see the [real-time audio quickstart](./realtime-audio-quickstart.md).
8181

82-
For more information about available models, see the [models and versions documentation](./concepts/models.md#gpt-4o-audio).
82+
For more information about available models, see the [models and versions documentation](./concepts/models.md#audio-models).
8383

8484
## January 2025
8585

0 commit comments

Comments
 (0)