Skip to content

Commit 79b7fe5

Browse files
authored
Merge pull request #2308 from eric-urban/eur/realtime-2024-12-17
realtime api new model version
2 parents b3683f9 + 014bb4a commit 79b7fe5

File tree

9 files changed

+54
-35
lines changed

9 files changed

+54
-35
lines changed

articles/ai-services/openai/concepts/model-retirements.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -107,6 +107,7 @@ These models are currently available for use in Azure OpenAI Service.
107107
| `gpt-4` | vision-preview | To be upgraded to `gpt-4` version: `turbo-2024-04-09`, starting no sooner than January 27, 2025 **<sup>1</sup>** | `gpt-4o`|
108108
| `gpt-4o` | 2024-05-13 | No earlier than May 20, 2025 <br><br>Deployments set to [**Auto-update to default**](/azure/ai-services/openai/how-to/working-with-models?tabs=powershell#auto-update-to-default) will be automatically upgraded to version: `2024-08-06`, starting on February 13, 2025. | |
109109
| `gpt-4o-mini` | 2024-07-18 | No earlier than July 18, 2025 | |
110+
| `gpt-4o-realtime-preview` | 2024-10-01 | No earlier than September 30, 2025 | `gpt-4o-realtime-preview` (version 2024-12-17) |
110111
| `gpt-3.5-turbo-instruct` | 0914 | No earlier than April 1, 2025 | |
111112
| `o1` | 2024-12-17 | No earlier than December 17, 2025 | |
112113
| `text-embedding-ada-002` | 2 | No earlier than October 3, 2025 | `text-embedding-3-small` or `text-embedding-3-large` |

articles/ai-services/openai/concepts/models.md

Lines changed: 5 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -58,17 +58,18 @@ To learn more about the advanced `o1` series models see, [getting started with o
5858

5959
## GPT-4o-Realtime-Preview
6060

61-
The `gpt-4o-realtime-preview` model is part of the GPT-4o model family and supports low-latency, "speech in, speech out" conversational interactions. GPT-4o audio is designed to handle real-time, low-latency conversational interactions, making it a great fit for support agents, assistants, translators, and other use cases that need highly responsive back-and-forth with a user.
61+
The GPT 4o audio models are part of the GPT-4o model family and support low-latency, "speech in, speech out" conversational interactions. GPT-4o audio is designed to handle real-time, low-latency conversational interactions, making it a great fit for support agents, assistants, translators, and other use cases that need highly responsive back-and-forth with a user.
6262

6363
GPT-4o audio is available in the East US 2 (`eastus2`) and Sweden Central (`swedencentral`) regions. To use GPT-4o audio, you need to [create](../how-to/create-resource.md) or use an existing resource in one of the supported regions.
6464

65-
When your resource is created, you can [deploy](../how-to/create-resource.md#deploy-a-model) the GPT-4o audio model. If you are performing a programmatic deployment, the **model** name is `gpt-4o-realtime-preview`. For more information on how to use GPT-4o audio, see the [GPT-4o audio documentation](../realtime-audio-quickstart.md).
65+
When your resource is created, you can [deploy](../how-to/create-resource.md#deploy-a-model) the GPT-4o audio model. For more information on how to use GPT-4o audio, see the [GPT-4o audio quickstart](../realtime-audio-quickstart.md) and [how to use GPT-4o audio](../how-to/realtime-audio.md).
6666

6767
Details about maximum request tokens and training data are available in the following table.
6868

6969
| Model ID | Description | Max Request (tokens) | Training Data (up to) |
70-
| --- | :--- |:--- |:---: |
71-
|`gpt-4o-realtime-preview` (2024-10-01-preview) <br> **GPT-4o audio** | **Audio model** for real-time audio processing |Input: 128,000 <br> Output: 4,096 | Oct 2023 |
70+
|---|---|---|---|
71+
|`gpt-4o-realtime-preview` (2024-10-01) <br> **GPT-4o audio** | **Audio model** for real-time audio processing |Input: 128,000 <br> Output: 4,096 | Oct 2023 |
72+
|`gpt-4o-realtime-preview` (2024-12-17) <br> **GPT-4o audio** | **Audio model** for real-time audio processing |Input: 128,000 <br> Output: 4,096 | Oct 2023 |
7273

7374
## GPT-4o and GPT-4 Turbo
7475

articles/ai-services/openai/how-to/prompt-caching.md

Lines changed: 8 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@ Currently only the following models support prompt caching with Azure OpenAI:
2828
- `gpt-4o-2024-11-20`
2929
- `gpt-4o-2024-08-06`
3030
- `gpt-4o-mini-2024-07-18`
31+
- `gpt-4o-realtime-preview` (version 2024-12-17)`
3132

3233
> [!NOTE]
3334
> Prompt caching is now also available as part of model fine-tuning for `gpt-4o` and `gpt-4o-mini`. Refer to the fine-tuning section of the [pricing page](https://azure.microsoft.com/pricing/details/cognitive-services/openai-service/) for details.
@@ -76,14 +77,14 @@ A single character difference in the first 1,024 tokens will result in a cache m
7677

7778
The o1-series models are text only and don't support system messages, images, tool use/function calling, or structured outputs. This limits the efficacy of prompt caching for these models to the user/assistant portions of the messages array which are less likely to have an identical 1024 token prefix.
7879

79-
For `gpt-4o` and `gpt-4o-mini` models, prompt caching is supported for:
80+
Prompt caching is supported for:
8081

81-
| **Caching Supported** | **Description** |
82-
|--------|--------|
83-
|**Messages** | The complete messages array: system, user, and assistant content |
84-
|**Images** | Images included in user messages, both as links or as base64-encoded data. The detail parameter must be set the same across requests.
85-
|**Tool use**| Both the messages array and tool definitions |
86-
|**Structured outputs** | Structured output schema is appended as a prefix to the system message|
82+
|**Caching supported**|**Description**|**Supported models**|
83+
|--------|--------|--------|
84+
| **Messages** | The complete messages array: system, user, and assistant content | `gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17) |
85+
| **Images** | Images included in user messages, both as links or as base64-encoded data. The detail parameter must be set the same across requests. | `gpt-4o`<br/>`gpt-4o-mini` |
86+
| **Tool use** | Both the messages array and tool definitions. | `gpt-4o`<br/>`gpt-4o-mini`<br/>`gpt-4o-realtime-preview` (version 2024-12-17) |
87+
| **Structured outputs** | Structured output schema is appended as a prefix to the system message. | `gpt-4o`<br/>`gpt-4o-mini` |
8788

8889
To improve the likelihood of cache hits occurring, you should structure your requests such that repetitive content occurs at the beginning of the messages array.
8990

articles/ai-services/openai/how-to/realtime-audio.md

Lines changed: 4 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -22,19 +22,11 @@ Most users of the Realtime API need to deliver and receive audio from an end-use
2222

2323
## Supported models
2424

25-
Currently only `gpt-4o-realtime-preview` version: `2024-10-01-preview` supports real-time audio.
25+
The GPT 4o realtime models are available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
26+
- `gpt-4o-realtime-preview` (2024-12-17)
27+
- `gpt-4o-realtime-preview` (2024-10-01)
2628

27-
The `gpt-4o-realtime-preview` model is available for global deployments in [East US 2 and Sweden Central regions](../concepts/models.md#global-standard-model-availability).
28-
29-
> [!IMPORTANT]
30-
> The system stores your prompts and completions as described in the "Data Use and Access for Abuse Monitoring" section of the service-specific Product Terms for Azure OpenAI Service, except that the Limited Exception does not apply. Abuse monitoring will be turned on for use of the `gpt-4o-realtime-preview` API even for customers who otherwise are approved for modified abuse monitoring.
31-
32-
## API support
33-
34-
Support for the Realtime API was first added in API version `2024-10-01-preview`.
35-
36-
> [!NOTE]
37-
> For more information about the API and architecture, see the [Azure OpenAI GPT-4o real-time audio repository on GitHub](https://github.com/azure-samples/aoai-realtime-audio-sdk).
29+
See the [models and versions documentation](../concepts/models.md#gpt-4o-realtime-preview) for more information.
3830

3931
## Get started
4032

articles/ai-services/openai/includes/realtime-deploy-model.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,7 +12,7 @@ To deploy the `gpt-4o-realtime-preview` model in the Azure AI Foundry portal:
1212
1. Select the **Real-time audio** playground from under **Playgrounds** in the left pane.
1313
1. Select **Create new deployment** to open the deployment window.
1414
1. Search for and select the `gpt-4o-realtime-preview` model and then select **Confirm**.
15-
1. In the deployment wizard, make sure to select the `2024-10-01` model version.
15+
1. In the deployment wizard, select the `2024-12-17` model version.
1616
1. Follow the wizard to finish deploying the model.
1717

1818
Now that you have a deployment of the `gpt-4o-realtime-preview` model, you can interact with it in real time in the Azure AI Foundry portal **Real-time audio** playground or Realtime API.

articles/ai-services/openai/quotas-limits.md

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -51,7 +51,6 @@ The following sections provide you with a quick guide to the default quotas and
5151
| GPT-4o max images per request (# of images in the messages array/conversation history) | 50 |
5252
| GPT-4 `vision-preview` & GPT-4 `turbo-2024-04-09` default max tokens | 16 <br><br> Increase the `max_tokens` parameter value to avoid truncated responses. GPT-4o max tokens defaults to 4096. |
5353
| Max number of custom headers in API requests<sup>1</sup> | 10 |
54-
| Max number requests per minute<br/><br/>Current rate limits for real time audio (`gpt-4o-realtime-preview`) are defined as the number of new websocket connections per minute. For example, 100 requests per minute (RPM) means 100 new connections per minute. | 100 new connections per minute |
5554

5655
<sup>1</sup> Our current APIs allow up to 10 custom headers, which are passed through the pipeline, and returned. Some customers now exceed this header count resulting in HTTP 431 errors. There's no solution for this error, other than to reduce header volume. **In future API versions we will no longer pass through custom headers**. We recommend customers not depend on custom headers in future system architectures.
5756

@@ -132,6 +131,14 @@ M = million | K = thousand
132131

133132
M = million | K = thousand
134133

134+
## gpt-4o audio
135+
136+
| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
137+
|---|---|:---:|:---:|
138+
|`gpt-4o-realtime-preview` | Default | 100 K | 1 K |
139+
140+
M = million | K = thousand
141+
135142
#### Usage tiers
136143

137144
Global standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with best availability for the customer’s inference requests. Similarly, Data zone standard deployments allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see greater variability in response latency.

articles/ai-services/openai/realtime-audio-quickstart.md

Lines changed: 4 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -23,12 +23,11 @@ Most users of the Realtime API need to deliver and receive audio from an end-use
2323

2424
## Supported models
2525

26-
Currently only `gpt-4o-realtime-preview` version: `2024-10-01-preview` supports real-time audio.
26+
The GPT 4o realtime models are available for global deployments in [East US 2 and Sweden Central regions](./concepts/models.md#global-standard-model-availability).
27+
- `gpt-4o-realtime-preview` (2024-12-17)
28+
- `gpt-4o-realtime-preview` (2024-10-01)
2729

28-
The `gpt-4o-realtime-preview` model is available for global deployments in [East US 2 and Sweden Central regions](./concepts/models.md#global-standard-model-availability).
29-
30-
> [!IMPORTANT]
31-
> The system stores your prompts and completions as described in the "Data Use and Access for Abuse Monitoring" section of the service-specific Product Terms for Azure OpenAI Service, except that the Limited Exception does not apply. Abuse monitoring will be turned on for use of the `gpt-4o-realtime-preview` API even for customers who otherwise are approved for modified abuse monitoring.
30+
See the [models and versions documentation](./concepts/models.md#gpt-4o-realtime-preview) for more information.
3231

3332
## API support
3433

articles/ai-services/openai/realtime-audio-reference.md

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1508,8 +1508,14 @@ Currently, only 'function' tools are supported.
15081508
**Allowed Values:**
15091509

15101510
* `alloy`
1511-
* `shimmer`
1512-
* `echo`
1511+
* `ash`
1512+
* `ballad`
1513+
* `coral`
1514+
* `echo`
1515+
* `sage`
1516+
* `shimmer`
1517+
* `verse`
1518+
15131519

15141520
## Related content
15151521

articles/ai-services/openai/whats-new.md

Lines changed: 15 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -11,14 +11,26 @@ ms.custom:
1111
- references_regions
1212
- ignite-2024
1313
ms.topic: whats-new
14-
ms.date: 11/18/2024
14+
ms.date: 1/15/2025
1515
recommendations: false
1616
---
1717

1818
# What's new in Azure OpenAI Service
1919

2020
This article provides a summary of the latest releases and major documentation updates for Azure OpenAI.
2121

22+
## January 2025
23+
24+
### GPT-4o Realtime API 2024-12-17
25+
26+
The `gpt-4o-realtime-preview` model version 2024-12-17 is available for global deployments in [East US 2 and Sweden Central regions](./concepts/models.md#global-standard-model-availability). Use the `gpt-4o-realtime-preview` version 2024-12-17 model instead of the `gpt-4o-realtime-preview` version 2024-10-01-preview model for real-time audio interactions.
27+
28+
- Added support for [prompt caching](./how-to/prompt-caching.md) with the `gpt-4o-realtime-preview` model.
29+
- Added support for new voices. The `gpt-4o-realtime-preview` models now support the following voices: "alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse".
30+
- Rate limits are no longer based on connections per minute. Rate limiting is now based on RPM (requests per minute) and TPM (tokens per minute) for the `gpt-4o-realtime-preview` model. The rate limits for the `gpt-4o-realtime-preview` model are 100K TPM and 1K RPM.
31+
32+
For more information, see the [GPT-4o real-time audio quickstart](realtime-audio-quickstart.md) and the [how-to guide](./how-to/realtime-audio.md).
33+
2234
## December 2024
2335

2436
### o1 reasoning model released for limited access
@@ -77,7 +89,7 @@ For fine-tuning model region availability, see the [models page](./concepts/mode
7789

7890
We are introducing new forms of abuse monitoring that leverage LLMs to improve efficiency of detection of potentially abusive use of the Azure OpenAI service and to enable abuse monitoring without the need for human review of prompts and completions. Learn more, see [Abuse monitoring](/azure/ai-services/openai/concepts/abuse-monitoring).
7991

80-
Prompts and completions that are flagged through content classification and/or identified to be part of a potentially abusive pattern of use are subjected to an additional review process to help confirm the systems analysis and inform actioning decisions. Our abuse monitoring systems have been expanded to enable review by LLM by default and by humans when necessary and appropriate.
92+
Prompts and completions that are flagged through content classification and/or identified to be part of a potentially abusive pattern of use are subjected to an additional review process to help confirm the system's analysis and inform actioning decisions. Our abuse monitoring systems have been expanded to enable review by LLM by default and by humans when necessary and appropriate.
8193

8294
## October 2024
8395

@@ -135,7 +147,7 @@ Azure OpenAI GPT-4o audio is part of the GPT-4o model family that supports low-l
135147

136148
The `gpt-4o-realtime-preview` model is available for global deployments in [East US 2 and Sweden Central regions](./concepts/models.md#global-standard-model-availability).
137149

138-
For more information, see the [GPT-4o real-time audio documentation](realtime-audio-quickstart.md).
150+
For more information, see the [GPT-4o real-time audio quickstart](realtime-audio-quickstart.md).
139151

140152
### Global batch support updates
141153

0 commit comments

Comments
 (0)