Commit b63c2ab

Merge pull request #2360 from eric-urban/eur/realtime-limits
realtime api rate limit clarification
2 parents 82aa35a + 565b10f commit b63c2ab

2 files changed (+5, -3 lines changed)

articles/ai-services/openai/quotas-limits.md

Lines changed: 3 additions & 1 deletion
@@ -10,7 +10,7 @@ ms.custom:
 - ignite-2023
 - references_regions
 ms.topic: conceptual
-ms.date: 01/09/2025
+ms.date: 1/17/2025
 ms.author: mbullwin
 ---

@@ -133,6 +133,8 @@ M = million | K = thousand

 ## gpt-4o audio

+The rate limits for each `gpt-4o-realtime-preview` model deployment are 100K TPM and 1K RPM. During the preview, Azure AI Foundry portal and APIs might inaccurately show different rate limits. Even if you try to set a different rate limit, the actual rate limit will be 100K TPM and 1K RPM.
+
 | Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
 |---|---|:---:|:---:|
 |`gpt-4o-realtime-preview` | Default | 100 K | 1 K |
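
Editor's sketch (not part of the commit): since the added paragraph says the effective limits are fixed at 100K TPM and 1K RPM per deployment regardless of what the portal shows, a caller may want to enforce those numbers on the client side. The following is a minimal sliding-window guard under stated assumptions: the class name `SlidingWindowLimiter` is hypothetical, the 60-second rolling window is an assumption (the commit does not say how the service measures a minute), and the token count per request is supplied by the caller.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical client-side guard for per-minute request and token budgets."""

    def __init__(self, rpm=1_000, tpm=100_000, window=60.0, clock=time.monotonic):
        self.rpm = rpm            # requests allowed per window (1K RPM in the commit)
        self.tpm = tpm            # tokens allowed per window (100K TPM in the commit)
        self.window = window      # assumed rolling window length in seconds
        self.clock = clock        # injectable clock, which makes testing easier
        self.events = deque()     # (timestamp, tokens) pairs still inside the window
        self.tokens_used = 0

    def _evict(self, now):
        # Drop requests that have aged out of the rolling window.
        while self.events and now - self.events[0][0] >= self.window:
            _, tokens = self.events.popleft()
            self.tokens_used -= tokens

    def try_acquire(self, tokens):
        """Return True and record the request if it fits both budgets."""
        now = self.clock()
        self._evict(now)
        if len(self.events) + 1 > self.rpm:
            return False
        if self.tokens_used + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        self.tokens_used += tokens
        return True
```

A caller would check `limiter.try_acquire(estimated_tokens)` before each request and back off when it returns `False`. This only approximates the service-side accounting, so handling throttling responses from the service is still needed.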

articles/ai-services/openai/whats-new.md

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ ms.custom:
 - references_regions
 - ignite-2024
 ms.topic: whats-new
-ms.date: 1/15/2025
+ms.date: 1/17/2025
 recommendations: false
 ---

@@ -27,7 +27,7 @@ The `gpt-4o-realtime-preview` model version 2024-12-17 is available for global d

 - Added support for [prompt caching](./how-to/prompt-caching.md) with the `gpt-4o-realtime-preview` model.
 - Added support for new voices. The `gpt-4o-realtime-preview` models now support the following voices: "alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse".
-- Rate limits are no longer based on connections per minute. Rate limiting is now based on RPM (requests per minute) and TPM (tokens per minute) for the `gpt-4o-realtime-preview` model. The rate limits for the `gpt-4o-realtime-preview` model are 100K TPM and 1K RPM.
+- Rate limits are no longer based on connections per minute. Rate limiting is now based on RPM (requests per minute) and TPM (tokens per minute) for the `gpt-4o-realtime-preview` model. The rate limits for each `gpt-4o-realtime-preview` model deployment are 100K TPM and 1K RPM. During the preview, Azure AI Foundry portal and APIs might inaccurately show different rate limits. Even if you try to set a different rate limit, the actual rate limit will be 100K TPM and 1K RPM.

 For more information, see the [GPT-4o real-time audio quickstart](realtime-audio-quickstart.md) and the [how-to guide](./how-to/realtime-audio.md).
