Commit 4936700

realtime api rate limit clarification
1 parent 3e71535 commit 4936700

2 files changed: +5 -3 lines changed

articles/ai-services/openai/quotas-limits.md

Lines changed: 3 additions & 1 deletion
@@ -10,7 +10,7 @@ ms.custom:
 - ignite-2023
 - references_regions
 ms.topic: conceptual
-ms.date: 01/09/2025
+ms.date: 1/17/2025
 ms.author: mbullwin
 ---
 
@@ -133,6 +133,8 @@ M = million | K = thousand
 
 ## gpt-4o audio
 
+The rate limits for each `gpt-4o-realtime-preview` model deployment are 100K TPM and 1K RPM. Azure AI Foundry portal and APIs might show different rate limits, but the actual rate limits are 100K TPM and 1K RPM.
+
 | Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
 |---|---|:---:|:---:|
 |`gpt-4o-realtime-preview` | Default | 100 K | 1 K |

articles/ai-services/openai/whats-new.md

Lines changed: 2 additions & 2 deletions
@@ -11,7 +11,7 @@ ms.custom:
 - references_regions
 - ignite-2024
 ms.topic: whats-new
-ms.date: 1/15/2025
+ms.date: 1/17/2025
 recommendations: false
 ---
 
@@ -27,7 +27,7 @@ The `gpt-4o-realtime-preview` model version 2024-12-17 is available for global d
 
 - Added support for [prompt caching](./how-to/prompt-caching.md) with the `gpt-4o-realtime-preview` model.
 - Added support for new voices. The `gpt-4o-realtime-preview` models now support the following voices: "alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse".
-- Rate limits are no longer based on connections per minute. Rate limiting is now based on RPM (requests per minute) and TPM (tokens per minute) for the `gpt-4o-realtime-preview` model. The rate limits for the `gpt-4o-realtime-preview` model are 100K TPM and 1K RPM.
+- Rate limits are no longer based on connections per minute. Rate limiting is now based on RPM (requests per minute) and TPM (tokens per minute) for the `gpt-4o-realtime-preview` model. The rate limits for each `gpt-4o-realtime-preview` model deployment are 100K TPM and 1K RPM. Azure AI Foundry portal and APIs might show different rate limits, but the actual rate limits are 100K TPM and 1K RPM.
 
 For more information, see the [GPT-4o real-time audio quickstart](realtime-audio-quickstart.md) and the [how-to guide](./how-to/realtime-audio.md).
 