Commit 565b10f

realtime api rate limit clarification
1 parent 4936700 commit 565b10f

2 files changed: +2 -2 lines changed

articles/ai-services/openai/quotas-limits.md

Lines changed: 1 addition & 1 deletion
@@ -133,7 +133,7 @@ M = million | K = thousand
 
 ## gpt-4o audio
 
-The rate limits for each `gpt-4o-realtime-preview` model deployment are 100K TPM and 1K RPM. Azure AI Foundry portal and APIs might show different rate limits, but the actual rate limits are 100K TPM and 1K RPM.
+The rate limits for each `gpt-4o-realtime-preview` model deployment are 100K TPM and 1K RPM. During the preview, Azure AI Foundry portal and APIs might inaccurately show different rate limits. Even if you try to set a different rate limit, the actual rate limit will be 100K TPM and 1K RPM.
 
 | Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
 |---|---|:---:|:---:|

articles/ai-services/openai/whats-new.md

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ The `gpt-4o-realtime-preview` model version 2024-12-17 is available for global d
 
 - Added support for [prompt caching](./how-to/prompt-caching.md) with the `gpt-4o-realtime-preview` model.
 - Added support for new voices. The `gpt-4o-realtime-preview` models now support the following voices: "alloy", "ash", "ballad", "coral", "echo", "sage", "shimmer", "verse".
-- Rate limits are no longer based on connections per minute. Rate limiting is now based on RPM (requests per minute) and TPM (tokens per minute) for the `gpt-4o-realtime-preview` model. The rate limits for each `gpt-4o-realtime-preview` model deployment are 100K TPM and 1K RPM. Azure AI Foundry portal and APIs might show different rate limits, but the actual rate limits are 100K TPM and 1K RPM.
+- Rate limits are no longer based on connections per minute. Rate limiting is now based on RPM (requests per minute) and TPM (tokens per minute) for the `gpt-4o-realtime-preview` model. The rate limits for each `gpt-4o-realtime-preview` model deployment are 100K TPM and 1K RPM. During the preview, Azure AI Foundry portal and APIs might inaccurately show different rate limits. Even if you try to set a different rate limit, the actual rate limit will be 100K TPM and 1K RPM.
 
 For more information, see the [GPT-4o real-time audio quickstart](realtime-audio-quickstart.md) and the [how-to guide](./how-to/realtime-audio.md).
 