Commit 7b520e6

update
1 parent 8845abc commit 7b520e6

File tree: 1 file changed, +8 -8 lines changed

articles/ai-services/openai/quotas-limits.md

Lines changed: 8 additions & 8 deletions
@@ -10,7 +10,7 @@ ms.custom:
   - ignite-2023
   - references_regions
 ms.topic: conceptual
-ms.date: 11/11/2024
+ms.date: 01/09/2025
 ms.author: mbullwin
 ---

@@ -61,26 +61,26 @@ The following sections provide you with a quick guide to the default quotas and

 [!INCLUDE [Quota](./includes/global-batch-limits.md)]

-## o1-preview & o1-mini rate limits
+## o1 & o1-mini rate limits

 > [!IMPORTANT]
 > The ratio of RPM/TPM for quota with o1-series models works differently than for older chat completions models:
 >
 > - **Older chat models:** 1 unit of capacity = 6 RPM and 1,000 TPM.
-> - **o1-preview:** 1 unit of capacity = 1 RPM and 6,000 TPM.
+> - **o1 & o1-preview:** 1 unit of capacity = 1 RPM and 6,000 TPM.
 > - **o1-mini:** 1 unit of capacity = 1 RPM and 10,000 TPM.
 >
 > This is particularly important for programmatic model deployment, as the change in the RPM/TPM ratio can result in accidental under-allocation of quota if you still assume the 1:1,000 ratio followed by older chat completion models.
 >
 > There is a known issue with the [quota/usages API](/rest/api/aiservices/accountmanagement/usages/list?view=rest-aiservices-accountmanagement-2024-06-01-preview&tabs=HTTP&preserve-view=true) where it assumes the old ratio applies to the new o1-series models. The API returns the correct base capacity number, but doesn't apply the correct ratio for an accurate calculation of TPM.

-### o1-preview & o1-mini global standard
+### o1 & o1-mini global standard

 | Model | Tier | Quota limit in tokens per minute (TPM) | Requests per minute (RPM) |
 |---|---|:---:|:---:|
-| `o1-preview` | Enterprise agreement | 30 M | 5 K |
+| `o1` & `o1-preview` | Enterprise agreement | 30 M | 5 K |
 | `o1-mini` | Enterprise agreement | 50 M | 5 K |
-| `o1-preview` | Default | 3 M | 500 |
+| `o1` & `o1-preview` | Default | 3 M | 500 |
 | `o1-mini` | Default | 5 M | 500 |

 ### o1-preview & o1-mini standard
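For programmatic deployment, the capacity-unit ratios described above can be sketched as follows. This is a minimal illustration only: the `CAPACITY_RATIOS` table and `capacity_to_limits` helper are hypothetical names, not part of any official SDK.

```python
# Illustrative sketch of the RPM/TPM capacity ratios described above.
# The mapping and helper are hypothetical, not an official SDK API.

# model -> (RPM per capacity unit, TPM per capacity unit)
CAPACITY_RATIOS = {
    "older-chat": (6, 1_000),   # older chat completion models
    "o1": (1, 6_000),
    "o1-preview": (1, 6_000),
    "o1-mini": (1, 10_000),
}

def capacity_to_limits(model: str, units: int) -> tuple[int, int]:
    """Convert quota capacity units into (RPM, TPM) for a model."""
    rpm_per_unit, tpm_per_unit = CAPACITY_RATIOS[model]
    return units * rpm_per_unit, units * tpm_per_unit
```

For example, 10 capacity units of `o1` yield 10 RPM and 60,000 TPM, not the 10,000 TPM that the older 1:1,000 ratio would suggest; this is the under-allocation pitfall the note above warns about.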
@@ -179,9 +179,9 @@ To minimize issues related to rate limits, it's a good idea to use the following
 - Test different load increase patterns.
 - Increase the quota assigned to your deployment. Move quota from another deployment, if necessary.

-### How to request increases to the default quotas and limits
+## How to request increases to the default quotas and limits

-Quota increase requests can be submitted from the [Quotas](./how-to/quota.md) page in the Azure AI Foundry portal. Due to high demand, quota increase requests are being accepted and will be filled in the order they're received. Priority is given to customers who generate traffic that consumes the existing quota allocation, and your request might be denied if this condition isn't met.
+Quota increase requests can be submitted via the [quota increase request form](https://aka.ms/oai/stuquotarequest). Due to high demand, quota increase requests are being accepted and will be filled in the order they're received. Priority is given to customers who generate traffic that consumes the existing quota allocation, and your request might be denied if this condition isn't met.

 For other rate limits, [submit a service request](../cognitive-services-support-options.md?context=/azure/ai-services/openai/context/context).

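The guidance on minimizing rate-limit issues (gradual load increases, moving quota between deployments) is commonly paired with client-side retry on HTTP 429 responses using exponential backoff. A minimal sketch, assuming a caller-supplied `send_request` callable rather than any official SDK helper:

```python
import random
import time

def call_with_backoff(send_request, max_retries: int = 5, base_delay: float = 1.0):
    """Retry `send_request` on rate-limit errors (HTTP 429) with
    exponential backoff plus jitter. `send_request` is any zero-argument
    callable returning an object with a `status_code` attribute; this is
    an illustrative pattern, not an official SDK helper."""
    for attempt in range(max_retries):
        response = send_request()
        if response.status_code != 429:
            return response
        # Sleep base_delay, 2*base_delay, 4*base_delay, ... plus jitter.
        time.sleep(base_delay * (2 ** attempt) + random.random() * base_delay * 0.1)
    return send_request()  # final attempt; caller handles a persistent 429
```

Backoff smooths out bursts that would otherwise exhaust the RPM budget, but it is a complement to, not a substitute for, allocating enough quota to the deployment.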