> The RPM/TPM ratio for quota with o1-series models works differently than it does for older chat completions models:
>
> - **Older chat models:** 1 unit of capacity = 6 RPM and 1,000 TPM.
> - **o1 & o1-preview:** 1 unit of capacity = 1 RPM and 6,000 TPM.
> - **o1-mini:** 1 unit of capacity = 1 RPM per 10,000 TPM.
>
> This is particularly important for programmatic model deployment, because this change in the RPM/TPM ratio can result in accidental under-allocation of quota if you're still assuming the 1:1000 ratio used by older chat completions models (see the sketch after this note).
>
> There is a known issue with the [quota/usages API](/rest/api/aiservices/accountmanagement/usages/list?view=rest-aiservices-accountmanagement-2024-06-01-preview&tabs=HTTP&preserve-view=true) where it assumes the old ratio applies to the new o1-series models. The API returns the correct base capacity number, but does not apply the correct ratio for the accurate calculation of TPM.
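
Because the difference is ultimately a unit-conversion problem, here is a minimal Python sketch of the arithmetic, based only on the ratios listed in the note above. The dictionary keys, the function name, and the placeholder capacity value are illustrative and not part of any Azure SDK or API.

```python
# Capacity-unit ratios as listed in the note above.
TPM_PER_UNIT = {
    "older-chat": 1_000,   # 1 unit = 6 RPM and 1,000 TPM
    "o1": 6_000,           # 1 unit = 1 RPM and 6,000 TPM
    "o1-preview": 6_000,
    "o1-mini": 10_000,     # 1 unit = 1 RPM per 10,000 TPM
}
RPM_PER_UNIT = {"older-chat": 6, "o1": 1, "o1-preview": 1, "o1-mini": 1}


def limits_for_capacity(model: str, units: int) -> tuple[int, int]:
    """Return the (TPM, RPM) that `units` capacity units grant for `model`."""
    return units * TPM_PER_UNIT[model], units * RPM_PER_UNIT[model]


# The same 10 units mean very different limits depending on the model family:
print(limits_for_capacity("older-chat", 10))  # (10000, 60)
print(limits_for_capacity("o1-preview", 10))  # (60000, 10)

# Workaround for the quota/usages API issue described above: take the base
# capacity number the API returns (placeholder value here) and recompute TPM
# with the o1 ratio rather than the old 1:1000 ratio.
base_capacity_from_usages_api = 10
corrected_tpm = base_capacity_from_usages_api * TPM_PER_UNIT["o1"]  # 60,000 TPM
```
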
### o1 & o1-mini global standard

| Model | Tier | Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
| `o1` & `o1-preview` | Enterprise agreement | 30 M | 5 K |
| `o1-mini` | Enterprise agreement | 50 M | 5 K |
| `o1` & `o1-preview` | Default | 3 M | 500 |
| `o1-mini` | Default | 5 M | 500 |

### o1-preview & o1-mini standard

To minimize issues related to rate limits, it's a good idea to use the following techniques:

- Test different load increase patterns (see the sketch after this list).
- Increase the quota assigned to your deployment. Move quota from another deployment, if necessary.
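
As an illustration of the load-pattern advice above, here is a minimal Python sketch of a gradual ramp-up loop. The `send_request` stub, the starting rate, the step size, and the back-off delay are placeholders to adapt to your own client and workload, not values recommended by the service.

```python
import time


def send_request() -> bool:
    """Placeholder for one call to your Azure OpenAI deployment.

    Wire this up to your actual client; return False when the service
    responds with 429 (rate limited), True otherwise.
    """
    raise NotImplementedError


def ramp_load(start_rpm: int = 10, step_rpm: int = 10, hold_seconds: int = 60) -> int:
    """Step the request rate up gradually instead of jumping straight to peak load."""
    rpm = start_rpm
    while True:
        throttled = 0
        window_end = time.monotonic() + hold_seconds
        while time.monotonic() < window_end:
            if not send_request():
                throttled += 1
                time.sleep(5)          # brief back-off after a 429
            time.sleep(60 / rpm)       # pace calls to the current RPM target
        if throttled:
            return rpm                 # stop ramping at the first throttled window
        rpm += step_rpm                # no throttling: raise the load and repeat
```
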
## How to request increases to the default quotas and limits
Quota increase requests can be submitted via the [quota increase request form](https://aka.ms/oai/stuquotarequest). Due to high demand, quota increase requests are being accepted and will be filled in the order they're received. Priority is given to customers who generate traffic that consumes the existing quota allocation, and your request might be denied if this condition isn't met.
For other rate limits, [submit a service request](../cognitive-services-support-options.md?context=/azure/ai-services/openai/context/context).