Commit 3ffda5f

Merge pull request #789 from mrbullwinkle/mrb_10_11_2024_quota
[Azure OpenAI] o1-series quota clarification
2 parents: d17bca8 + 71eab64

File tree

1 file changed: 12 additions, 1 deletion

articles/ai-services/openai/quotas-limits.md

Lines changed: 12 additions & 1 deletion
@@ -10,7 +10,7 @@ ms.custom:
 - ignite-2023
 - references_regions
 ms.topic: conceptual
-ms.date: 10/10/2024
+ms.date: 10/11/2024
 ms.author: mbullwin
 ---

@@ -62,6 +62,17 @@ The following sections provide you with a quick guide to the default quotas and
 
 ## o1-preview & o1-mini rate limits
 
+> [!IMPORTANT]
+> The ratio of RPM/TPM for quota with the o1-series models works differently than it does for older chat completions models:
+>
+> - **Older chat models:** 1 unit of capacity = 6 RPM and 1,000 TPM.
+> - **o1-preview:** 1 unit of capacity = 1 RPM and 6,000 TPM.
+> - **o1-mini:** 1 unit of capacity = 1 RPM and 10,000 TPM.
+>
+> This is particularly important for programmatic model deployment, because the change in RPM/TPM ratio can result in accidental under-allocation of quota if you still assume the 1:1,000 ratio used by older chat completions models.
+>
+> There is a known issue with the [quota/usages API](/rest/api/aiservices/accountmanagement/usages/list?view=rest-aiservices-accountmanagement-2024-06-01-preview&tabs=HTTP&preserve-view=true) where it assumes the old ratio applies to the new o1-series models. The API returns the correct base capacity number, but does not apply the correct ratio for the accurate calculation of TPM.
+
 ### o1-preview & o1-mini global standard
 
 | Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
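
To make the capacity arithmetic in the new note concrete, here is a minimal Python sketch. The ratio table and helper function are illustrative assumptions (not part of any Azure SDK or REST API); it simply converts granted capacity units into the RPM/TPM limits described above and shows how a script that still multiplies capacity by 1,000 understates the TPM available to o1-series deployments.

```python
# Minimal sketch of the capacity-unit arithmetic described in the note above.
# The ratio table and function below are illustrative assumptions, not Azure APIs.

CAPACITY_RATIOS = {                # 1 unit of capacity -> (RPM, TPM)
    "older-chat-models": (6, 1_000),
    "o1-preview": (1, 6_000),
    "o1-mini": (1, 10_000),
}

def limits_from_capacity(model_family: str, capacity_units: int) -> tuple[int, int]:
    """Return the (RPM, TPM) limits implied by a number of capacity units."""
    rpm_per_unit, tpm_per_unit = CAPACITY_RATIOS[model_family]
    return capacity_units * rpm_per_unit, capacity_units * tpm_per_unit

# A script that still assumes 1 unit = 1,000 TPM understates the TPM that
# o1-series capacity actually grants, which is how quota ends up under-allocated.
for family in ("older-chat-models", "o1-preview", "o1-mini"):
    rpm, tpm = limits_from_capacity(family, 10)
    print(f"{family}: 10 units -> {rpm} RPM, {tpm:,} TPM "
          f"(old 1:1,000 assumption would report {10 * 1_000:,} TPM)")
```

While the known issue with the quota/usages API is in effect, the same correction applies when reading values back from that API: treat the returned number as base capacity and apply the o1-series ratio yourself rather than the old 1:1,000 multiplier.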
