articles/ai-foundry/openai/quotas-limits.md
Lines changed: 40 additions & 30 deletions
@@ -4,7 +4,7 @@ description: Quick reference, detailed description, and best practices on the qu
 author: mrbullwinkle
 ms.author: mbullwin
 manager: nitinme
-ms.date: 07/02/2025
+ms.date: 07/11/2025
 ms.service: azure-ai-openai
 ms.topic: conceptual
 ms.custom:
@@ -17,6 +17,16 @@ ms.custom:
 
 This article contains a quick reference and a detailed description of the quotas and limits for Azure OpenAI.
 
+**Scope of quota**:
+
+- Quotas and limits are not enforced at the tenant level.
+- Instead, the highest level of quota restrictions are scoped at the Azure subscription level.
+
+**Regional quota allocation:**
+
+- Tokens per minute (TPM) and requests per minute (RPM) limits are defined **per region, per subscription, and per model/deployment type**.
+- For example, if the `gpt-4.1` global standard model is listed with a quota of **5 million TPM and 5,000 RPM**, then **each region** where that [model/deployment type is available](./concepts/models.md) has its own dedicated pool of quota of that amount for **each of your Azure subscriptions**. So within a single Azure subscription, it is possible to use a larger quantity of total TPM/RPM quota for a given model/deployment type, as long as you have resources/model deployments spread across multiple regions.
+
 ## Quotas and limits reference
 
 The following sections provide you with a quick guide to the default quotas and limits that apply to Azure OpenAI:
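The regional allocation rule added in this hunk reduces to simple arithmetic: each region where a model/deployment type is available contributes its own pool, so the ceiling reachable within one subscription for that model/deployment type is the per-region quota times the number of distinct regions you deploy it in. The Python sketch below illustrates this under stated assumptions: the `Quota` type and `subscription_capacity` helper are hypothetical, the region names are placeholders, and it assumes the model/deployment type is actually available in every listed region. It uses the 5 M TPM / 5 K RPM `gpt-4.1` global standard figure from the example text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Quota:
    tpm: int  # tokens per minute
    rpm: int  # requests per minute


def subscription_capacity(per_region: Quota, regions: list[str]) -> Quota:
    """Hypothetical helper: sum the independent per-region pools for ONE
    model/deployment type within ONE subscription.

    Assumes the model/deployment type is available in every listed region.
    Duplicate region names are counted once, because deployments in the
    same region draw on that region's single pool.
    """
    n = len(set(regions))
    return Quota(tpm=per_region.tpm * n, rpm=per_region.rpm * n)


# Per-region figure from the example text (gpt-4.1 global standard).
gpt41_global = Quota(tpm=5_000_000, rpm=5_000)

# Placeholder region names; "eastus" appears twice but adds no extra pool.
total = subscription_capacity(gpt41_global, ["eastus", "swedencentral", "eastus"])
print(total)  # Quota(tpm=10000000, rpm=10000)
```

Deploying twice in the same region does not add capacity in this sketch, since both deployments share that region's pool; only spreading deployments across regions raises the total.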
@@ -70,29 +80,29 @@ The following sections provide you with a quick guide to the default quotas and
 
 | Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
 |---|---|:---:|:---:|
-|`gpt-4.5`| Enterprise Tier| 200 K | 200 |
+|`gpt-4.5`| Enterprise & MCA-E| 200 K | 200 |
 |`gpt-4.5`| Default | 150 K | 150 |
 
 ### GPT-4.1 series global standard
 
 | Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
 |---|---|:---:|:---:|
-|`gpt-4.1` (2025-04-14) | Enterprise Tier| 5 M | 5 K |
+|`gpt-4.1` (2025-04-14) | Enterprise & MCA-E| 5 M | 5 K |
 |`gpt-4.1` (2025-04-14) | Default | 1 M | 1 K |
-|`gpt-4.1-nano` (2025-04-14) | Enterprise Tier| 150 M | 150 K |
+|`gpt-4.1-nano` (2025-04-14) | Enterprise & MCA-E| 150 M | 150 K |
 |`gpt-4.1-nano` (2025-04-14) | Default | 5 M | 5 K |
-|`gpt-4.1-mini` (2025-04-14) | Enterprise Tier| 150 M | 150 K |
+|`gpt-4.1-mini` (2025-04-14) | Enterprise & MCA-E| 150 M | 150 K |
 |`gpt-4.1-mini` (2025-04-14) | Default | 5 M | 5 K |
 
 ### GPT-4.1 series data zone standard
 
 | Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
 |---|---|:---:|:---:|
-|`gpt-4.1` (2025-04-14) | Enterprise Tier| 2 M | 2 K |
+|`gpt-4.1` (2025-04-14) | Enterprise & MCA-E| 2 M | 2 K |
 |`gpt-4.1` (2025-04-14) | Default | 300 K | 300 |
-|`gpt-4.1-nano` (2025-04-14) | Enterprise Tier| 50 M | 50 K |
+|`gpt-4.1-nano` (2025-04-14) | Enterprise & MCA-E| 50 M | 50 K |
 |`gpt-4.1-nano` (2025-04-14) | Default | 2 M | 2 K |
-|`gpt-4.1-mini` (2025-04-14) | Enterprise Tier| 50 M | 50 K |
+|`gpt-4.1-mini` (2025-04-14) | Enterprise & MCA-E| 50 M | 50 K |
 |`gpt-4.1-mini` (2025-04-14) | Default | 2 M | 2 K |
 
 ### GPT-4 Turbo
@@ -101,21 +111,21 @@ The following sections provide you with a quick guide to the default quotas and
 
 | Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
 |---|---|:---:|:---:|
-|`gpt-4` (turbo-2024-04-09) | Enterprise agreement| 2 M | 12 K |
+|`gpt-4` (turbo-2024-04-09) | Enterprise & MCA-E| 2 M | 12 K |
 |`gpt-4` (turbo-2024-04-09) | Default | 450 K | 2.7 K |
 
 ## model-router rate limits
 
 | Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
 |---|---|:---:|:---:|
-|`model-router` (2025-05-19) | Enterprise Tier| 10 M | 10 K |
+|`model-router` (2025-05-19) | Enterprise & MCA-E| 10 M | 10 K |
 |`model-router` (2025-05-19) | Default | 1 M | 1 K |
 
 ## computer-use-preview global standard rate limits
 
 | Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
 |---|---|:---:|:---:|
-|`computer-use-preview`| Enterprise Tier| 30 M | 300 K |
+|`computer-use-preview`| Enterprise & MCA-E| 30 M | 300 K |
 |`computer-use-preview`| Default | 450 K | 4.5 K |
 
 ## o-series rate limits
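The table rows changed in these hunks rename the higher tier to `Enterprise & MCA-E` without changing any numbers. As a hedged sketch, the updated values for a few of these tables can be held in a lookup keyed by model, table heading, and tier label. `Limit`, `LIMITS`, `limit_for`, and `rpm_per_1k_tpm` are hypothetical names, and the sketch does not decide which subscriptions qualify for the `Enterprise & MCA-E` tier, so the caller passes the tier label explicitly. The ratio helper shows that the RPM allowance per 1,000 TPM differs between models (1 for `gpt-4.1`, 10 for `computer-use-preview`), so one limit cannot be derived from the other with a single constant.

```python
from typing import NamedTuple


class Limit(NamedTuple):
    tpm: int  # tokens per minute
    rpm: int  # requests per minute


# Values transcribed from the updated tables in this diff (per region, per subscription).
# Key: (model, table heading) -> {tier label from the "Tier" column: Limit}
LIMITS: dict[tuple[str, str], dict[str, Limit]] = {
    ("gpt-4.1", "GPT-4.1 series global standard"): {
        "Enterprise & MCA-E": Limit(5_000_000, 5_000),
        "Default": Limit(1_000_000, 1_000),
    },
    ("gpt-4.1", "GPT-4.1 series data zone standard"): {
        "Enterprise & MCA-E": Limit(2_000_000, 2_000),
        "Default": Limit(300_000, 300),
    },
    ("computer-use-preview", "computer-use-preview global standard rate limits"): {
        "Enterprise & MCA-E": Limit(30_000_000, 300_000),
        "Default": Limit(450_000, 4_500),
    },
}


def limit_for(model: str, heading: str, tier: str) -> Limit:
    """Return the transcribed limit; raises KeyError for combinations not listed above."""
    return LIMITS[(model, heading)][tier]


def rpm_per_1k_tpm(limit: Limit) -> float:
    """Requests per minute granted for every 1,000 tokens per minute of quota."""
    return limit.rpm * 1000 / limit.tpm


cup = limit_for("computer-use-preview", "computer-use-preview global standard rate limits", "Default")
print(cup)  # Limit(tpm=450000, rpm=4500)
print(rpm_per_1k_tpm(cup))  # 10.0
```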
@@ -139,13 +149,13 @@ The following sections provide you with a quick guide to the default quotas and
 
 | Model |Tier | Quota Limit in tokens per minute (TPM) | Requests per minute |