articles/ai-foundry/openai/quotas-limits.md
Quotas and limits aren't enforced at the tenant level.

Tokens per minute (TPM) and requests per minute (RPM) limits are defined *per region*, *per subscription*, and *per model or deployment type*.
For example, if the `gpt-4.1` Global Standard model is listed with a quota of *5 million TPM* and *5,000 RPM*, then *each region* where that [model or deployment type is available](./concepts/models.md) has its own dedicated quota pool of that amount for *each* of your Azure subscriptions. Within a single Azure subscription, it's possible to use a larger quantity of total TPM and RPM quota for a given model and deployment type, as long as you have resources and model deployments spread across multiple regions.
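As a minimal sketch of the multiplication effect described above (the quota figures match the `gpt-4.1` example, but the region names are illustrative assumptions, not a recommendation):

```python
# Illustrative sketch only: each region holds an independent quota pool
# per subscription, so deploying the same model and deployment type in
# N regions gives N independent pools within one subscription.
TPM_PER_REGION = 5_000_000  # example: gpt-4.1 Global Standard TPM quota
RPM_PER_REGION = 5_000      # example: matching RPM quota

deployment_regions = ["eastus2", "swedencentral", "westus3"]  # hypothetical

total_tpm = TPM_PER_REGION * len(deployment_regions)
total_rpm = RPM_PER_REGION * len(deployment_regions)
print(f"Total TPM across regions: {total_tpm:,}")  # 15,000,000
print(f"Total RPM across regions: {total_rpm:,}")  # 15,000
```

Each regional pool is still enforced independently; this only shows the aggregate ceiling available to one subscription, not a single shared limit.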
## Quotas and limits reference
## GPT-4 rate limits
### GPT-4.5 preview Global Standard
| Model|Tier| Quota limit in tokens per minute | Requests per minute |
|---|---|:---:|:---:|
|`gpt-4.5`| Enterprise and MCA-E | 200K | 200 |
|`gpt-4.5`| Default | 150K | 150 |
### GPT-4.1 series Global Standard
| Model|Tier| Quota limit in tokens per minute (TPM) | Requests per minute |
## computer-use-preview Global Standard rate limits
| Model|Tier| Quota limit in tokens per minute | Requests per minute |
|---|---|:---:|:---:|
>
> This concept is important for programmatic model deployment, because changes in the RPM to TPM ratio can result in accidental misallocation of quota.
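To make the note above concrete, here's a minimal sketch of how an RPM-to-TPM ratio translates a TPM allocation into the RPM a deployment script effectively requests. The 6-RPM-per-1,000-TPM default ratio is an assumption for illustration; the actual ratio varies by model and tier, so check the tables in this article:

```python
def implied_rpm(tpm: int, rpm_per_1000_tpm: int = 6) -> int:
    """Estimate the RPM granted for a TPM allocation at an assumed ratio.

    rpm_per_1000_tpm=6 is an assumed default for illustration only.
    """
    return (tpm // 1_000) * rpm_per_1000_tpm

# A script that allocates quota in TPM gets a different RPM if the
# ratio changes, which is how misallocation can happen silently:
print(implied_rpm(450_000))                      # 2700 at 6 RPM per 1K TPM
print(implied_rpm(450_000, rpm_per_1000_tpm=1))  # 450 if the ratio were 1
```

If your deployment automation hard-codes one of these numbers, recompute the other from the model's current ratio rather than assuming it is stable across models.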
### o-series Global Standard
| Model |Tier | Quota limit in tokens per minute | Requests per minute |
|`o1` and `o1-preview`| Default | 3M | 500 |
|`o1-mini`| Default | 5M | 500 |
### o-series Data Zone Standard
| Model|Tier| Quota limit in tokens per minute | Requests per minute |
|---|---|:---:|:---:|
|`o1`| Enterprise and MCA-E | 6M | 1K |
|`o1`| Default | 600K | 100 |
### o1-preview and o1-mini Standard
| Model|Tier| Quota limit in tokens per minute | Requests per minute |
|---|---|:---:|:---:|
`gpt-4o` and `gpt-4o-mini` have rate limit tiers with higher limits for certain customer types.
### gpt-4o Global Standard
| Model|Tier| Quota limit in tokens per minute | Requests per minute |
|---|---|:---:|:---:|
|`gpt-4o`|Default | 450K | 2.7K |
|`gpt-4o-mini`| Default | 2M | 12K |
### gpt-4o Data Zone Standard
| Model|Tier| Quota limit in tokens per minute | Requests per minute |
|---|---|:---:|:---:|
|`gpt-4o`|Default | 300K | 1.8K |
|`gpt-4o-mini`| Default | 1M | 6K |
### gpt-4o Standard
| Model|Tier| Quota limit in tokens per minute | Requests per minute |
|---|---|:---:|:---:|
The rate limits for each `gpt-4o` audio model deployment are 100,000 tokens per minute.

## GPT-image-1 rate limits
### GPT-image-1 Global Standard
| Model|Tier| Quota limit in tokens per minute | Requests per minute |
|---|---|:---:|:---:|
## Usage tiers
Global Standard deployments use the global infrastructure of Azure. They dynamically route customer traffic to the data center with the best availability for the customer’s inference requests. Similarly, Data Zone Standard deployments allow you to use the global infrastructure of Azure to dynamically route traffic to the data center within the Microsoft-defined data zone with the best availability for each request. This practice enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see greater variability in response latency.
The usage limit determines the level of usage above which customers might see larger variability in response latency. A customer's usage is defined per model. It's the total number of tokens consumed across all deployments in all subscriptions in all regions for a given tenant.
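A brief sketch of how the per-model tally described above accumulates (the deployment names, subscriptions, regions, and token counts are all invented for illustration):

```python
# Hypothetical token consumption for ONE model across a tenant.
# Usage toward a tier is summed across all deployments, subscriptions,
# and regions, as described above; every value here is made up.
tokens_by_deployment = {
    ("subscription-a", "eastus2"): 1_200_000,
    ("subscription-a", "westus3"): 800_000,
    ("subscription-b", "swedencentral"): 2_500_000,
}

tenant_usage = sum(tokens_by_deployment.values())
print(f"Tenant-wide usage for this model: {tenant_usage:,} tokens")
```

The key point the sketch illustrates: splitting traffic across subscriptions or regions doesn't reset the tally, because the usage limit is evaluated per model at the tenant level.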
> [!NOTE]
> Usage tiers apply only to Standard, Data Zone Standard, and Global Standard deployment types. Usage tiers don't apply to global batch and provisioned throughput deployments.
### Global Standard, Data Zone Standard, and Standard