Commit bd82b01

Update quotas-limits.md
1 parent 8243c8d commit bd82b01

1 file changed

+17
-17
lines changed

articles/ai-foundry/openai/quotas-limits.md

Lines changed: 17 additions & 17 deletions
@@ -25,7 +25,7 @@ Quotas and limits aren't enforced at the tenant level. Instead, the highest leve
 
 Tokens per minute (TPM) and requests per minute (RPM) limits are defined *per region*, *per subscription*, and *per model or deployment type*.
 
-For example, if the `gpt-4.1` global standard model is listed with a quota of *5 million TPM* and *5,000 RPM*, then *each region* where that [model or deployment type is available](./concepts/models.md) has its own dedicated quota pool of that amount for *each* of your Azure subscriptions. Within a single Azure subscription, it's possible to use a larger quantity of total TPM and RPM quota for a given model and deployment type, as long as you have resources and model deployments spread across multiple regions.
+For example, if the `gpt-4.1` Global Standard model is listed with a quota of *5 million TPM* and *5,000 RPM*, then *each region* where that [model or deployment type is available](./concepts/models.md) has its own dedicated quota pool of that amount for *each* of your Azure subscriptions. Within a single Azure subscription, it's possible to use a larger quantity of total TPM and RPM quota for a given model and deployment type, as long as you have resources and model deployments spread across multiple regions.
 
 ## Quotas and limits reference

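The quota arithmetic in the example above can be sketched in a few lines of Python. This is only an illustration: it assumes the per-region pool of 5 million TPM and 5,000 RPM quoted for `gpt-4.1` Global Standard, and the region names, `QuotaPool`, and `combined_quota` are hypothetical names that don't come from any Azure SDK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuotaPool:
    """Quota for one model/deployment type in one region of one subscription."""
    tpm: int  # tokens per minute
    rpm: int  # requests per minute


def combined_quota(per_region: QuotaPool, regions: list[str]) -> QuotaPool:
    """Total quota one subscription could reach by deploying in each distinct region."""
    distinct = len(set(regions))  # each region has its own pool; duplicates add nothing
    return QuotaPool(tpm=per_region.tpm * distinct, rpm=per_region.rpm * distinct)


# Figures taken from the example in the text; region names are assumptions.
GPT_41_GLOBAL_STANDARD = QuotaPool(tpm=5_000_000, rpm=5_000)
regions = ["eastus", "westeurope", "swedencentral"]

combined = combined_quota(GPT_41_GLOBAL_STANDARD, regions)
print(combined)  # QuotaPool(tpm=15000000, rpm=15000)
```

The sketch treats every listed region as one where the model is available; in practice that has to be checked against the model availability page linked in the text.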
@@ -75,16 +75,16 @@ The following sections provide you with a quick guide to the default quotas and
 
 ## GPT-4 rate limits
 
-### GPT-4.5 preview global standard
+### GPT-4.5 preview Global Standard
 
 | Model|Tier| Quota limit in tokens per minute | Requests per minute |
 |---|---|:---:|:---:|
 | `gpt-4.5` | Enterprise and MCA-E | 200K | 200 |
 | `gpt-4.5` | Default | 150K | 150 |
 
-### GPT-4.1 series global standard
+### GPT-4.1 series Global Standard
 
-| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
+| Model|Tier| Quota limit in tokens per minute (TPM) | Requests per minute |
 |---|---|:---:|:---:|
 | `gpt-4.1` (2025-04-14) | Enterprise and MCA-E | 5M | 5K |
 | `gpt-4.1` (2025-04-14) | Default | 1M | 1K |
@@ -93,9 +93,9 @@ The following sections provide you with a quick guide to the default quotas and
 | `gpt-4.1-mini` (2025-04-14) | Enterprise and MCA-E | 150M | 150K |
 | `gpt-4.1-mini` (2025-04-14) | Default | 5M | 5K |
 
-### GPT-4.1 series data zone standard
+### GPT-4.1 series Data Zone Standard
 
-| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
+| Model|Tier| Quota limit in tokens per minute (TPM) | Requests per minute |
 |---|---|:---:|:---:|
 | `gpt-4.1` (2025-04-14) | Enterprise and MCA-E | 2M | 2K |
 | `gpt-4.1` (2025-04-14) | Default | 300K | 300 |
@@ -120,7 +120,7 @@ The following sections provide you with a quick guide to the default quotas and
 | `model-router` (2025-05-19) | Enterprise and MCA-E | 10M | 10K |
 | `model-router` (2025-05-19) | Default | 1M | 1K |
 
-## computer-use-preview global standard rate limits
+## computer-use-preview Global Standard rate limits
 
 | Model|Tier| Quota limit in tokens per minute | Requests per minute |
 |---|---|:---:|:---:|
@@ -144,7 +144,7 @@ The following sections provide you with a quick guide to the default quotas and
 >
 > This concept is important for programmatic model deployment, because changes in the RPM to TPM ratio can result in accidental misallocation of quota.
 
-### o-series global standard
+### o-series Global Standard
 
 | Model |Tier | Quota limit in tokens per minute | Requests per minute |
 |--------------------|------------------------|:--------------------------------------:|:---: |
@@ -163,7 +163,7 @@ The following sections provide you with a quick guide to the default quotas and
 | `o1` and `o1-preview`| Default | 3M | 500 |
 | `o1-mini` | Default | 5M | 500 |
 
-### o-series data zone standard
+### o-series Data Zone Standard
 
 | Model|Tier| Quota limit in tokens per minute | Requests per minute |
 |---|---|:---:|:---:|
@@ -172,7 +172,7 @@ The following sections provide you with a quick guide to the default quotas and
 | `o1` | Enterprise and MCA-E | 6M | 1K |
 | `o1` | Default | 600K | 100 |
 
-### o1-preview and o1-mini standard
+### o1-preview and o1-mini Standard
 
 | Model|Tier| Quota limit in tokens per minute | Requests per minute |
 |---|---|:---:|:---:|
@@ -185,7 +185,7 @@ The following sections provide you with a quick guide to the default quotas and
 
 `gpt-4o` and `gpt-4o-mini` have rate limit tiers with higher limits for certain customer types.
 
-### gpt-4o global standard
+### gpt-4o Global Standard
 
 | Model|Tier| Quota limit in tokens per minute | Requests per minute |
 |---|---|:---:|:---:|
@@ -194,7 +194,7 @@ The following sections provide you with a quick guide to the default quotas and
 |`gpt-4o` |Default | 450K | 2.7K |
 |`gpt-4o-mini` | Default | 2M | 12K |
 
-### gpt-4o data zone standard
+### gpt-4o Data Zone Standard
 
 | Model|Tier| Quota limit in tokens per minute | Requests per minute |
 |---|---|:---:|:---:|
@@ -203,7 +203,7 @@ The following sections provide you with a quick guide to the default quotas and
 |`gpt-4o` |Default | 300K | 1.8K |
 |`gpt-4o-mini` | Default | 1M | 6K |
 
-### gpt-4o standard
+### gpt-4o Standard
 
 | Model|Tier| Quota limit in tokens per minute | Requests per minute |
 |---|---|:---:|:---:|
@@ -225,7 +225,7 @@ The rate limits for each `gpt-4o` audio model deployment are 100,000 tokens per
 
 ## GPT-image-1 rate limits
 
-### GPT0-image-1 global standard
+### GPT0-image-1 Global Standard
 
 | Model|Tier| Quota limit in tokens per minute | Requests per minute |
 |---|---|:---:|:---:|
@@ -234,14 +234,14 @@ The rate limits for each `gpt-4o` audio model deployment are 100,000 tokens per
 
 ## Usage tiers
 
-Global standard deployments use the global infrastructure of Azure. They dynamically route customer traffic to the data center with the best availability for the customer’s inference requests. Similarly, data zone standard deployments allow you to use the global infrastructure of Azure to dynamically route traffic to the data center within the Microsoft-defined data zone with the best availability for each request. This practice enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see greater variability in response latency.
+Global Standard deployments use the global infrastructure of Azure. They dynamically route customer traffic to the data center with the best availability for the customer’s inference requests. Similarly, Data Zone Standard deployments allow you to use the global infrastructure of Azure to dynamically route traffic to the data center within the Microsoft-defined data zone with the best availability for each request. This practice enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see greater variability in response latency.
 
 The usage limit determines the level of usage above which customers might see larger variability in response latency. A customer's usage is defined per model. It's the total number of tokens consumed across all deployments in all subscriptions in all regions for a given tenant.
 
 > [!NOTE]
-> Usage tiers apply only to standard, data zone standard, and global standard deployment types. Usage tiers don't apply to global batch and provisioned throughput deployments.
+> Usage tiers apply only to Standard, Data Zone Standard, and Global Standard deployment types. Usage tiers don't apply to global batch and provisioned throughput deployments.
 
-### Global standard, data zone standard, and standard
+### Global Standard, Data Zone Standard, and Standard
 
 |Model| Usage tiers per month |
 |----|:----|

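The usage definition in the Usage tiers text above (total tokens per model, summed across all deployments, subscriptions, and regions in a tenant) amounts to a simple aggregation. The following Python sketch illustrates it under assumed inputs: `DeploymentUsage`, `usage_per_model`, and the sample records are hypothetical, and no real metering API is called.

```python
from collections import defaultdict
from typing import NamedTuple


class DeploymentUsage(NamedTuple):
    """Monthly token consumption of one deployment (hypothetical record shape)."""
    subscription: str
    region: str
    model: str
    tokens: int


def usage_per_model(records: list[DeploymentUsage]) -> dict[str, int]:
    """Sum tokens per model, ignoring which subscription or region they came from."""
    totals: dict[str, int] = defaultdict(int)
    for record in records:
        totals[record.model] += record.tokens
    return dict(totals)


# Made-up sample data for a single tenant.
records = [
    DeploymentUsage("sub-a", "eastus", "gpt-4o", 200_000),
    DeploymentUsage("sub-a", "westeurope", "gpt-4o", 150_000),
    DeploymentUsage("sub-b", "eastus", "gpt-4o", 100_000),
    DeploymentUsage("sub-b", "eastus", "gpt-4o-mini", 50_000),
]

totals = usage_per_model(records)
print(totals)  # {'gpt-4o': 450000, 'gpt-4o-mini': 50000}
```

Each per-model total would then be compared against that model's monthly usage tier. Unlike TPM and RPM quota, which is pooled per region and per subscription, this figure is tenant-wide.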