Commit 62dcf48

reorg quotas doc
1 parent cc27d5e commit 62dcf48

1 file changed (+29 -20)

articles/ai-services/openai/quotas-limits.md

Lines changed: 29 additions & 20 deletions
@@ -61,8 +61,16 @@ The following sections provide you with a quick guide to the default quotas and
[!INCLUDE [Quota](./includes/global-batch-limits.md)]


+## GPT-4 rate limits

-## GPT 4.1 series rate limits
+### GPT-4.5 preview global standard
+
+| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
+|---|---|:---:|:---:|
+| `gpt-4.5` | Enterprise Tier | 200 K | 200 |
+| `gpt-4.5` | Default | 150 K | 150 |
+
+### GPT-4.1 series

| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
@@ -73,27 +81,30 @@ The following sections provide you with a quick guide to the default quotas and
| `gpt-4.1-mini` (2025-04-14) | Enterprise Tier | 5 M | 5 K |
| `gpt-4.1-mini` (2025-04-14) | Default | 1 M | 1 K |

+### GPT-4 Turbo
+
+`gpt-4` (`turbo-2024-04-09`) has rate limit tiers with higher limits for certain customer types.
+
+| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
+|---|---|:---:|:---:|
+|`gpt-4` (turbo-2024-04-09) | Enterprise agreement | 2 M | 12 K |
+|`gpt-4` (turbo-2024-04-09) | Default | 450 K | 2.7 K |
+
## model router rate limits

| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
| `model-router` (2025-04-15) | Default | 128 K | TBD |

-## computer-use-preview global standard
+## computer-use-preview global standard rate limits

| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
| `computer-use-preview`| Enterprise Tier | 30 M | 300 K |
| `computer-use-preview`| Default | 450 K | 4.5 K |

-## GPT-4.5 Preview global standard
-
-| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
-|---|---|:---:|:---:|
-| `gpt-4.5` | Enterprise Tier | 200 K | 200 |
-| `gpt-4.5` | Default | 150 K | 150 |

-## `o-series` rate limits
+## o-series rate limits

> [!IMPORTANT]
> The ratio of RPM/TPM for quota with o1-series models works differently than older chat completions models:
@@ -109,7 +120,7 @@ The following sections provide you with a quick guide to the default quotas and
>
> There's a known issue with the [quota/usages API](/rest/api/aiservices/accountmanagement/usages/list?view=rest-aiservices-accountmanagement-2024-06-01-preview&tabs=HTTP&preserve-view=true) where it assumes the old ratio applies to the new o1-series models. The API returns the correct base capacity number, but doesn't apply the correct ratio for the accurate calculation of TPM.

-### `o-series` global standard
+### o-series global standard

| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
@@ -124,7 +135,7 @@ The following sections provide you with a quick guide to the default quotas and
| `o1` & `o1-preview` | Default | 3 M | 500 |
| `o1-mini`| Default | 5 M | 500 |

-### `o-series` data zone standard
+### o-series data zone standard

| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
@@ -142,20 +153,18 @@ The following sections provide you with a quick guide to the default quotas and
| `o1-preview` | Default | 300 K | 50 |
| `o1-mini`| Default | 500 K | 50 |

-## gpt-4o & GPT-4 Turbo rate limits
+## gpt-4o rate limits

-`gpt-4o` and `gpt-4o-mini`, and `gpt-4` (`turbo-2024-04-09`) have rate limit tiers with higher limits for certain customer types.
+`gpt-4o` and `gpt-4o-mini` have rate limit tiers with higher limits for certain customer types.

-### gpt-4o & GPT-4 Turbo global standard
+### gpt-4o global standard

| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
|`gpt-4o`|Enterprise agreement | 30 M | 180 K |
|`gpt-4o-mini` | Enterprise agreement | 50 M | 300 K |
-|`gpt-4` (turbo-2024-04-09) | Enterprise agreement | 2 M | 12 K |
|`gpt-4o` |Default | 450 K | 2.7 K |
|`gpt-4o-mini` | Default | 2 M | 12 K |
-|`gpt-4` (turbo-2024-04-09) | Default | 450 K | 2.7 K |

M = million | K = thousand

@@ -182,7 +191,7 @@ M = million | K = thousand

M = million | K = thousand

-## gpt-4o audio
+### gpt-4o audio

The rate limits for each `gpt-4o` audio model deployment are 100K TPM and 1K RPM. During the preview, [Azure AI Foundry portal](https://ai.azure.com/) and APIs might inaccurately show different rate limits. Even if you try to set a different rate limit, the actual rate limit will be 100K TPM and 1K RPM.

@@ -195,7 +204,7 @@ The rate limits for each `gpt-4o` audio model deployment are 100K TPM and 1K RPM

M = million | K = thousand

-#### Usage tiers
+## Usage tiers

Global standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with best availability for the customer’s inference requests. Similarly, Data zone standard deployments allow you to leverage Azure global infrastructure to dynamically route traffic to the data center within the Microsoft defined data zone with the best availability for each request. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see greater variability in response latency.

@@ -204,14 +213,14 @@ The Usage Limit determines the level of usage above which customers might see la
> [!NOTE]
> Usage tiers only apply to standard, data zone standard, and global standard deployment types. Usage tiers don't apply to global batch and provisioned throughput deployments.

-#### GPT-4o global standard, data zone standard, & standard
+### GPT-4o global standard, data zone standard, & standard

|Model| Usage Tiers per month |
|----|----|
|`gpt-4o` | 12 Billion tokens |
|`gpt-4o-mini` | 85 Billion tokens |

-#### GPT-4 standard
+### GPT-4 standard

|Model| Usage Tiers per month|
|---|---|
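
The o-series note in this diff mentions a known issue with the quota/usages API. For readers who want to inspect what that endpoint actually reports for their resource, here is a minimal sketch (not part of the doc change) of calling the account-level usages operation from the linked REST reference. The URL path, `api-version`, and response field names are assumptions taken from that reference page, and the placeholder subscription, resource group, and account names are hypothetical.

```python
# Sketch: list quota usages for an Azure OpenAI (Cognitive Services) resource.
# Assumes `azure-identity` and `requests` are installed and you are signed in
# to an identity that can read the resource.
import requests
from azure.identity import DefaultAzureCredential

subscription_id = "<subscription-id>"   # placeholder: your Azure subscription
resource_group = "<resource-group>"     # placeholder: group that holds the resource
account_name = "<aoai-account-name>"    # placeholder: the Azure OpenAI resource name

# Acquire an ARM token for the management plane.
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

# Account-level usages endpoint (path per the linked reference; verify for your case).
url = (
    "https://management.azure.com/subscriptions/"
    f"{subscription_id}/resourceGroups/{resource_group}"
    f"/providers/Microsoft.CognitiveServices/accounts/{account_name}/usages"
)

resp = requests.get(
    url,
    params={"api-version": "2024-06-01-preview"},
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

# Print each reported usage entry with its current value and limit.
# Field names follow the reference's Usage schema; verify against a live response.
for usage in resp.json().get("value", []):
    name = usage.get("name", {}).get("value")
    print(name, usage.get("currentValue"), "/", usage.get("limit"))
```

Per the note above, treat the returned base capacity for o1-series models with care: the API may apply the older RPM/TPM ratio, so the TPM it implies can differ from the actual limit.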
