| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
|`gpt-4.5`| Enterprise Tier | 200 K | 200 |
|`gpt-4.5`| Default | 150 K | 150 |

### GPT-4.1 series

| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
|`gpt-4.1-mini` (2025-04-14) | Enterprise Tier | 5 M | 5 K |
|`gpt-4.1-mini` (2025-04-14) | Default | 1 M | 1 K |

### GPT-4 Turbo

`gpt-4` (`turbo-2024-04-09`) has rate limit tiers with higher limits for certain customer types.

| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
|`gpt-4` (turbo-2024-04-09) | Enterprise agreement | 2 M | 12 K |
|`gpt-4` (turbo-2024-04-09) | Default | 450 K | 2.7 K |

## model router rate limits

| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
|`model-router` (2025-04-15) | Default | 128 K | TBD |

## computer-use-preview global standard rate limits

| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
|`computer-use-preview`| Enterprise Tier | 30 M | 300 K |
|`computer-use-preview`| Default | 450 K | 4.5 K |

## o-series rate limits

> [!IMPORTANT]
> The ratio of RPM/TPM for quota with o1-series models works differently than it does for older chat completions models:
>
> There's a known issue with the [quota/usages API](/rest/api/aiservices/accountmanagement/usages/list?view=rest-aiservices-accountmanagement-2024-06-01-preview&tabs=HTTP&preserve-view=true) where it assumes the old ratio applies to the new o1-series models. The API returns the correct base capacity number, but doesn't apply the correct ratio for the accurate calculation of TPM.
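
Until that issue is resolved, one workaround is to read the base capacity straight from the usages API and apply the correct ratio yourself. The following is a minimal sketch, assuming an Azure Resource Manager bearer token and the Cognitive Services usages endpoint linked above; the subscription ID, location, and token values are placeholders:

```python
import requests

# Placeholders / assumptions: a valid Azure Resource Manager bearer token and the
# Cognitive Services usages (list) endpoint referenced in the note above.
SUBSCRIPTION_ID = "<subscription-id>"
LOCATION = "eastus2"
TOKEN = "<azure-ad-bearer-token>"

url = (
    f"https://management.azure.com/subscriptions/{SUBSCRIPTION_ID}"
    f"/providers/Microsoft.CognitiveServices/locations/{LOCATION}/usages"
)
resp = requests.get(
    url,
    params={"api-version": "2024-06-01-preview"},
    headers={"Authorization": f"Bearer {TOKEN}"},
)
resp.raise_for_status()

# The API returns the correct base capacity; only the derived RPM/TPM ratio for
# o1-series models may be misreported, so treat TPM figures for those models with care.
for usage in resp.json().get("value", []):
    name = usage.get("name", {}).get("value", "")
    print(f"{name}: {usage.get('currentValue')} / {usage.get('limit')}")
```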

### o-series global standard

| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
|`o1` & `o1-preview`| Default | 3 M | 500 |
|`o1-mini`| Default | 5 M | 500 |

### o-series data zone standard

| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
|`o1-preview`| Default | 300 K | 50 |
|`o1-mini`| Default | 500 K | 50 |

## gpt-4o rate limits

`gpt-4o` and `gpt-4o-mini` have rate limit tiers with higher limits for certain customer types.

### gpt-4o global standard

| Model|Tier| Quota Limit in tokens per minute (TPM) | Requests per minute |
|---|---|:---:|:---:|
|`gpt-4o`|Enterprise agreement | 30 M | 180 K |
|`gpt-4o-mini`| Enterprise agreement | 50 M | 300 K |
|`gpt-4o`|Default | 450 K | 2.7 K |
|`gpt-4o-mini`| Default | 2 M | 12 K |

M = million | K = thousand
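
To check how the quota in this table is currently split across a resource's deployments, one option is to list each deployment's SKU capacity with the management SDK. A minimal sketch, assuming the `azure-identity` and `azure-mgmt-cognitiveservices` packages and that `sku.capacity` is expressed in units of 1,000 TPM; the subscription, resource group, and resource names are placeholders:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient

# Placeholders: substitute your own subscription, resource group, and Azure OpenAI resource.
client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

for deployment in client.deployments.list("<resource-group>", "<azure-openai-resource>"):
    sku = deployment.sku
    # Assumption: sku.capacity is expressed in units of 1,000 tokens per minute (TPM).
    print(deployment.name, sku.name if sku else None, sku.capacity if sku else None)
```
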
### gpt-4o audio

The rate limits for each `gpt-4o` audio model deployment are 100K TPM and 1K RPM. During the preview, [Azure AI Foundry portal](https://ai.azure.com/) and APIs might inaccurately show different rate limits. Even if you try to set a different rate limit, the actual rate limit will be 100K TPM and 1K RPM.
M = million | K = thousand
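
Because these per-deployment limits are enforced regardless of what the portal displays, client code should expect HTTP 429 responses and back off rather than fail outright. A minimal sketch, assuming the `openai` Python package (v1+); the endpoint, API key, API version, and deployment name are placeholders:

```python
import time

from openai import AzureOpenAI, RateLimitError

# Placeholders: point these at your own Azure OpenAI resource and deployment.
client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<api-key>",
    api_version="2024-10-21",
)

def chat_with_backoff(messages, max_retries=5):
    """Call the deployment, retrying with exponential backoff when rate limited."""
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="<deployment-name>",  # the deployment name, not the base model name
                messages=messages,
            )
        except RateLimitError:
            # TPM or RPM limit hit; wait before retrying.
            time.sleep(2 ** attempt)
    raise RuntimeError("Still rate limited after retries")

response = chat_with_backoff([{"role": "user", "content": "Hello"}])
print(response.choices[0].message.content)
```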

## Usage tiers

Global standard deployments use Azure's global infrastructure, dynamically routing customer traffic to the data center with the best availability for the customer's inference requests. Similarly, data zone standard deployments let you use Azure's global infrastructure to dynamically route traffic to the data center within the Microsoft-defined data zone with the best availability for each request. This enables more consistent latency for customers with low to medium levels of traffic. Customers with high sustained levels of usage might see greater variability in response latency.

> [!NOTE]
> Usage tiers only apply to standard, data zone standard, and global standard deployment types. Usage tiers don't apply to global batch and provisioned throughput deployments.

### GPT-4o global standard, data zone standard, & standard
0 commit comments