Commit f8edc86

restructure table
1 parent e174663 commit f8edc86

articles/ai-foundry/foundry-models/quotas-limits.md

Lines changed: 20 additions & 16 deletions
@@ -30,22 +30,26 @@ Azure uses quotas and limits to prevent budget overruns due to fraud, and to hon
 
 ### Rate limits
 

-| Limit name | Applies to | Limit value |
-| -------------------- | ------------------- | ----------- |
-| Tokens per minute | Azure OpenAI models | Varies per model and SKU. See [limits for Azure OpenAI](../openai/quotas-limits.md). |
-| Requests per minute | Azure OpenAI models | Varies per model and SKU. See [limits for Azure OpenAI](../openai/quotas-limits.md). |
-| Tokens per minute | DeepSeek-R1<br />DeepSeek-V3-0324 | 5,000,000 |
-| Requests per minute | DeepSeek-R1<br />DeepSeek-V3-0324 | 5,000 |
-| Concurrent requests | DeepSeek-R1<br />DeepSeek-V3-0324 | 300 |
-| Tokens per minute | Llama 3.3 70B Instruct<br />Llama-4-Maverick-17B-128E-Instruct-FP8<br />Grok 3<br />Grok 3 mini | 400,000 |
-| Requests per minute | Llama 3.3 70B Instruct<br />Llama-4-Maverick-17B-128E-Instruct-FP8<br />Grok 3<br />Grok 3 mini | 1,000 |
-| Concurrent requests | Llama 3.3 70B Instruct<br />Llama-4-Maverick-17B-128E-Instruct-FP8<br />Grok 3<br />Grok 3 mini | 300 |
-| Requests per minute |Flux-Pro 1.1<br />Flux.1-Kontext Pro<br /> | 2 capacity units (6 requests per minute) |
-| Tokens per minute | Rest of models | 400,000 |
-| Requests per minute | Rest of models | 1,000 |
-| Concurrent requests | Rest of models | 300 |
-
-For Azure OpenAI quota increase request, use [request a quota increase](https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR4xPXO648sJKt4GoXAed-0pUMFE1Rk9CU084RjA0TUlVSUlMWEQzVkJDNCQlQCN0PWcu) to submit your request. For other models, You can [request increases to the default limits](#request-increases-to-the-default-limits). Due to high demand, limit increase requests can be submitted and are evaluated per request.
+The following table lists the limits for Foundry Models for these rates:
+
+- Tokens per minute
+- Requests per minute
+- Concurrent requests
+
+| Models | Tokens per minute | Requests per minute | Concurrent requests |
+| ------ | ----------------- | ------------------- | ------------------- |
+| Azure OpenAI models | Varies per model and SKU. See [limits for Azure OpenAI](../openai/quotas-limits.md). | Varies per model and SKU. See [limits for Azure OpenAI](../openai/quotas-limits.md). | Not applicable |
+| DeepSeek-R1<br />DeepSeek-V3-0324 | 5,000,000 | 5,000 | 300 |
+| Llama 3.3 70B Instruct<br />Llama-4-Maverick-17B-128E-Instruct-FP8<br />Grok 3<br />Grok 3 mini | 400,000 | 1,000 | 300 |
+| Flux-Pro 1.1<br />Flux.1-Kontext Pro | Not applicable | 2 capacity units (6 requests per minute) | Not applicable |
+| Rest of models | 400,000 | 1,000 | 300 |
+
+To increase your quota:
+
+- For Azure OpenAI, use [Azure AI Foundry Service: Request for Quota Increase](https://customervoice.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR4xPXO648sJKt4GoXAed-0pUMFE1Rk9CU084RjA0TUlVSUlMWEQzVkJDNCQlQCN0PWcu) to submit your request.
+- For other models, see [request increases to the default limits](#request-increases-to-the-default-limits).
+
+Due to high demand, we evaluate limit increase requests individually.
 
 ### Other limits
 
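As an illustration of the rate limits in the restructured table (not part of this commit): a minimal client-side sketch of how a caller might stay under a requests-per-minute limit, assuming the service signals throttling with HTTP 429 and may include a Retry-After header. The endpoint URL, header name, API version, and payload are placeholders, not values taken from this change.

```python
import os
import time
import requests  # third-party HTTP client, used here for brevity

# Placeholder endpoint and key; substitute your own Foundry resource values.
ENDPOINT = os.environ.get(
    "FOUNDRY_ENDPOINT",
    "https://<your-resource>.services.ai.azure.com/models/chat/completions?api-version=<api-version>",
)
API_KEY = os.environ.get("FOUNDRY_API_KEY", "<your-key>")

def post_with_backoff(payload, max_retries=5):
    """Send a request and back off when the per-minute rate limit is hit (HTTP 429)."""
    delay = 1.0
    for _ in range(max_retries):
        response = requests.post(
            ENDPOINT,
            headers={"api-key": API_KEY, "Content-Type": "application/json"},
            json=payload,
            timeout=60,
        )
        if response.status_code != 429:
            response.raise_for_status()
            return response.json()
        # Honor Retry-After if the service provides it; otherwise back off exponentially.
        wait = float(response.headers.get("Retry-After", delay))
        time.sleep(wait)
        delay *= 2
    raise RuntimeError("Rate limit still exceeded after retries")

# Example call; the model name is one of the entries from the table above.
result = post_with_backoff(
    {"model": "DeepSeek-V3-0324", "messages": [{"role": "user", "content": "Hello"}]}
)
print(result)
```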