
Commit 1f6d48d

Merge pull request #3158 from santiagxf/santiagxf-patch-1
Update quotas-limits.md
2 parents 7659986 + 0c26327


articles/ai-foundry/model-inference/quotas-limits.md

Lines changed: 36 additions & 12 deletions
@@ -17,7 +17,7 @@ This article contains a quick reference and a detailed description of the quotas

## Quotas and limits reference

-The following sections provide you with a quick guide to the default quotas and limits that apply to Azure AI model's inference service in Azure AI services:
+Azure uses quotas and limits to prevent budget overruns due to fraud, and to honor Azure capacity constraints. Consider these limits as you scale for production workloads. The following sections provide you with a quick guide to the default quotas and limits that apply to Azure AI model's inference service in Azure AI services:

### Resource limits


@@ -28,12 +28,18 @@ The following sections provide you with a quick guide to the default quotas and

### Rate limits

-| Limit name | Limit value |
-| ---------- | ----------- |
-| Tokens per minute (Azure OpenAI models) | Varies per model and SKU. See [limits for Azure OpenAI](../../ai-services/openai/quotas-limits.md). |
-| Tokens per minute (rest of models) | 200.000 |
-| Requests per minute (Azure OpenAI models) | Varies per model and SKU. See [limits for Azure OpenAI](../../ai-services/openai/quotas-limits.md). |
-| Requests per minute (rest of models) | 1.000 |
+| Limit name | Applies to | Limit value |
+| -------------------- | ------------------- | ----------- |
+| Tokens per minute | Azure OpenAI models | Varies per model and SKU. See [limits for Azure OpenAI](../../ai-services/openai/quotas-limits.md). |
+| Requests per minute | Azure OpenAI models | Varies per model and SKU. See [limits for Azure OpenAI](../../ai-services/openai/quotas-limits.md). |
+| Tokens per minute | DeepSeek models | 5,000,000 |
+| Requests per minute | DeepSeek models | 5,000 |
+| Concurrent requests | DeepSeek models | 300 |
+| Tokens per minute | Rest of models | 200,000 |
+| Requests per minute | Rest of models | 1,000 |
+| Concurrent requests | Rest of models | 300 |
+
+You can [request increases to the default limits](#request-increases-to-the-default-limits). Due to high demand, limit increase requests are evaluated on a per-request basis.

### Other limits

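Two of the new rows introduce a concurrent-request limit (300 for DeepSeek models and for the rest of the non-Azure OpenAI models). A practical way to honor it is to cap in-flight requests on the client side. The following is a minimal sketch, assuming the `httpx` package; the endpoint URL, `api-key` header, request shape, and the cap of 250 are illustrative placeholders, not values from this change:

```python
# Minimal sketch: cap client-side concurrency below the service's
# concurrent-request limit (300 per the table above).
import asyncio

import httpx  # assumed third-party HTTP client (pip install httpx)

ENDPOINT = "https://<resource>.services.ai.azure.com/models/chat/completions"  # placeholder
API_KEY = "<your-api-key>"  # placeholder

MAX_IN_FLIGHT = 250  # illustrative; stays under the documented limit of 300
semaphore = asyncio.Semaphore(MAX_IN_FLIGHT)


async def complete(client: httpx.AsyncClient, prompt: str) -> dict:
    # The semaphore guarantees at most MAX_IN_FLIGHT requests are open at once.
    async with semaphore:
        response = await client.post(
            ENDPOINT,
            headers={"api-key": API_KEY},
            json={"messages": [{"role": "user", "content": prompt}]},
        )
        response.raise_for_status()
        return response.json()


async def main() -> None:
    prompts = [f"Summarize item {i}" for i in range(1_000)]
    async with httpx.AsyncClient(timeout=60.0) as client:
        results = await asyncio.gather(*(complete(client, p) for p in prompts))
    print(f"{len(results)} completions received")


if __name__ == "__main__":
    asyncio.run(main())
```

Keeping the cap below the published limit leaves headroom for retries and for other clients that share the same resource.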
@@ -49,6 +55,28 @@ Global Standard deployments use Azure's global infrastructure, dynamically routi

The Usage Limit determines the level of usage above which customers might see larger variability in response latency. A customer's usage is defined per model and is the total tokens consumed across all deployments in all subscriptions in all regions for a given tenant.

+## Request increases to the default limits
+
+Limit increase requests can be submitted and evaluated per request. [Open an online customer support request](https://portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade/newsupportrequest/). When requesting an endpoint limit increase, provide the following information:
+
+1. When opening the support request, select **Service and subscription limits (quotas)** as the **Issue type**.
+
+1. Select the subscription of your choice.
+
+1. Select **Cognitive Services** as the **Quota type**.
+
+1. Select **Next**.
+
+1. On the **Additional details** tab, provide detailed reasons for the limit increase so that your request can be processed. Be sure to include the following information in the reason for the limit increase:
+
+   * Model name, model version (if applicable), and deployment type (SKU).
+   * Description of your scenario and workload.
+   * Rationale for the requested increase.
+   * The target throughput: tokens per minute, requests per minute, and so on.
+   * The planned timeline (by when you need the increased limits).
+
+1. Finally, select **Save and continue**.
+
## General best practices to remain within rate limits

To minimize issues related to rate limits, it's a good idea to use the following techniques:
@@ -58,10 +86,6 @@ To minimize issues related to rate limits, it's a good idea to use the following
- Test different load increase patterns.
- Increase the quota assigned to your deployment. Move quota from another deployment, if necessary.

-### Request increases to the default quotas and limits
-
-Quota increase requests can be submitted and evaluated per request. [Submit a service request](../../ai-services/cognitive-services-support-options.md?context=/azure/ai-services/openai/context/context).
-
## Next steps

-* Learn more about the [models available in the Azure AI model's inference service](./concepts/models.md)
+* Learn more about the [models available in the Azure AI model's inference service](./concepts/models.md)
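The best-practices list above is about staying within the rate limits; when a burst exceeds them anyway, the service replies with HTTP 429. Below is a minimal retry sketch with exponential backoff that honors a `Retry-After` header when one is present, again assuming `httpx`; the retry count and backoff policy are illustrative choices, not guidance from the article:

```python
# Minimal sketch: retry on HTTP 429 (rate limited) with exponential
# backoff, preferring the server's Retry-After hint when it is given
# in seconds.
import random
import time

import httpx  # assumed third-party HTTP client (pip install httpx)

MAX_RETRIES = 5  # illustrative, not a documented value


def complete_with_retries(client: httpx.Client, url: str, headers: dict, payload: dict) -> dict:
    """POST with retries on HTTP 429 responses."""
    for attempt in range(MAX_RETRIES):
        response = client.post(url, headers=headers, json=payload)
        if response.status_code != 429:
            response.raise_for_status()  # surface non-throttling errors
            return response.json()
        retry_after = response.headers.get("Retry-After")
        if retry_after and retry_after.isdigit():
            delay = float(retry_after)  # the server said how long to wait
        else:
            delay = (2 ** attempt) + random.random()  # backoff with jitter
        time.sleep(delay)
    raise RuntimeError(f"still rate limited after {MAX_RETRIES} attempts")
```

Combined with the concurrency cap sketched earlier, this keeps short bursts from cascading into long runs of throttled calls.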
