articles/ai-foundry/how-to/quota.md
Lines changed: 33 additions & 29 deletions
@@ -8,11 +8,11 @@ ms.custom:
8
8
- build-2024
9
9
- ignite-2024
10
10
ms.topic: how-to
11
-
ms.date: 05/06/2025
11
+
ms.date: 08/08/2025
12
12
ms.reviewer: siarora
13
13
ms.author: mopeakande
14
-
manager: nitinme
15
14
author: msakande
15
+
manager: nitinme
16
16
zone_pivot_groups: project-type
17
17
# Customer intent: As an Azure AI Foundry user, I want to know how to manage and increase quotas for resources with Azure AI Foundry.
18
18
---
@@ -21,7 +21,7 @@ zone_pivot_groups: project-type
21
21
22
22
::: zone pivot="hub-project"
23
23
24
-
Quota provides the flexibility to actively manage the allocation of rate limits across the deployments within your subscription. This article walks through the process of managing quota for your Azure AI Foundry virtual machines and Azure OpenAI in Foundry models.
24
+
Quota provides the flexibility to actively manage the allocation of rate limits across the deployments within your subscription. This article walks through the process of managing quota for your Azure AI Foundry virtual machines and Azure AI Foundry Models.
25
25
26
26
Azure uses limits and quotas to prevent budget overruns due to fraud, and to honor Azure capacity constraints. Quotas also give admins a way to control costs. Consider these limits as you scale for production workloads.
27
27
@@ -36,7 +36,7 @@ In this article, you learn about:
36
36
37
37
::: zone pivot="fdp-project"
38
38
39
-
Quota provides the flexibility to actively manage the allocation of rate limits across the deployments within your subscription. This article walks through the process of managing quota for your Azure OpenAI in Foundry models.
39
+
Quota provides the flexibility to actively manage the allocation of rate limits across the deployments within your subscription. This article walks through the process of managing quota for your Azure AI Foundry Models.
40
40
41
41
Azure uses limits and quotas to prevent budget overruns due to fraud, and to honor Azure capacity constraints. Quotas also give admins a way to control costs. Consider these limits as you scale for production workloads.
42
42
@@ -79,29 +79,31 @@ To raise the limits for compute, you can [request a quota increase](#view-and-
79
79
80
80
Available resources include:
81
81
- Dedicated cores per region have a default limit of 24 to 300, depending on your subscription offer type. You can increase the number of dedicated cores per subscription for each VM family. Specialized VM families like NCv2, NCv3, or ND series start with a default of zero cores. GPUs also default to zero cores.
82
-
- Total compute limit per region has a default limit of 500 per region within a given subscription. The limit can be increased up to a maximum value of 2500 per region. This limit is shared between compute instances, and managed online endpoint deployments. A compute instance is considered a single-node cluster for quota purposes. In order to increase the total compute limit, [open an online customer support request](https://portal.azure.com/#view/Microsoft_Azure_Support/NewSupportRequestV3Blade/callerWorkflowId/5088c408-f627-4398-9aa3-c41cdd93a6eb/callerName/Microsoft_Azure_Support%2FHelpAndSupportOverview.ReactView).
82
+
- Total compute limit per region has a default value of 500 within a given subscription. The limit can be increased up to a maximum value of 2500 per region. This limit is shared between compute instances and managed online endpoint deployments. A compute instance is considered a single-node cluster for quota purposes. To increase the total compute limit, [open an online customer support request](https://ms.portal.azure.com/#view/Microsoft_Azure_Support/NewSupportRequestV4Blade).
83
83
84
84
When opening the support request to increase the total compute limit, provide the following information:
85
85
1. Select **Technical** for the issue type.
86
86
1. Select the subscription that you want to increase the quota for.
87
87
1. Select **Machine Learning** as the service type.
88
88
1. Select the resource that you want to increase the quota for.
89
89
1. In the **Summary** field, enter "Increase total compute limits".
90
-
1. Select **Compute instance** the problem type and **Quota** as the problem subtype.
90
+
1. Select **Compute instance** as the problem type and **Quota** as the problem subtype.
91
91
92
92
:::image type="content" source="../media/cost-management/quota-azure-portal-support.png" alt-text="Screenshot of the page to submit compute quota requests in Azure portal." lightbox="../media/cost-management/quota-azure-portal-support.png":::
93
93
94
-
1. Select **Next**.
95
-
1. On the **Additional details** page, provide the subscription ID, region, new limit (between 500 and 2500), and business justification to increase the total compute limits for the region.
96
-
1. Select **Create** to submit the support request ticket.
94
+
1. Select **Next** to see the **Recommended solution** page.
95
+
1. After viewing the recommended solution, select **Return to support request**.
96
+
1. Select **Next** to go to the **Additional details** page and provide the required information to help the support team resolve your issue.
97
+
1. Select **Next** to review the support request ticket.
98
+
1. Select **Create** to submit the ticket.
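
Before opening the request, it can help to confirm that the value you plan to ask for falls within the range described above. The following Python sketch is illustrative only: the function name `is_valid_total_compute_request` and its parameters are hypothetical and aren't part of any Azure SDK or CLI. The 500 and 2500 values are the default and maximum per-region limits stated in this article.

```python
# Illustrative helper; not part of any Azure SDK or CLI.
DEFAULT_TOTAL_COMPUTE_LIMIT = 500  # default per region within a subscription (from this article)
MAX_TOTAL_COMPUTE_LIMIT = 2500     # maximum per-region value a request can raise the limit to


def is_valid_total_compute_request(
    requested: int, current: int = DEFAULT_TOTAL_COMPUTE_LIMIT
) -> bool:
    """Return True if `requested` is above `current` and at or below the maximum."""
    return current < requested <= MAX_TOTAL_COMPUTE_LIMIT


if __name__ == "__main__":
    # Print the check result for a few sample values.
    for value in (400, 1000, 2500, 3000):
        print(value, is_valid_total_compute_request(value))
```

A value that fails this check either doesn't need a support request (it's at or below your current limit) or exceeds the maximum that the article says can be granted.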
97
99
98
100
::: zone-end
99
101
100
102
## Azure AI Foundry shared quota
101
103
102
-
Azure AI Foundry provides a pool of shared quota that is available for different users across various regions to use concurrently. Depending upon availability, users can temporarily access quota from the shared pool, and use the quota to perform testing for a limited amount of time. The specific time duration depends on the use case. By temporarily using quota from the quota pool, you no longer need to file a support ticket for a short-term quota increase or wait for your quota request to be approved before you can proceed with your workload.
104
+
Azure AI Foundry provides a pool of shared quota that is available for different users across various regions to use concurrently. Depending upon availability, users can temporarily access quota from the shared pool and use the quota to perform testing for a limited amount of time. The specific time duration depends on the use case. By temporarily using quota from the quota pool, you no longer need to file a support ticket for a short-term quota increase or wait for your quota request to be approved before you can proceed with your workload.
103
105
104
-
Use of the shared quota pool is available for testing inferencing for Llama-2, Phi, Nemotron, Mistral, Dolly, and Deci-DeciLM models from the Model Catalog. You should use the shared quota only for creating temporary test endpoints, not production endpoints. For endpoints in production, you should [request dedicated quota](#view-and-request-quotas-in-azure-ai-foundry-portal). Billing for shared quota is usage-based.
106
+
Use of the shared quota pool is available for testing inferencing for Foundry Models from the model catalog. You should use the shared quota only for creating temporary test endpoints, not production endpoints. For endpoints in production, you should [request dedicated quota](#view-and-request-quotas-in-azure-ai-foundry-portal). Billing for shared quota is usage-based.
105
107
106
108
::: zone pivot="hub-project"
107
109
@@ -136,24 +138,25 @@ Use quotas to manage model quota allocation between multiple [!INCLUDE [fdp](../
136
138
137
139
:::image type="content" source="../media/management-center/management-center.png" alt-text="Screenshot of the management center link.":::
138
140
139
-
1. Select **Quota** from the left menu.
141
+
1. Select **Quota** from the left menu to open the quota view, where you can see the quota for the models in specific Azure regions.
140
142
141
-
:::image type="content" source="../media/cost-management/quotas.png" alt-text="Screenshot of the Model and VM quota entries in the management section." lightbox="../media/cost-management/quotas.png":::
143
+
:::image type="content" source="../media/cost-management/quotas.png" alt-text="Screenshot of the quota entry in the management center section." lightbox="../media/cost-management/quotas.png":::
142
144
143
-
1. From the quota view, you can see the quota for the models in the selected Azure region. To request quota, select the model and then select **Request quota**.
144
-
145
-
:::image type="content" source="../media/cost-management/model-quota.png" alt-text="Screenshot of the Model quota page in Azure AI Foundry portal." lightbox="../media/cost-management/model-quota.png":::
145
+
1. To request quota from the quota view, expand any of the groupings listed in the deployment column until you see the model deployments and their associated information.
146
146
147
+
:::image type="content" source="../media/cost-management/model-quota.png" alt-text="Screenshot of the Model quota page in Azure AI Foundry portal, with one of the groupings expanded." lightbox="../media/cost-management/model-quota.png":::
148
+
147
149
- Use the **Show all quota** toggle to display all quota or only the currently allocated quota.
148
-
- Use the **Group by** dropdown to group the list by **Quota type, Region & Model**, **Quota type, Model & Region**, or **None**. The **None** grouping displays a list of model deployments.
149
-
- Expand the groupings to view information on specific model deployments. While viewing a model deployment, select the **pencil icon** in the **Quota allocation** column to edit the quota allocation for the model deployment.
150
+
- Use the **Group by** dropdown to group the list by **Quota type, Region & Model**, **Quota type, Model & Region**, or **None**. The **None** option displays a flat list of model deployments, rather than a nested list.
151
+
- On the line entry for a given model deployment, select the **pencil icon** in the **Quota allocation** column to edit the quota allocation for the model deployment.
152
+
- Select **Request quota** in the **Request quota** column to request increases in quota for the standard deployment type.
150
153
- Use the **charts** along the side of the page to view more details about quota usage. The charts are interactive; hovering over a section of the chart displays more information, and selecting the chart filters the list of models. Selecting the chart legend filters the data displayed in the chart.
151
-
- Use the **Azure OpenAI Provisioned** link to view information about provisioned models, including a **Capacity calculator**.
154
+
- Use the **Provisioned Throughput** link to view information about provisioned models, including a **Capacity calculator** that you can use to estimate the number of PTUs needed for your workload.
152
155
153
-
1. When you select the **VM quota** link, you can view the quota and usage for the virtual machine families in the selected Azure region. To request quota, select the VM family and then select **Request quota**.
156
+
1. When you select the **VM Quota** link, you can view the quota and usage for the virtual machine families in the selected Azure region. To request quota, select the VM family and then select **Request quota**.
154
157
155
158
> [!TIP]
156
-
> If you don't see the **VM quota** link, you were viewing a [!INCLUDE [fdp](../includes/fdp-project-name.md)] project when you selected **Management center**. Use the **All resources** link and then select a project where the **Type** contains **Parent resource : name (Hub)**, and then select **Quota** from the left menu.
159
+
> If you don't see the **VM Quota** link, you were viewing a [!INCLUDE [fdp](../includes/fdp-project-name.md)] project when you selected **Management center**. Use the **All resources** link to select a project where the **Type** contains **Parent resource : name (Hub)**, select **Management center**, and then select **Quota** from the left menu.
157
160
158
161
:::image type="content" source="../media/cost-management/vm-quota.png" alt-text="Screenshot of the VM quota page in Azure AI Foundry portal." lightbox="../media/cost-management/vm-quota.png":::
159
162
@@ -165,19 +168,20 @@ Use quotas to manage model quota allocation between multiple [!INCLUDE [fdp](../
165
168
166
169
:::image type="content" source="../media/management-center/management-center.png" alt-text="Screenshot of the management center link.":::
167
170
168
-
1. Select **Quota** from the left menu.
169
-
170
-
:::image type="content" source="../media/cost-management/quotas.png" alt-text="Screenshot of the Model and VM quota entries in the management section." lightbox="../media/cost-management/quotas.png":::
171
+
1. Select **Quota** from the left menu to open the quota view, where you can see the quota for the models in specific Azure regions.
171
172
172
-
1. From the quota view, you can see the quota for the models in the selected Azure region. To request quota, select the model and then select **Request quota**.
173
+
:::image type="content" source="../media/cost-management/quotas.png" alt-text="Screenshot of the quota entry in the management center section." lightbox="../media/cost-management/quotas.png":::
173
174
174
-
:::image type="content" source="../media/cost-management/project-model-quota.png" alt-text="Screenshot of the Model quota page for a Foundry project in Azure AI Foundry portal." lightbox="../media/cost-management/project-model-quota.png":::
175
+
1. To request quota from the quota view, expand any of the groupings listed in the deployment column until you see the model deployments and their associated information.
176
+
177
+
:::image type="content" source="../media/cost-management/project-model-quota.png" alt-text="Screenshot of the Model quota page for a Foundry project in Azure AI Foundry portal, with one of the groupings expanded." lightbox="../media/cost-management/project-model-quota.png":::
175
178
176
179
- Use the **Show all quota** toggle to display all quota or only the currently allocated quota.
177
-
- Use the **Group by** dropdown to group the list by **Quota type, Region & Model**, **Quota type, Model & Region**, or **None**. The **None** grouping displays a list of model deployments.
178
-
- Expand the groupings to view information on specific model deployments. While viewing a model deployment, select the **pencil icon** in the **Quota allocation** column to edit the quota allocation for the model deployment.
180
+
- Use the **Group by** dropdown to group the list by **Quota type, Region & Model**, **Quota type, Model & Region**, or **None**. The **None** option displays a flat list of model deployments, rather than a nested list.
181
+
- On the line entry for a given model deployment, select the **pencil icon** in the **Quota allocation** column to edit the quota allocation for the model deployment.
182
+
- Select **Request quota** in the **Request quota** column to request increases in quota for the standard deployment type.
179
183
- Use the **charts** along the side of the page to view more details about quota usage. The charts are interactive; hovering over a section of the chart displays more information, and selecting the chart filters the list of models. Selecting the chart legend filters the data displayed in the chart.
180
-
- Use the **Azure OpenAI Provisioned** link to view information about provisioned models, including a **Capacity calculator**.
184
+
- Use the **Provisioned Throughput** link to view information about provisioned models, including a **Capacity calculator** that you can use to estimate the number of PTUs needed for your workload.