Commit afdc1fa

Merge pull request #251671 from Blackmist/quota-update-0915
Quota update 0915
2 parents 28a8eb4 + 31214ec commit afdc1fa

3 files changed: +19 -3 lines changed
articles/machine-learning/how-to-manage-quotas.md

Lines changed: 19 additions & 3 deletions
@@ -8,7 +8,7 @@ ms.subservice: enterprise-readiness
 author: SimranArora904
 ms.author: siarora
 ms.reviewer: larryfr
-ms.date: 07/25/2023
+ms.date: 09/15/2023
 ms.topic: how-to
 ms.custom: troubleshooting, contperf-fy20q4, contperf-fy21q2, event-tier1-build-2022
 ---
@@ -86,7 +86,23 @@ Available resources:

 + **Low-priority cores per region** have a default limit of 100 to 3,000, depending on your subscription offer type. The number of low-priority cores per subscription can be increased and is a single value across VM families.

-+ **Clusters per region** have a default limit of 200 and it can be increased up to a value of 500 per region within a given subscription. This limit is shared between training clusters, compute instances and managed online endpoint deployments. A compute instance is considered a single-node cluster for quota purposes. Starting 1 September 2023, cluster quota limits will automatically be increased from 200 to 500 on your behalf when usage is approaching close to the 200 default limit, eliminating the need to file for a support ticket.
++ **Total compute limit per region** has a default limit of 500 per region within a given subscription and can be increased up to a maximum value of 2500 per region. This limit is shared between training clusters, compute instances, and managed online endpoint deployments. A compute instance is considered a single-node cluster for quota purposes. In order to increase the total compute limit, [open an online customer support request](https://ms.portal.azure.com/#view/Microsoft_Azure_Support/NewSupportRequestV3Blade/callerWorkflowId/5088c408-f627-4398-9aa3-c41cdd93a6eb/callerName/Microsoft_Azure_Support%2FHelpAndSupportOverview.ReactView). Provide the following information:
+
+1. When opening the support request, select __Technical__ as the __Issue type__.
+1. Select the subscription of your choice
+1. Select __Machine Learning__ as the __Service__.
+1. Select the resource of your choice
+1. In the summary, mention "Increase total compute limits"
+1. Select __Compute Cluster__ as the __Problem type__ and __Cluster does not scale up or is stuck in resizing__ as the __Problem subtype__.
+
+:::image type="content" source="media/how-to-manage-quotas/problem-description.png" alt-text="Screenshot of the problem description tab.":::
+
+1. On the __Additional details__ tab, provide the subscription ID, region, new limit (between 500 and 2500) and business justification if you would like to increase the total compute limits in this region.
+
+:::image type="content" source="media/how-to-manage-quotas/additional-details.png" alt-text="Screenshot of the additional details tab.":::
+
+1. Finally, select __Create__ to create a support request ticket.
+

 The following table shows more limits in the platform. Reach out to the Azure Machine Learning product team through a **technical** support ticket to request an exception.

@@ -95,7 +111,7 @@ The following table shows more limits in the platform. Reach out to the Azure Ma
 | Workspaces per resource group | 800 |
 | Nodes in a single Azure Machine Learning compute (AmlCompute) **cluster** set up as a non communication-enabled pool (that is, can't run MPI jobs) | 100 nodes but configurable up to 65,000 nodes |
 | Nodes in a single Parallel Run Step **run** on an Azure Machine Learning compute (AmlCompute) cluster | 100 nodes but configurable up to 65,000 nodes if your cluster is set up to scale as mentioned previously |
-| Nodes in a single Azure Machine Learning compute (AmlCompute) **cluster** set up as a communication-enabled pool | 300 nodes but configurable up to 4000 nodes |
+| Nodes in a single Azure Machine Learning compute (AmlCompute) **cluster** set up as a communication-enabled pool | 300 nodes but configurable up to 4,000 nodes |
 | Nodes in a single Azure Machine Learning compute (AmlCompute) **cluster** set up as a communication-enabled pool on an RDMA enabled VM Family | 100 nodes |
 | Nodes in a single MPI **run** on an Azure Machine Learning compute (AmlCompute) cluster | 100 nodes but can be increased to 300 nodes |
 | Job lifetime | 21 days<sup>1</sup> |
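
Reviewer note: the support-request steps added in this commit ask for a subscription ID, a region, a new limit between 500 and 2500, and a business justification. The sketch below shows one way to check those details before filing the ticket. It is illustrative only and calls no Azure API. The names `QuotaIncreaseRequest` and `validate_request` are hypothetical, and treating both bounds as inclusive is an assumption, since the article says only "between 500 and 2500".

```python
from dataclasses import dataclass

# Values taken from the article text added in this commit.
DEFAULT_TOTAL_COMPUTE_LIMIT = 500   # default total compute limit per region
MAX_TOTAL_COMPUTE_LIMIT = 2500      # documented maximum per region


@dataclass
class QuotaIncreaseRequest:
    """Hypothetical container for the fields on the Additional details tab."""
    subscription_id: str
    region: str
    new_limit: int
    justification: str


def validate_request(req: QuotaIncreaseRequest) -> list[str]:
    """Return a list of problems; an empty list means the details look complete."""
    problems = []
    if not req.subscription_id.strip():
        problems.append("subscription ID is missing")
    if not req.region.strip():
        problems.append("region is missing")
    # Assumption: both ends of the documented range are allowed.
    if not (DEFAULT_TOTAL_COMPUTE_LIMIT <= req.new_limit <= MAX_TOTAL_COMPUTE_LIMIT):
        problems.append(
            f"new limit must be between {DEFAULT_TOTAL_COMPUTE_LIMIT} "
            f"and {MAX_TOTAL_COMPUTE_LIMIT}"
        )
    if not req.justification.strip():
        problems.append("business justification is missing")
    return problems
```

For example, a request for 3000 in a region would come back with a single problem about the limit range, while a request for 1000 with all fields filled in would come back with an empty list.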