Commit 14a239f

Author: Larry Franks
1 parent 5a90e61 commit 14a239f

1 file changed: +11 -11 lines changed

articles/machine-learning/how-to-manage-quotas.md

Lines changed: 11 additions & 11 deletions
@@ -7,7 +7,7 @@ ms.service: machine-learning
 ms.subservice: core
 author: SimranArora904
 ms.author: siarora
-ms.date: 06/01/2022
+ms.date: 08/29/2022
 ms.topic: how-to
 ms.custom: troubleshooting, contperf-fy20q4, contperf-fy21q2, event-tier1-build-2022
 ---
@@ -63,7 +63,7 @@ The following limits on assets apply on a per-workspace basis.
 In addition, the maximum **run time** is 30 days and the maximum number of **metrics logged per run** is 1 million.
 
 ### Azure Machine Learning Compute
-[Azure Machine Learning Compute](concept-compute-target.md#azure-machine-learning-compute-managed) has a default quota limit on both the number of cores (split by each VM Family and cumulative total cores) as well as the number of unique compute resources allowed per region in a subscription. This quota is separate from the VM core quota listed in the previous section as it applies only to the managed compute resources of Azure Machine Learning.
+[Azure Machine Learning Compute](concept-compute-target.md#azure-machine-learning-compute-managed) has a default quota limit on both the number of cores (split by each VM Family and cumulative total cores) and the number of unique compute resources allowed per region in a subscription. This quota is separate from the VM core quota listed in the previous section as it applies only to the managed compute resources of Azure Machine Learning.
 
 [Request a quota increase](#request-quota-increases) to raise the limits for various VM family core quotas, total subscription core quotas, cluster quota and resources in this section.
 
@@ -72,26 +72,26 @@ Available resources:
 
 + **Low-priority cores per region** have a default limit of 100 to 3,000, depending on your subscription offer type. The number of low-priority cores per subscription can be increased and is a single value across VM families.
 
-+ **Clusters per region** have a default limit of 200. These are shared between training clusters, compute instances and MIR endpoint deployments. (A compute instance is considered a single-node cluster for quota purposes.) Cluster quota can be increased up to a value of 500 per region within a given subscription.
++ **Clusters per region** have a default limit of 200. This limit is shared between training clusters, compute instances and MIR endpoint deployments. (A compute instance is considered a single-node cluster for quota purposes.) Cluster quota can be increased up to a value of 500 per region within a given subscription.
 
 > [!TIP]
 > To learn more about which VM family to request a quota increase for, check out [virtual machine sizes in Azure](../virtual-machines/sizes.md). For instance GPU VM families start with an "N" in their family name (eg. NCv3 series)
 
-The following table shows additional limits in the platform. Please reach out to the AzureML product team through a **technical** support ticket to request an exception.
+The following table shows more limits in the platform. Reach out to the AzureML product team through a **technical** support ticket to request an exception.
 
 | **Resource or Action** | **Maximum limit** |
 | --- | --- |
 | Workspaces per resource group | 800 |
-| Nodes in a single Azure Machine Learning Compute (AmlCompute) **cluster** set up as a non communication-enabled pool (i.e. cannot run MPI jobs) | 100 nodes but configurable up to 65000 nodes |
-| Nodes in a single Parallel Run Step **run** on an Azure Machine Learning Compute (AmlCompute) cluster | 100 nodes but configurable up to 65000 nodes if your cluster is set up to scale per above |
+| Nodes in a single Azure Machine Learning Compute (AmlCompute) **cluster** set up as a non communication-enabled pool (that is, can't run MPI jobs) | 100 nodes but configurable up to 65,000 nodes |
+| Nodes in a single Parallel Run Step **run** on an Azure Machine Learning Compute (AmlCompute) cluster | 100 nodes but configurable up to 65,000 nodes if your cluster is set up to scale per above |
 | Nodes in a single Azure Machine Learning Compute (AmlCompute) **cluster** set up as a communication-enabled pool | 300 nodes but configurable up to 4000 nodes |
 | Nodes in a single Azure Machine Learning Compute (AmlCompute) **cluster** set up as a communication-enabled pool on an RDMA enabled VM Family | 100 nodes |
 | Nodes in a single MPI **run** on an Azure Machine Learning Compute (AmlCompute) cluster | 100 nodes but can be increased to 300 nodes |
 | Job lifetime | 21 days<sup>1</sup> |
 | Job lifetime on a low-priority node | 7 days<sup>2</sup> |
 | Parameter servers per node | 1 |
 
-<sup>1</sup> Maximum lifetime is the duration between when a job starts and when it finishes. Completed jobs persist indefinitely. Data for jobs not completed within the maximum lifetime is not accessible.
+<sup>1</sup> Maximum lifetime is the duration between when a job starts and when it finishes. Completed jobs persist indefinitely. Data for jobs not completed within the maximum lifetime isn't accessible.
 
 <sup>2</sup> Jobs on a low-priority node can be preempted whenever there's a capacity constraint. We recommend that you implement checkpoints in your job.
 
@@ -107,15 +107,15 @@ Azure Machine Learning managed online endpoints have limits described in the fol
 | Number of deployments per subscription | 200 |
 | Number of deployments per endpoint | 20 |
 | Number of instances per deployment | 20 <sup>2</sup> |
-| Max request time out at endpoint level | 90 seconds |
+| Max request time-out at endpoint level | 90 seconds |
 | Total requests per second at endpoint level for all deployments | 500 <sup>3</sup> |
 | Total connections per second at endpoint level for all deployments | 500 <sup>3</sup> |
 | Total connections active at endpoint level for all deployments | 500 <sup>3</sup> |
 | Total bandwidth at endpoint level for all deployments | 5 MBPS <sup>3</sup> |
 
 <sup>1</sup> Single dashes like, `my-endpoint-name`, are accepted in endpoint and deployment names.
 
-<sup>2</sup> We reserve 20% extra compute resources for performing upgrades. For example, if you request 10 instances in a deployment, you must have a quota for 12. Otherwise, you will receive an error.
+<sup>2</sup> We reserve 20% extra compute resources for performing upgrades. For example, if you request 10 instances in a deployment, you must have a quota for 12. Otherwise, you'll receive an error.
 
 <sup>3</sup> If you request a limit increase, be sure to calculate related limit increases you might need. For example, if you request a limit increase for requests per second, you might also want to compute the required connections and bandwidth limits and include these limit increases in the same request.
 
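Reviewer note: the 20% upgrade reserve in footnote 2 of this hunk can be sketched as a small calculation. This is only an illustration. The function name `required_instance_quota` is hypothetical, and rounding the reserve up to a whole instance is an assumption; the text itself only gives the 10-instance example, which needs a quota of 12.

```python
def required_instance_quota(requested_instances: int, reserve_percent: int = 20) -> int:
    """Estimate the quota needed for a deployment, including the upgrade reserve.

    Assumption: the reserve is rounded up to a whole instance. The source
    only states the 10 -> 12 example, so other values are illustrative.
    """
    if requested_instances < 0:
        raise ValueError("requested_instances must be non-negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    reserve = (requested_instances * reserve_percent + 99) // 100
    return requested_instances + reserve


# The example from footnote 2: 10 requested instances.
print(required_instance_quota(10))
```

For the footnote's example the sketch returns 12, which matches the quota the text says is required.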
@@ -135,7 +135,7 @@ To request an exception from the Azure Machine Learning product team, use the st
 ### Virtual machines
 Each Azure subscription has a limit on the number of virtual machines across all services. Virtual machine cores have a regional total limit and a regional limit per size series. Both limits are separately enforced.
 
-For example, consider a subscription with a US East total VM core limit of 30, an A series core limit of 30, and a D series core limit of 30. This subscription would be allowed to deploy 30 A1 VMs, or 30 D1 VMs, or a combination of the two that does not exceed a total of 30 cores.
+For example, consider a subscription with a US East total VM core limit of 30, an A series core limit of 30, and a D series core limit of 30. This subscription would be allowed to deploy 30 A1 VMs, or 30 D1 VMs, or a combination of the two that doesn't exceed a total of 30 cores.
 
 You can't raise limits for virtual machines above the values shown in the following table.
 
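Reviewer note: the A series / D series example in this hunk, where the per-series limit and the regional total limit are enforced separately, can be sketched as follows. The function `fits_core_limits` and its parameter names are hypothetical, and the sketch assumes A1 and D1 VMs use one core each, as the 30-VM example implies.

```python
from typing import Mapping


def fits_core_limits(
    requested_cores: Mapping[str, int],
    series_limits: Mapping[str, int],
    regional_total_limit: int,
) -> bool:
    """Return True if a request satisfies both the per-series and total core limits."""
    # Check 1: each series must stay within its own regional limit.
    for series, cores in requested_cores.items():
        if cores > series_limits.get(series, 0):
            return False
    # Check 2: the combined cores must stay within the regional total limit.
    return sum(requested_cores.values()) <= regional_total_limit


# Limits from the example: total 30, A series 30, D series 30.
limits = {"A": 30, "D": 30}
print(fits_core_limits({"A": 30}, limits, 30))           # 30 A1 VMs
print(fits_core_limits({"A": 20, "D": 10}, limits, 30))  # mixed, 30 cores total
print(fits_core_limits({"A": 20, "D": 11}, limits, 30))  # mixed, 31 cores total
```

The first two requests pass both checks. The third stays within each series limit but exceeds the regional total, so it is rejected.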
@@ -176,7 +176,7 @@ You can't set a negative value or a value higher than the subscription-level quo
 
 :::image type="content" source="media/how-to-manage-quotas/select-all-options.png" alt-text="Screenshot shows select all options to see compute resources that need more quota":::
 
-1. Scroll down until you see the list of VM sizes you do not have quota for.
+1. Scroll down until you see the list of VM sizes you don't have quota for.
 
 :::image type="content" source="media/how-to-manage-quotas/scroll-to-zero-quota.png" alt-text="Screenshot shows list of zero quota":::
 