Skip to content

Commit a1b1d82

Browse files
authored
Merge pull request #114537 from Blackmist/min-nodes
note/include and grammar
2 parents f146930 + a315354 commit a1b1d82

File tree

3 files changed

+38
-24
lines changed

3 files changed

+38
-24
lines changed

articles/machine-learning/concept-plan-manage-cost.md

Lines changed: 19 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -44,7 +44,7 @@ The following screenshot shows the cost estimation by using the calculator:
4444

4545
As you add new resources to your workspace, return to this calculator and add the same resource here to update your cost estimates.
4646

47-
While the Enterprise edition is in preview, there is no ML surcharge. When the Enterprise edition becomes generally available, it will have a machine learning surcharge (for training and inferencing). For more details, [Azure Machine Learning pricing](https://azure.microsoft.com/pricing/details/machine-learning/).
47+
While the Enterprise edition is in preview, there is no ML surcharge. When the Enterprise edition becomes generally available, it will have a surcharge (for training and inferencing). For more information, see [Azure Machine Learning pricing](https://azure.microsoft.com/pricing/details/machine-learning/).
4848

4949
## Get cost alerts
5050

@@ -54,27 +54,30 @@ Create [budgets](../cost-management/tutorial-acm-create-budgets.md) to manage co
5454

5555
As you use resources with Azure Machine Learning, you incur costs. Azure resource usage unit costs vary by time intervals (seconds, minutes, hours, and days) or by request unit usage. As soon as usage of Azure Machine Learning starts, costs are incurred. View these costs in the [cost analysis](../cost-management/quick-acm-cost-analysis.md) pane in the Azure portal.
5656

57-
View costs in graphs and tables for different time intervals. Some examples are by day, current, prior month, and year. Also view costs against budgets and forecasted costs. Switching to longer views over time helps you identify spending trends and see where overspending might have occurred. If you've created budgets, see where they exceeded.
57+
You can view costs in graphs and tables for different time intervals. You can also view costs against budgets and forecasted costs. Switching to longer views over time helps identify spending trends and see where overspending might have occurred. If you've created budgets, see where they exceeded.
5858

5959
You won't see a separate service area for Machine Learning. Instead you'll see the various resources you've added to your Machine Learning workspaces.
6060

6161
## Use Azure Machine Learning compute cluster (AmlCompute)
6262

6363
With constantly changing data, you need fast and streamlined model training and retraining to maintain accurate models. However, continuous training comes at a cost, especially for deep learning models on GPUs.
6464

65-
Azure Machine Learning users can use the managed Azure Machine Learning compute cluster, also called AmlCompute. AmlCompute supports a variety of GPU and CPU options. The AmlCompute is internally hosted on behalf of your subscription by Azure Machine Learning, but provides the same enterprise grade security, compliance and governance at Azure IaaS cloud scale.
65+
Azure Machine Learning users can use the managed Azure Machine Learning compute cluster, also called AmlCompute. AmlCompute supports a variety of GPU and CPU options. The AmlCompute is internally hosted on behalf of your subscription by Azure Machine Learning. It provides the same enterprise grade security, compliance and governance at Azure IaaS cloud scale.
6666

6767
Because these compute pools are inside of Azure's IaaS infrastructure, you can deploy, scale, and manage your training with the same security and compliance requirements as the rest of your infrastructure. These deployments occur in your subscription and obey your governance rules. Learn more about [Azure Machine Learning Compute](how-to-set-up-training-targets.md#amlcompute).
6868

6969
## Configure training clusters for autoscaling
7070

71-
Autoscaling clusters based on the requirements of your workload helps reduce your costs so you only use what you need.
72-
AmlCompute clusters are designed to autoscale dynamically based on the requirements of your workload. The cluster can be scaled up to the maximum number of nodes provisioned and within the quota designated for the subscription. As each run completes, the cluster will release nodes and autoscale to your designated minimum node count.
71+
Autoscaling clusters based on the requirements of your workload helps reduce your costs so you only use what you need.
7372

74-
In addition to setting the minimum and maximum number of nodes, tweak the amount of time the node is idle before scale down. By default, idle time before scale down is set to 120 seconds.
73+
AmlCompute clusters are designed to scale dynamically based on your workload. The cluster can be scaled up to the maximum number of nodes you configure. As each run completes, the cluster will release nodes and scale to your configured minimum node count.
74+
75+
[!INCLUDE [min-nodes-note](../../includes/machine-learning-min-nodes.md)]
76+
77+
You can also configure the amount of time the node is idle before scale down. By default, idle time before scale down is set to 120 seconds.
7578

7679
+ If you perform less iterative experimentation, reduce this time to save costs.
77-
+ If you perform highly iterative dev/test experimentation, you might need to increase this so you aren't paying for constant scaling up and down after each change to your training script or environment.
80+
+ If you perform highly iterative dev/test experimentation, you might need to increase the time so you aren't paying for constant scaling up and down after each change to your training script or environment.
7881

7982
AmlCompute clusters can be configured for your changing workload requirements in Azure portal, using the [AmlCompute SDK class](https://docs.microsoft.com/python/api/azureml-core/azureml.core.compute.amlcompute.amlcompute?view=azure-ml-py), [AmlCompute CLI](https://docs.microsoft.com/cli/azure/ext/azure-cli-ml/ml/computetarget/create?view=azure-cli-latest#ext-azure-cli-ml-az-ml-computetarget-create-amlcompute), with the [REST APIs](https://github.com/Azure/azure-rest-api-specs/tree/master/specification/machinelearningservices/resource-manager/Microsoft.MachineLearningServices/stable).
8083

@@ -84,24 +87,24 @@ az ml computetarget create amlcompute --name testcluster --vm-size Standard_NC6
8487

8588
## Set quotas on resources
8689

87-
Much like other Azure compute resources, AmlCompute comes with an inherent [quota (or limit) configuration](how-to-manage-quotas.md#azure-machine-learning-compute). This quota is by VM family (for example, Dv2 series, NCv3 series) and varies by region for each subscription. Subscriptions start with small defaults to get you going, but use this setting to control the amount of Amlcompute resources available to be spun up in your subscription.
90+
AmlCompute comes with a [quota (or limit) configuration](how-to-manage-quotas.md#azure-machine-learning-compute). This quota is by VM family (for example, Dv2 series, NCv3 series) and varies by region for each subscription. Subscriptions start with small defaults to get you going, but use this setting to control the amount of Amlcompute resources available to be spun up in your subscription.
8891

89-
Also configure [workspace level quota by VM family](how-to-manage-quotas.md#workspace-level-quota), for each workspace within a subscription. This allows you to have more granular control on the costs that each workspace might potentially incur and restrict certain VM families.
92+
Also configure [workspace level quota by VM family](how-to-manage-quotas.md#workspace-level-quota), for each workspace within a subscription. Doing so allows you to have more granular control on the costs that each workspace might potentially incur and restrict certain VM families.
9093

91-
To set quotas at the workspace level, start in the [Azure portal](https://portal.azure.com). Select any workspace in your subscription, and select **Usages + quotas** in the left pane. Then select the **Configure quotas** tab to view the quotas. You need privileges at the subscription scope to set this quota, since it's a setting that affects multiple workspaces.
94+
To set quotas at the workspace level, start in the [Azure portal](https://portal.azure.com). Select any workspace in your subscription, and select **Usages + quotas** in the left pane. Then select the **Configure quotas** tab to view the quotas. You need privileges at the subscription scope to set the quota, since it's a setting that affects multiple workspaces.
9295

93-
## Set run auto-termination policies
96+
## Set run autotermination policies
9497

95-
Configure your training runs to limit their duration or to terminate them early in case of certain conditions especially when you are using Azure Machine Learning's built-in Hyperparameter Tuning or Automated Machine Learning capabilities.
98+
In some cases, you should configure your training runs to limit their duration or terminate them early. For example, when you are using Azure Machine Learning's built-in hyperparameter tuning or automated machine learning.
9699

97100
Here are a few options that you have:
98101
* Define a parameter called `max_run_duration_seconds` in your RunConfiguration to control the maximum duration a run can extend to on the compute you choose (either local or remote cloud compute).
99-
* For [hyperparameter tuning](how-to-tune-hyperparameters.md#early-termination), define an early termination policy from a Bandit policy, a Median stopping policy or a Truncation selection policy. In addition, also use parameters such as `max_total_runs` or `max_duration_minutes` to further control the various hyperparameter sweeps.
102+
* For [hyperparameter tuning](how-to-tune-hyperparameters.md#early-termination), define an early termination policy from a Bandit policy, a Median stopping policy, or a Truncation selection policy. To further control hyperparameter sweeps, use parameters such as `max_total_runs` or `max_duration_minutes`.
100103
* For [automated machine learning](how-to-configure-auto-train.md#exit), set similar termination policies using the `enable_early_stopping` flag. Also use properties such as `iteration_timeout_minutes` and `experiment_timeout_minutes` to control the maximum duration of a run or for the entire experiment.
101104

102105
## Use low-priority VMs
103106

104-
Azure allows you to use excess unutilized capacity as Low-Priority VMs across virtual machine scale sets, Batch, and the Machine Learning service. These allocations are pre-emptible but come at a reduced price compared to dedicated VMs. In general, we recommend using Low-Priority VMs for Batch workloads or where interruptions are recoverable either through resubmits (for Batch Inferencing) or through restarts (for deep learning training with checkpointing).
107+
Azure allows you to use excess unutilized capacity as Low-Priority VMs across virtual machine scale sets, Batch, and the Machine Learning service. These allocations are pre-emptible but come at a reduced price compared to dedicated VMs. In general, we recommend using Low-Priority VMs for Batch workloads. You should also use them where interruptions are recoverable either through resubmits (for Batch Inferencing) or through restarts (for deep learning training with checkpointing).
105108

106109
Low-Priority VMs have a single quota separate from the dedicated quota value, which is by VM family. Learn [more about AmlCompute quotas](how-to-manage-quotas.md).
107110

@@ -127,9 +130,9 @@ Set the priority of your VM in any of these ways:
127130

128131
## Use reserved instances
129132

130-
Azure Reserved VM Instance provides another way to get huge savings on compute resources by committing to one-year or three-year terms. These discounts range up to 72% of the pay-as-you-go prices and are applied directly to your monthly Azure bill.
133+
Another way to save money on compute resources is Azure Reserved VM Instance. With this offering, you commit to one-year or three-year terms. These discounts range up to 72% of the pay-as-you-go prices and are applied directly to your monthly Azure bill.
131134

132-
Azure Machine Learning Compute supports reserved instances inherently. So if you have purchased a one-year or three-year reserved instance we will automatically apply that reserved instance discount against the managed compute that is used within Azure Machine Learning without requiring any additional setup from your end.
135+
Azure Machine Learning Compute supports reserved instances inherently. If you purchase a one-year or three-year reserved instance, we will automatically apply discount against your Azure Machine Learning managed compute.
133136

134137

135138
## Next steps

articles/machine-learning/how-to-set-up-training-targets.md

Lines changed: 9 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -15,9 +15,9 @@ ms.custom: seodec18
1515
# Set up and use compute targets for model training
1616
[!INCLUDE [applies-to-skus](../../includes/aml-applies-to-basic-enterprise-sku.md)]
1717

18-
With Azure Machine Learning, you can train your model on a variety of resources or environments, collectively referred to as [__compute targets__](concept-azure-machine-learning-architecture.md#compute-targets). A compute target can be a local machine or a cloud resource, such as an Azure Machine Learning Compute, Azure HDInsight or a remote virtual machine. You can also create compute targets for model deployment as described in ["Where and how to deploy your models"](how-to-deploy-and-where.md).
18+
With Azure Machine Learning, you can train your model on a variety of resources or environments, collectively referred to as [__compute targets__](concept-azure-machine-learning-architecture.md#compute-targets). A compute target can be a local machine or a cloud resource, such as an Azure Machine Learning Compute, Azure HDInsight, or a remote virtual machine. You can also create compute targets for model deployment as described in ["Where and how to deploy your models"](how-to-deploy-and-where.md).
1919

20-
You can create and manage a compute target using the Azure Machine Learning SDK, Azure Machine Learning studio, Azure CLI or Azure Machine Learning VS Code extension. If you have compute targets that were created through another service (for example, an HDInsight cluster), you can use them by attaching them to your Azure Machine Learning workspace.
20+
You can create and manage a compute target using the Azure Machine Learning SDK, Azure Machine Learning studio, Azure CLI, or Azure Machine Learning VS Code extension. If you have compute targets that were created through another service (for example, an HDInsight cluster), you can use them by attaching them to your Azure Machine Learning workspace.
2121

2222
In this article, you learn how to use various compute targets for model training. The steps for all compute targets follow the same workflow:
2323
1. __Create__ a compute target if you don't already have one.
@@ -30,7 +30,7 @@ In this article, you learn how to use various compute targets for model training
3030
3131
## Compute targets for training
3232

33-
Azure Machine Learning has varying support across different compute targets. A typical model development lifecycle starts with dev/experimentation on a small amount of data. At this stage, we recommend using a local environment. For example, your local computer or a cloud-based VM. As you scale up your training on larger data sets, or do distributed training, we recommend using Azure Machine Learning Compute to create a single- or multi-node cluster that autoscales each time you submit a run. You can also attach your own compute resource, although support for various scenarios may vary as detailed below:
33+
Azure Machine Learning has varying support across different compute targets. A typical model development lifecycle starts with dev/experimentation on a small amount of data. At this stage, we recommend using a local environment. For example, your local computer or a cloud-based VM. As you scale up your training on larger data sets, or perform distributed training, we recommend using Azure Machine Learning Compute to create a single- or multi-node cluster that autoscales each time you submit a run. You can also attach your own compute resource, although support for various scenarios may vary as detailed below:
3434

3535
[!INCLUDE [aml-compute-target-train](../../includes/aml-compute-target-train.md)]
3636

@@ -58,7 +58,7 @@ For more information, see [Train ML Models with estimators](how-to-train-ml-mode
5858

5959
With ML pipelines, you can optimize your workflow with simplicity, speed, portability, and reuse. When building pipelines with Azure Machine Learning, you can focus on your expertise, machine learning, rather than on infrastructure and automation.
6060

61-
ML pipelines are constructed from multiple **steps**, which are distinct computational units in the pipeline. Each step can run independently and use isolated compute resources. This allows multiple data scientists to work on the same pipeline at the same time without over-taxing compute resources, and also makes it easy to use different compute types/sizes for each step.
61+
ML pipelines are constructed from multiple **steps**, which are distinct computational units in the pipeline. Each step can run independently and use isolated compute resources. This approach allows multiple data scientists to work on the same pipeline at the same time without over-taxing compute resources, and also makes it easy to use different compute types/sizes for each step.
6262

6363
> [!TIP]
6464
> ML Pipelines can use run configuration or estimators when training models.
@@ -94,10 +94,11 @@ You can use Azure Machine Learning Compute to distribute the training process ac
9494
Azure Machine Learning Compute has default limits, such as the number of cores that can be allocated. For more information, see [Manage and request quotas for Azure resources](https://docs.microsoft.com/azure/machine-learning/how-to-manage-quotas).
9595

9696
> [!TIP]
97-
> Clusters can generally scale upto 100 nodes as long as you have enough quota for the number of cores required. By default clusters are setup with inter-node communication enabled between the nodes of the cluster to support MPI jobs for example. However you can scale your clusters to 1000s of nodes by simply [raising a support ticket](https://portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade/newsupportrequest), and requesting to whitelist your subscription, or workspace, or a specific cluster for disabling inter-node communication.
98-
>
97+
> Clusters can generally scale up to 100 nodes as long as you have enough quota for the number of cores required. By default clusters are setup with inter-node communication enabled between the nodes of the cluster to support MPI jobs for example. However you can scale your clusters to 1000s of nodes by simply [raising a support ticket](https://portal.azure.com/#blade/Microsoft_Azure_Support/HelpAndSupportBlade/newsupportrequest), and requesting to whitelist your subscription, or workspace, or a specific cluster for disabling inter-node communication.
98+
99+
Azure Machine Learning Compute can be reused across runs. The compute can be shared with other users in the workspace and is retained between runs, automatically scaling nodes up or down based on the number of runs submitted, and the max_nodes set on your cluster. The min_nodes setting controls the minimum nodes available.
99100

100-
Azure Machine Learning Compute can be reused across runs. The compute can be shared with other users in the workspace and is retained between runs, automatically scaling nodes up or down based on the number of runs submitted, and the max_nodes set on your cluster.
101+
[!INCLUDE [min-nodes-note](../../includes/machine-learning-min-nodes.md)]
101102

102103
1. **Create and attach**: To create a persistent Azure Machine Learning Compute resource in Python, specify the **vm_size** and **max_nodes** properties. Azure Machine Learning then uses smart defaults for the other properties. The compute autoscales down to zero nodes when it isn't used. Dedicated VMs are created to run your jobs as needed.
103104

@@ -478,7 +479,7 @@ az ml run submit-hyperdrive -e <experiment> -c <runconfig> --hyperdrive-configur
478479
479480
Note the *arguments* section in runconfig and *parameter space* in HyperDrive config. They contain the command-line arguments to be passed to training script. The value in runconfig stays the same for each iteration, while the range in HyperDrive config is iterated over. Do not specify the same argument in both files.
480481
481-
For more details on these ```az ml``` CLI commands and full set of arguments, see
482+
For more details on these ```az ml``` CLI commands, see
482483
[the reference documentation](reference-azure-machine-learning-cli.md).
483484
484485
<a id="gitintegration"></a>

0 commit comments

Comments
 (0)