articles/machine-learning/how-to-manage-environments-v2.md (+1 -1)
@@ -105,7 +105,7 @@ Azure ML will start building the image from the build context when the environme
You can define an environment using a standard conda YAML configuration file that includes the dependencies for the conda environment. See [Creating an environment manually](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-file-manually) for information on this standard format.
- You must also specify a base Docker image for this environment. Azure ML will build the conda environment on top of the Docker image provided. If you install some Python dependencies in your Docker image, those packages will not exist in the execution environment thus causing runtime failures. By default, Azure ML will build a Conda environment with dependencies you specified, and will execute the run in that environment instead of using any Python libraries that you installed on the base image.
+ You must also specify a base Docker image for this environment. Azure ML will build the conda environment on top of the Docker image provided. If you install some Python dependencies in your Docker image, those packages will not exist in the execution environment thus causing runtime failures. By default, Azure ML will build a Conda environment with dependencies you specified, and will execute the job in that environment instead of using any Python libraries that you installed on the base image.
The following example is a YAML specification file for an environment defined from a conda specification. Here the relative path to the conda file from the Azure ML environment YAML file is specified via the `conda_file` property. You can alternatively define the conda specification inline using the `conda_file` property, rather than defining it in a separate file.
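As a hedged illustration of the paragraph above, an environment YAML of this shape might look like the following sketch (the name, image tag, and relative path are placeholders, not taken from the source):

```yaml
# Sketch of an Azure ML environment spec; names and paths are illustrative.
$schema: https://azuremlschemas.azureedge.net/latest/environment.schema.json
name: docker-image-plus-conda-example
image: mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest
# Relative path from this YAML file to the conda specification:
conda_file: conda-yamls/pydata.yml
```

Azure ML builds the conda environment described by `conda_file` on top of the base `image`, as the paragraph above explains.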
articles/machine-learning/how-to-manage-optimize-cost.md (+7 -5)
@@ -11,6 +11,8 @@ ms.topic: how-to
ms.date: 06/08/2021
---
+ [//]: # (needs PM review; ParallelJobStep or ParallelRunStep?)
+
# Manage and optimize Azure Machine Learning costs
Learn how to manage and optimize costs when training and deploying machine learning models to Azure Machine Learning.
@@ -19,7 +21,7 @@ Use the following tips to help you manage and optimize your compute resource cos
- Configure your training clusters for autoscaling
- Set quotas on your subscription and workspaces
- - Set termination policies on your training run
+ - Set termination policies on your training job
- Use low-priority virtual machines (VM)
- Schedule compute instances to shut down and start up automatically
- Use an Azure Reserved VM Instance
@@ -42,7 +44,7 @@ Because these compute pools are inside of Azure's IaaS infrastructure, you can d
Autoscaling clusters based on the requirements of your workload helps reduce your costs so you only use what you need.
- AmlCompute clusters are designed to scale dynamically based on your workload. The cluster can be scaled up to the maximum number of nodes you configure. As each run completes, the cluster will release nodes and scale to your configured minimum node count.
+ AmlCompute clusters are designed to scale dynamically based on your workload. The cluster can be scaled up to the maximum number of nodes you configure. As each job completes, the cluster will release nodes and scale to your configured minimum node count.
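The scale-up/scale-down behavior described above can be sketched with a toy clamp function (an illustration of the idea only, not the AmlCompute implementation):

```python
def target_node_count(queued_jobs: int, min_nodes: int, max_nodes: int) -> int:
    """Toy autoscale rule: one node per queued job, clamped to [min, max]."""
    return max(min_nodes, min(queued_jobs, max_nodes))

# With min_nodes=0 the cluster releases every node once the queue drains,
# which is the cost-saving behavior the article recommends.
```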
@@ -62,14 +64,14 @@ Also configure [workspace level quota by VM family](how-to-manage-quotas.md#work
To set quotas at the workspace level, start in the [Azure portal](https://portal.azure.com). Select any workspace in your subscription, and select **Usages + quotas** in the left pane. Then select the **Configure quotas** tab to view the quotas. You need privileges at the subscription scope to set the quota, since it's a setting that affects multiple workspaces.
- ## Set run autotermination policies
+ ## Set job autotermination policies
In some cases, you should configure your training runs to limit their duration or terminate them early. For example, when you are using Azure Machine Learning's built-in hyperparameter tuning or automated machine learning.
Here are a few options that you have:
* Define a parameter called `max_run_duration_seconds` in your RunConfiguration to control the maximum duration a run can extend to on the compute you choose (either local or remote cloud compute).
* For [hyperparameter tuning](how-to-tune-hyperparameters.md#early-termination), define an early termination policy from a Bandit policy, a Median stopping policy, or a Truncation selection policy. To further control hyperparameter sweeps, use parameters such as `max_total_runs` or `max_duration_minutes`.
- * For [automated machine learning](how-to-configure-auto-train.md#exit), set similar termination policies using the `enable_early_stopping` flag. Also use properties such as `iteration_timeout_minutes` and `experiment_timeout_minutes` to control the maximum duration of a run or for the entire experiment.
+ * For [automated machine learning](how-to-configure-auto-train.md#exit), set similar termination policies using the `enable_early_stopping` flag. Also use properties such as `iteration_timeout_minutes` and `experiment_timeout_minutes` to control the maximum duration of a job or for the entire experiment.
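As a sketch of the bandit-style early termination mentioned in the hyperparameter tuning bullet (the slack-factor rule here mirrors the documented behavior, but this toy function is an illustration, not the Azure ML implementation):

```python
def should_terminate(run_metric: float, best_metric: float, slack_factor: float) -> bool:
    """For a maximized metric: stop a run that falls below best / (1 + slack_factor)."""
    return run_metric < best_metric / (1.0 + slack_factor)

# Example: with slack_factor=0.2 and a best accuracy of 0.8, any run below
# 0.8 / 1.2 (about 0.667) would be terminated early.
```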
## <a id="low-pri-vm"></a> Use low-priority VMs
@@ -91,7 +93,7 @@ Azure Machine Learning Compute supports reserved instances inherently. If you pu
## Train locally
- When prototyping and running training jobs that are small enough to run on your local computer, consider training locally. Using the Python SDK, setting your compute target to `local` executes your script locally. For more information, see [Configure and submit training runs](how-to-set-up-training-targets.md#select-a-compute-target).
+ When prototyping and running training jobs that are small enough to run on your local computer, consider training locally. Using the Python SDK, setting your compute target to `local` executes your script locally. For more information, see [Configure and submit training jobs](how-to-set-up-training-targets.md#select-a-compute-target).
Visual Studio Code provides a full-featured environment for developing your machine learning applications. Using the Azure Machine Learning Visual Studio Code extension and Docker, you can run and debug locally. For more information, see [interactive debugging with Visual Studio Code](how-to-debug-visual-studio-code.md).
articles/machine-learning/how-to-manage-quotas.md (+1 -1)
@@ -91,7 +91,7 @@ The following table shows additional limits in the platform. Please reach out to
| Job lifetime on a low-priority node | 7 days<sup>2</sup> |
| Parameter servers per node | 1 |
- <sup>1</sup> Maximum lifetime is the duration between when a run starts and when it finishes. Completed runs persist indefinitely. Data for runs not completed within the maximum lifetime is not accessible.
+ <sup>1</sup> Maximum lifetime is the duration between when a job starts and when it finishes. Completed jobs persist indefinitely. Data for jobs not completed within the maximum lifetime is not accessible.
<sup>2</sup> Jobs on a low-priority node can be preempted whenever there's a capacity constraint. We recommend that you implement checkpoints in your job.
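The checkpoint recommendation in footnote 2 can be sketched as follows (a minimal stdlib example; a real job would persist model weights and optimizer state, and the filename is a placeholder):

```python
import json
import os

CKPT = "checkpoint.json"  # placeholder path

def load_step() -> int:
    """Resume from the last persisted step, or start from scratch."""
    if os.path.exists(CKPT):
        with open(CKPT) as f:
            return json.load(f)["step"]
    return 0

def train(total_steps: int) -> int:
    step = load_step()
    while step < total_steps:
        step += 1  # one unit of work
        with open(CKPT, "w") as f:
            json.dump({"step": step}, f)  # persist progress after every step
    return step
```

If a low-priority node is preempted mid-run, restarting the job resumes from the last persisted step instead of repeating all completed work.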
articles/machine-learning/how-to-migrate-from-estimators-to-scriptrunconfig.md (+5 -5)
@@ -30,10 +30,10 @@ This article covers common considerations when migrating from Estimators to Scri
Azure Machine Learning documentation and samples have been updated to use [ScriptRunConfig](/python/api/azureml-core/azureml.core.script_run_config.scriptrunconfig) for job configuration and submission.
For information on using ScriptRunConfig, refer to the following documentation:
- * [Configure and submit training runs](how-to-set-up-training-targets.md)
- * [Configuring PyTorch training runs](how-to-train-pytorch.md)
- * [Configuring TensorFlow training runs](how-to-train-tensorflow.md)
- * [Configuring scikit-learn training runs](how-to-train-scikit-learn.md)
+ * [Configure and submit training jobs](how-to-set-up-training-targets.md)
+ * [Configuring PyTorch training jobs](how-to-train-pytorch.md)
+ * [Configuring TensorFlow training jobs](how-to-train-tensorflow.md)
+ * [Configuring scikit-learn training jobs](how-to-train-scikit-learn.md)
In addition, refer to the following samples & tutorials:
| Features | List of features that will be analyzed for data drift over time. | Set to a model's output feature(s) to measure concept drift. Don't include features that naturally drift over time (month, year, index, etc.). You can backfill an existing data drift monitor after adjusting the list of features. | Yes |
| Compute target | Azure Machine Learning compute target to run the dataset monitor jobs. || Yes |
| Enable | Enable or disable the schedule on the dataset monitor pipeline | Disable the schedule to analyze historical data with the backfill setting. It can be enabled after the dataset monitor is created. | Yes |
- | Frequency | The frequency that will be used to schedule the pipeline job and analyze historical data if running a backfill. Options include daily, weekly, or monthly. | Each run compares data in the target dataset according to the frequency: <li>Daily: Compare most recent complete day in target dataset with baseline <li>Weekly: Compare most recent complete week (Monday - Sunday) in target dataset with baseline <li>Monthly: Compare most recent complete month in target dataset with baseline | No |
+ | Frequency | The frequency that will be used to schedule the pipeline job and analyze historical data if running a backfill. Options include daily, weekly, or monthly. | Each job compares data in the target dataset according to the frequency: <li>Daily: Compare most recent complete day in target dataset with baseline <li>Weekly: Compare most recent complete week (Monday - Sunday) in target dataset with baseline <li>Monthly: Compare most recent complete month in target dataset with baseline | No |
| Latency | Time, in hours, it takes for data to arrive in the dataset. For instance, if it takes three days for data to arrive in the SQL DB the dataset encapsulates, set the latency to 72. | Cannot be changed after the dataset monitor is created | No |
| Email addresses | Email addresses for alerting based on breach of the data drift percentage threshold. | Emails are sent through Azure Monitor. | Yes |
| Threshold | Data drift percentage threshold for email alerting. | Further alerts and events can be set on many other metrics in the workspace's associated Application Insights resource. | Yes |
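The comparison windows in the Frequency row above can be sketched with `datetime` arithmetic (an illustration of the described behavior, not the service implementation):

```python
from datetime import date, timedelta
from typing import Tuple

def comparison_window(today: date, frequency: str) -> Tuple[date, date]:
    """(start, end) of the most recent complete day, Monday-Sunday week, or month."""
    if frequency == "daily":
        yesterday = today - timedelta(days=1)
        return yesterday, yesterday
    if frequency == "weekly":
        # The most recent Sunday strictly before today closes the last complete week.
        end = today - timedelta(days=today.isoweekday() % 7 or 7)
        return end - timedelta(days=6), end
    if frequency == "monthly":
        end = today.replace(day=1) - timedelta(days=1)  # last day of previous month
        return end.replace(day=1), end
    raise ValueError(f"unknown frequency: {frequency}")
```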
@@ -279,7 +279,7 @@ This section contains feature-level insights into the change in the selected fea
The target dataset is also profiled over time. The statistical distance between the baseline distribution of each feature is compared with the target dataset's over time. Conceptually, this is similar to the data drift magnitude. However this statistical distance is for an individual feature rather than all features. Min, max, and mean are also available.
- In the Azure Machine Learning studio, click on a bar in the graph to see the feature-level details for that date. By default, you see the baseline dataset's distribution and the most recent run's distribution of the same feature.
+ In the Azure Machine Learning studio, click on a bar in the graph to see the feature-level details for that date. By default, you see the baseline dataset's distribution and the most recent job's distribution of the same feature.
:::image type="content" source="media/how-to-monitor-datasets/drift-by-feature.gif" alt-text="Drift magnitude by features":::
@@ -345,7 +345,7 @@ Limitations and known issues for data drift monitors:
* The time range when analyzing historical data is limited to 31 intervals of the monitor's frequency setting.
* Limit of 200 features, unless no feature list is specified (in which case all features are used).
* Compute size must be large enough to handle the data.
- * Ensure your dataset has data within the start and end date for a given monitor run.
+ * Ensure your dataset has data within the start and end date for a given monitor job.
* Dataset monitors will only work on datasets that contain 50 rows or more.
* Columns, or features, in the dataset are classified as categorical or numeric based on the conditions in the following table. If the feature does not meet these conditions - for instance, a column of type string with >100 unique values - the feature is dropped from our data drift algorithm, but is still profiled.
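The drop rule in the last bullet can be sketched as follows (the real conditions are in the table the docs reference; the >100-unique-values threshold for strings comes from the example in the text, the rest are assumptions for illustration):

```python
def classify_feature(values: list) -> str:
    """Toy classification: numeric, categorical, or dropped from drift analysis."""
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "numeric"
    if len(set(values)) <= 100:  # e.g. a low-cardinality string column
        return "categorical"
    return "dropped"  # still profiled, but excluded from the drift algorithm
```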
@@ -357,8 +357,8 @@ Limitations and known issues for data drift monitors:
* When you have created a data drift monitor but cannot see data on the **Dataset monitors** page in Azure Machine Learning studio, try the following.
1. Check if you have selected the right date range at the top of the page.
- 1. On the **Dataset Monitors** tab, select the experiment link to check run status. This link is on the far right of the table.
- 1. If run completed successfully, check driver logs to see how many metrics has been generated or if there's any warning messages. Find driver logs in the **Output + logs** tab after you click on an experiment.
+ 1. On the **Dataset Monitors** tab, select the experiment link to check job status. This link is on the far right of the table.
+ 1. If the job completed successfully, check the driver logs to see how many metrics have been generated or if there are any warning messages. Find driver logs in the **Output + logs** tab after you click on an experiment.
* If the SDK `backfill()` function does not generate the expected output, it may be due to an authentication issue. When you create the compute to pass into this function, do not use `Run.get_context().experiment.workspace.compute_targets`. Instead, use [ServicePrincipalAuthentication](/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication) such as the following to create the compute that you pass into that `backfill()` function: