Commit ad08ea8

Merge pull request #199430 from ssalgadodev/runsToJobsPart6
Runs to jobs
2 parents 14ea63a + dec81a9 commit ad08ea8

8 files changed: +56 -52 lines changed

articles/machine-learning/how-to-manage-environments-v2.md

Lines changed: 1 addition & 1 deletion
@@ -105,7 +105,7 @@ Azure ML will start building the image from the build context when the environme

 You can define an environment using a standard conda YAML configuration file that includes the dependencies for the conda environment. See [Creating an environment manually](https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-file-manually) for information on this standard format.

-You must also specify a base Docker image for this environment. Azure ML will build the conda environment on top of the Docker image provided. If you install some Python dependencies in your Docker image, those packages will not exist in the execution environment thus causing runtime failures. By default, Azure ML will build a Conda environment with dependencies you specified, and will execute the run in that environment instead of using any Python libraries that you installed on the base image.
+You must also specify a base Docker image for this environment. Azure ML will build the conda environment on top of the Docker image provided. If you install some Python dependencies in your Docker image, those packages will not exist in the execution environment thus causing runtime failures. By default, Azure ML will build a Conda environment with dependencies you specified, and will execute the job in that environment instead of using any Python libraries that you installed on the base image.

 The following example is a YAML specification file for an environment defined from a conda specification. Here the relative path to the conda file from the Azure ML environment YAML file is specified via the `conda_file` property. You can alternatively define the conda specification inline using the `conda_file` property, rather than defining it in a separate file.

articles/machine-learning/how-to-manage-optimize-cost.md

Lines changed: 7 additions & 5 deletions
@@ -11,6 +11,8 @@ ms.topic: how-to
 ms.date: 06/08/2021
 ---

+[//]: # (needs PM review; ParallelJobStep or ParallelRunStep?)
+
 # Manage and optimize Azure Machine Learning costs

 Learn how to manage and optimize costs when training and deploying machine learning models to Azure Machine Learning.
@@ -19,7 +21,7 @@ Use the following tips to help you manage and optimize your compute resource cos

 - Configure your training clusters for autoscaling
 - Set quotas on your subscription and workspaces
-- Set termination policies on your training run
+- Set termination policies on your training job
 - Use low-priority virtual machines (VM)
 - Schedule compute instances to shut down and start up automatically
 - Use an Azure Reserved VM Instance
@@ -42,7 +44,7 @@ Because these compute pools are inside of Azure's IaaS infrastructure, you can d

 Autoscaling clusters based on the requirements of your workload helps reduce your costs so you only use what you need.

-AmlCompute clusters are designed to scale dynamically based on your workload. The cluster can be scaled up to the maximum number of nodes you configure. As each run completes, the cluster will release nodes and scale to your configured minimum node count.
+AmlCompute clusters are designed to scale dynamically based on your workload. The cluster can be scaled up to the maximum number of nodes you configure. As each job completes, the cluster will release nodes and scale to your configured minimum node count.

 [!INCLUDE [min-nodes-note](../../includes/machine-learning-min-nodes.md)]

@@ -62,14 +64,14 @@ Also configure [workspace level quota by VM family](how-to-manage-quotas.md#work

 To set quotas at the workspace level, start in the [Azure portal](https://portal.azure.com). Select any workspace in your subscription, and select **Usages + quotas** in the left pane. Then select the **Configure quotas** tab to view the quotas. You need privileges at the subscription scope to set the quota, since it's a setting that affects multiple workspaces.

-## Set run autotermination policies
+## Set job autotermination policies

 In some cases, you should configure your training runs to limit their duration or terminate them early. For example, when you are using Azure Machine Learning's built-in hyperparameter tuning or automated machine learning.

 Here are a few options that you have:
 * Define a parameter called `max_run_duration_seconds` in your RunConfiguration to control the maximum duration a run can extend to on the compute you choose (either local or remote cloud compute).
 * For [hyperparameter tuning](how-to-tune-hyperparameters.md#early-termination), define an early termination policy from a Bandit policy, a Median stopping policy, or a Truncation selection policy. To further control hyperparameter sweeps, use parameters such as `max_total_runs` or `max_duration_minutes`.
-* For [automated machine learning](how-to-configure-auto-train.md#exit), set similar termination policies using the `enable_early_stopping` flag. Also use properties such as `iteration_timeout_minutes` and `experiment_timeout_minutes` to control the maximum duration of a run or for the entire experiment.
+* For [automated machine learning](how-to-configure-auto-train.md#exit), set similar termination policies using the `enable_early_stopping` flag. Also use properties such as `iteration_timeout_minutes` and `experiment_timeout_minutes` to control the maximum duration of a job or for the entire experiment.

 ## <a id="low-pri-vm"></a> Use low-priority VMs

@@ -91,7 +93,7 @@ Azure Machine Learning Compute supports reserved instances inherently. If you pu

 ## Train locally

-When prototyping and running training jobs that are small enough to run on your local computer, consider training locally. Using the Python SDK, setting your compute target to `local` executes your script locally. For more information, see [Configure and submit training runs](how-to-set-up-training-targets.md#select-a-compute-target).
+When prototyping and running training jobs that are small enough to run on your local computer, consider training locally. Using the Python SDK, setting your compute target to `local` executes your script locally. For more information, see [Configure and submit training jobs](how-to-set-up-training-targets.md#select-a-compute-target).

 Visual Studio Code provides a full-featured environment for developing your machine learning applications. Using the Azure Machine Learning visual Visual Studio Code extension and Docker, you can run and debug locally. For more information, see [interactive debugging with Visual Studio Code](how-to-debug-visual-studio-code.md).
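
Reviewer note on the "Set job autotermination policies" hunk in how-to-manage-optimize-cost.md: the article names a Median stopping policy for hyperparameter tuning without showing what such a rule does. The sketch below is a simplified, plain-Python illustration of the general idea (stop a job whose best metric so far falls below the median of the other jobs' running averages). It is not the Azure Machine Learning SDK, and the names `running_average` and `should_stop` are hypothetical. It also assumes a higher metric is better.

```python
from statistics import median


def running_average(values):
    """Mean of the metric values a job has reported so far."""
    return sum(values) / len(values)


def should_stop(candidate, others, step):
    """Illustrative median-stopping rule (assumption: higher metric is better).

    candidate: metric values reported by the job under evaluation.
    others:    list of metric histories from other jobs in the sweep.
    step:      number of reported intervals to compare at.
    """
    # Only compare against jobs that have reported at least `step` values.
    averages = [running_average(h[:step]) for h in others if len(h) >= step]
    if not averages:
        return False  # nothing to compare against yet, so keep running
    # Stop when the candidate's best value trails the median running average.
    return max(candidate[:step]) < median(averages)


# Hypothetical metric histories, for illustration only.
others = [[0.50, 0.60, 0.70], [0.40, 0.55, 0.65], [0.30, 0.35, 0.40]]
weak_job = [0.20, 0.25, 0.30]
strong_job = [0.50, 0.70, 0.80]

print(should_stop(weak_job, others, step=3))
print(should_stop(strong_job, others, step=3))
```

In the real service, the policy is configured on the sweep rather than hand-coded. The sketch only shows why an early termination policy can release compute sooner than a fixed duration limit.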

articles/machine-learning/how-to-manage-quotas.md

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ The following table shows additional limits in the platform. Please reach out to
 | Job lifetime on a low-priority node | 7 days<sup>2</sup> |
 | Parameter servers per node | 1 |

-<sup>1</sup> Maximum lifetime is the duration between when a run starts and when it finishes. Completed runs persist indefinitely. Data for runs not completed within the maximum lifetime is not accessible.
+<sup>1</sup> Maximum lifetime is the duration between when a job starts and when it finishes. Completed jobs persist indefinitely. Data for jobs not completed within the maximum lifetime is not accessible.

 <sup>2</sup> Jobs on a low-priority node can be preempted whenever there's a capacity constraint. We recommend that you implement checkpoints in your job.

articles/machine-learning/how-to-manage-resources-vscode.md

Lines changed: 7 additions & 7 deletions
@@ -196,24 +196,24 @@ To view your job in Azure Machine Learning studio:

 Alternatively, use the `> Azure ML: View Experiment in Studio` command respectively in the command palette.

-### Track run progress
+### Track job progress

-As you're running your job, you may want to see its progress. To track the progress of a run in Azure Machine Learning studio from the extension:
+As you're running your job, you may want to see its progress. To track the progress of a job in Azure Machine Learning studio from the extension:

 1. Expand the subscription node that contains your workspace.
 1. Expand the **Experiments** node inside your workspace.
 1. Expand the job node you want to track progress for.
-1. Right-click the run and select **View Run in Studio**.
-1. A prompt appears asking you to open the run URL in Azure Machine Learning studio. Select **Open**.
+1. Right-click the job and select **View Job in Studio**.
+1. A prompt appears asking you to open the job URL in Azure Machine Learning studio. Select **Open**.

-### Download run logs & outputs
+### Download job logs & outputs

-Once a run is complete, you may want to download the logs and assets such as the model generated as part of a run.
+Once a job is complete, you may want to download the logs and assets such as the model generated as part of a job.

 1. Expand the subscription node that contains your workspace.
 1. Expand the **Experiments** node inside your workspace.
 1. Expand the job node you want to download logs and outputs for.
-1. Right-click the run:
+1. Right-click the job:
 - To download the outputs, select **Download outputs**.
 - To download the logs, select **Download logs**.

articles/machine-learning/how-to-migrate-from-estimators-to-scriptrunconfig.md

Lines changed: 5 additions & 5 deletions
@@ -30,10 +30,10 @@ This article covers common considerations when migrating from Estimators to Scri
 Azure Machine Learning documentation and samples have been updated to use [ScriptRunConfig](/python/api/azureml-core/azureml.core.script_run_config.scriptrunconfig) for job configuration and submission.

 For information on using ScriptRunConfig, refer to the following documentation:
-* [Configure and submit training runs](how-to-set-up-training-targets.md)
-* [Configuring PyTorch training runs](how-to-train-pytorch.md)
-* [Configuring TensorFlow training runs](how-to-train-tensorflow.md)
-* [Configuring scikit-learn training runs](how-to-train-scikit-learn.md)
+* [Configure and submit training jobs](how-to-set-up-training-targets.md)
+* [Configuring PyTorch training jobs](how-to-train-pytorch.md)
+* [Configuring TensorFlow training jobs](how-to-train-tensorflow.md)
+* [Configuring scikit-learn training jobs](how-to-train-scikit-learn.md)

 In addition, refer to the following samples & tutorials:
 * [Azure/MachineLearningNotebooks](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/ml-frameworks)
@@ -129,4 +129,4 @@ src.run_config

 ## Next steps

-* [Configure and submit training runs](how-to-set-up-training-targets.md)
+* [Configure and submit training jobs](how-to-set-up-training-targets.md)

articles/machine-learning/how-to-monitor-datasets.md

Lines changed: 5 additions & 5 deletions
@@ -243,7 +243,7 @@ monitor = monitor.enable_schedule()
 | Features | List of features that will be analyzed for data drift over time. | Set to a model's output feature(s) to measure concept drift. Don't include features that naturally drift over time (month, year, index, etc.). You can backfill and existing data drift monitor after adjusting the list of features. | Yes |
 | Compute target | Azure Machine Learning compute target to run the dataset monitor jobs. | | Yes |
 | Enable | Enable or disable the schedule on the dataset monitor pipeline | Disable the schedule to analyze historical data with the backfill setting. It can be enabled after the dataset monitor is created. | Yes |
-| Frequency | The frequency that will be used to schedule the pipeline job and analyze historical data if running a backfill. Options include daily, weekly, or monthly. | Each run compares data in the target dataset according to the frequency: <li>Daily: Compare most recent complete day in target dataset with baseline <li>Weekly: Compare most recent complete week (Monday - Sunday) in target dataset with baseline <li>Monthly: Compare most recent complete month in target dataset with baseline | No |
+| Frequency | The frequency that will be used to schedule the pipeline job and analyze historical data if running a backfill. Options include daily, weekly, or monthly. | Each job compares data in the target dataset according to the frequency: <li>Daily: Compare most recent complete day in target dataset with baseline <li>Weekly: Compare most recent complete week (Monday - Sunday) in target dataset with baseline <li>Monthly: Compare most recent complete month in target dataset with baseline | No |
 | Latency | Time, in hours, it takes for data to arrive in the dataset. For instance, if it takes three days for data to arrive in the SQL DB the dataset encapsulates, set the latency to 72. | Cannot be changed after the dataset monitor is created | No |
 | Email addresses | Email addresses for alerting based on breach of the data drift percentage threshold. | Emails are sent through Azure Monitor. | Yes |
 | Threshold | Data drift percentage threshold for email alerting. | Further alerts and events can be set on many other metrics in the workspace's associated Application Insights resource. | Yes |
@@ -279,7 +279,7 @@ This section contains feature-level insights into the change in the selected fea

 The target dataset is also profiled over time. The statistical distance between the baseline distribution of each feature is compared with the target dataset's over time. Conceptually, this is similar to the data drift magnitude. However this statistical distance is for an individual feature rather than all features. Min, max, and mean are also available.

-In the Azure Machine Learning studio, click on a bar in the graph to see the feature-level details for that date. By default, you see the baseline dataset's distribution and the most recent run's distribution of the same feature.
+In the Azure Machine Learning studio, click on a bar in the graph to see the feature-level details for that date. By default, you see the baseline dataset's distribution and the most recent job's distribution of the same feature.

 :::image type="content" source="media/how-to-monitor-datasets/drift-by-feature.gif" alt-text="Drift magnitude by features":::

@@ -345,7 +345,7 @@ Limitations and known issues for data drift monitors:
 * The time range when analyzing historical data is limited to 31 intervals of the monitor's frequency setting.
 * Limitation of 200 features, unless a feature list is not specified (all features used).
 * Compute size must be large enough to handle the data.
-* Ensure your dataset has data within the start and end date for a given monitor run.
+* Ensure your dataset has data within the start and end date for a given monitor job.
 * Dataset monitors will only work on datasets that contain 50 rows or more.
 * Columns, or features, in the dataset are classified as categorical or numeric based on the conditions in the following table. If the feature does not meet these conditions - for instance, a column of type string with >100 unique values - the feature is dropped from our data drift algorithm, but is still profiled.

@@ -357,8 +357,8 @@ Limitations and known issues for data drift monitors:
 * When you have created a data drift monitor but cannot see data on the **Dataset monitors** page in Azure Machine Learning studio, try the following.

 1. Check if you have selected the right date range at the top of the page.
-1. On the **Dataset Monitors** tab, select the experiment link to check run status. This link is on the far right of the table.
-1. If run completed successfully, check driver logs to see how many metrics has been generated or if there's any warning messages. Find driver logs in the **Output + logs** tab after you click on an experiment.
+1. On the **Dataset Monitors** tab, select the experiment link to check job status. This link is on the far right of the table.
+1. If the job completed successfully, check the driver logs to see how many metrics have been generated or if there's any warning messages. Find driver logs in the **Output + logs** tab after you click on an experiment.

 * If the SDK `backfill()` function does not generate the expected output, it may be due to an authentication issue. When you create the compute to pass into this function, do not use `Run.get_context().experiment.workspace.compute_targets`. Instead, use [ServicePrincipalAuthentication](/python/api/azureml-core/azureml.core.authentication.serviceprincipalauthentication) such as the following to create the compute that you pass into that `backfill()` function:
