
Commit a869d56

freshness pass
1 parent ca02d3e commit a869d56


1 file changed: +25 -24 lines changed

articles/machine-learning/v1/how-to-set-up-training-targets.md

Lines changed: 25 additions & 24 deletions
@@ -8,7 +8,7 @@ ms.author: sgilley
88
ms.reviewer: sgilley
99
ms.service: azure-machine-learning
1010
ms.subservice: training
11-
ms.date: 02/21/2024
11+
ms.date: 03/10/2025
1212
ms.topic: how-to
1313
ms.custom: UpdateFrequency5,sdkv1
1414
---
@@ -19,7 +19,7 @@ ms.custom: UpdateFrequency5,sdkv1
1919

2020
In this article, you learn how to configure and submit Azure Machine Learning jobs to train your models. Snippets of code explain the key parts of configuration and submission of a training script. Then use one of the [example notebooks](#notebook-examples) to find the full end-to-end working examples.
2121

22-
When training, it is common to start on your local computer, and then later scale out to a cloud-based cluster. With Azure Machine Learning, you can run your script on various compute targets without having to change your training script.
22+
When training, it's common to start on your local computer, and then later scale out to a cloud-based cluster. With Azure Machine Learning, you can run your script on various compute targets without having to change your training script.
2323

2424
All you need to do is define the environment for each compute target within a **script job configuration**. Then, when you want to run your training experiment on a different compute target, specify the job configuration for that compute.
2525

@@ -40,7 +40,7 @@ You submit your training experiment with a ScriptRunConfig object. This object i
4040
* **script**: The training script to run
4141
* **compute_target**: The compute target to run on
4242
* **environment**: The environment to use when running the script
43-
* and some additional configurable options (see the [reference documentation](/python/api/azureml-core/azureml.core.scriptrunconfig) for more information)
43+
* other configurable options (see the [reference documentation](/python/api/azureml-core/azureml.core.scriptrunconfig) for more information)
4444

4545
## Train your model
4646

@@ -57,18 +57,6 @@ Or you can:
5757
* Submit a HyperDrive run for [hyperparameter tuning](../how-to-tune-hyperparameters.md).
5858
* Submit an experiment via the [VS Code extension](../tutorial-train-deploy-image-classification-model-vscode.md#train-the-model).
5959

60-
## Create an experiment
61-
62-
Create an [experiment](concept-azure-machine-learning-architecture.md#experiments) in your workspace. An experiment is a light-weight container that helps to organize job submissions and keep track of code.
63-
64-
[!INCLUDE [sdk v1](../includes/machine-learning-sdk-v1.md)]
65-
66-
```python
67-
from azureml.core import Experiment
68-
69-
experiment_name = 'my_experiment'
70-
experiment = Experiment(workspace=ws, name=experiment_name)
71-
```
7260

7361
## Select a compute target
7462

@@ -77,7 +65,7 @@ Select the compute target where your training script will run on. If no compute
7765
The example code in this article assumes that you have already created a compute target `my_compute_target` from the "Prerequisites" section.
7866

7967
>[!NOTE]
80-
> - Azure Databricks is not supported as a compute target for model training. You can use Azure Databricks for data preparation and deployment tasks.
68+
> - Azure Databricks isn't supported as a compute target for model training. You can use Azure Databricks for data preparation and deployment tasks.
8169
> - To create and attach a compute target for training on Azure Arc-enabled Kubernetes cluster, see [Configure Azure Arc-enabled Machine Learning](../how-to-attach-kubernetes-anywhere.md)
8270
8371
## Create an environment
@@ -114,6 +102,19 @@ myenv.python.user_managed_dependencies = True
114102
# myenv.python.interpreter_path = '/home/johndoe/miniconda3/envs/myenv/bin/python'
115103
```
116104

105+
## Create an experiment
106+
107+
Create an [experiment](concept-azure-machine-learning-architecture.md#experiments) in your workspace. An experiment is a light-weight container that helps to organize job submissions and keep track of code.
108+
109+
[!INCLUDE [sdk v1](../includes/machine-learning-sdk-v1.md)]
110+
111+
```python
112+
from azureml.core import Experiment
113+
114+
experiment_name = 'my_experiment'
115+
experiment = Experiment(workspace=ws, name=experiment_name)
116+
```
117+
117118
## Create the script job configuration
118119

119120
Now that you have a compute target (`my_compute_target`, see [Prerequisites,](#prerequisites) and environment (`myenv`, see [Create an environment](#create-an-environment)), create a script job configuration that runs your training script (`train.py`) located in your `project_folder` directory:
@@ -130,7 +131,7 @@ src = ScriptRunConfig(source_directory=project_folder,
130131

131132
```
132133

133-
If you don't specify an environment, a default environment will be created for you.
134+
If you don't specify an environment, a default environment is created for you.
134135

135136
If you have command-line arguments you want to pass to your training script, you can specify them via the **`arguments`** parameter of the ScriptRunConfig constructor, for example, `arguments=['--arg1', arg1_val, '--arg2', arg2_val]`.
136137

@@ -154,19 +155,19 @@ run.wait_for_completion(show_output=True)
154155
```
155156

156157
> [!IMPORTANT]
157-
> When you submit the training job, a snapshot of the directory that contains your training scripts will be created and sent to the compute target. It is also stored as part of the experiment in your workspace. If you change files and submit the job again, only the changed files will be uploaded.
158+
> When you submit the training job, a snapshot of the directory that contains your training scripts is created and sent to the compute target. It's also stored as part of the experiment in your workspace. If you change files and submit the job again, only the changed files are uploaded.
158159
>
159160
> [!INCLUDE [amlinclude-info](../includes/machine-learning-amlignore-gitignore.md)]
160161
>
161162
> For more information about snapshots, see [Snapshots](concept-azure-machine-learning-architecture.md#snapshots).
162163
163164
> [!IMPORTANT]
164165
> **Special Folders**
165-
> Two folders, *outputs* and *logs*, receive special treatment by Azure Machine Learning. During training, when you write files to folders named *outputs* and *logs* that are relative to the root directory (`./outputs` and `./logs`, respectively), the files will automatically upload to your job history so that you have access to them once your job is finished.
166+
> Two folders, *outputs* and *logs*, receive special treatment by Azure Machine Learning. During training, when you write files to folders named *outputs* and *logs* that are relative to the root directory (`./outputs` and `./logs`, respectively), the files automatically upload to your job history so that you have access to them once your job is finished.
166167
>
167-
> To create artifacts during training (such as model files, checkpoints, data files, or plotted images) write these to the `./outputs` folder.
168+
> To create artifacts during training (such as model files, checkpoints, data files, or plotted images) write to the `./outputs` folder.
168169
>
169-
> Similarly, you can write any logs from your training job to the `./logs` folder. To utilize Azure Machine Learning's [TensorBoard integration](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/track-and-monitor-experiments/tensorboard/export-run-history-to-tensorboard/export-run-history-to-tensorboard.ipynb) make sure you write your TensorBoard logs to this folder. While your job is in progress, you will be able to launch TensorBoard and stream these logs. Later, you will also be able to restore the logs from any of your previous jobs.
170+
> Similarly, you can write any logs from your training job to the `./logs` folder. To utilize Azure Machine Learning's [TensorBoard integration](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/track-and-monitor-experiments/tensorboard/export-run-history-to-tensorboard/export-run-history-to-tensorboard.ipynb), make sure you write your TensorBoard logs to this folder. While your job is in progress, you'll be able to launch TensorBoard and stream these logs. Later, you'll also be able to restore the logs from any of your previous jobs.
170171
>
171172
> For example, to download a file written to the *outputs* folder to your local machine after your remote training job:
172173
> `run.download_file(name='outputs/my_output_file', output_file_path='my_destination_path')`
@@ -186,17 +187,17 @@ See these notebooks for examples of configuring jobs for various training scenar
186187

187188
## Troubleshooting
188189

189-
* **AttributeError: 'RoundTripLoader' object has no attribute 'comment_handling'**: This error comes from the new version (v0.17.5) of `ruamel-yaml`, an `azureml-core` dependency, that introduces a breaking change to `azureml-core`. In order to fix this error, uninstall `ruamel-yaml` by running `pip uninstall ruamel-yaml` and installing a different version of `ruamel-yaml`; the supported versions are v0.15.35 to v0.17.4 (inclusive). You can do this by running `pip install "ruamel-yaml>=0.15.35,<0.17.5"`.
190+
* **AttributeError: 'RoundTripLoader' object has no attribute 'comment_handling'**: This error comes from the new version (v0.17.5) of `ruamel-yaml`, an `azureml-core` dependency, that introduces a breaking change to `azureml-core`. In order to fix this error, uninstall `ruamel-yaml` by running `pip uninstall ruamel-yaml` and installing a different version of `ruamel-yaml`; the supported versions are v0.15.35 to v0.17.4 (inclusive). You can do so by running `pip install "ruamel-yaml>=0.15.35,<0.17.5"`.
190191

191192

192193
* **Job fails with `jwt.exceptions.DecodeError`**: Exact error message: `jwt.exceptions.DecodeError: It is required that you pass in a value for the "algorithms" argument when calling decode()`.
193194

194195
Consider upgrading to the latest version of azureml-core: `pip install -U azureml-core`.
195196

196-
If . you're running into this issue for local jobs, check the version of PyJWT installed in your environment where . you're starting jobs. The supported versions of PyJWT are < 2.0.0. Uninstall PyJWT from the environment if the version is >= 2.0.0. You may check the version of PyJWT, uninstall, and install the right version as follows:
197+
If you run into this issue for local jobs, check the version of PyJWT installed in your environment where . you're starting jobs. The supported versions of PyJWT are < 2.0.0. Uninstall PyJWT from the environment if the version is >= 2.0.0. You may check the version of PyJWT, uninstall, and install the right version as follows:
197198
1. Start a command shell, activate conda environment where azureml-core is installed.
198199
2. Enter `pip freeze` and look for `PyJWT`, if found, the version listed should be < 2.0.0
199-
3. If the listed version is not a supported version, `pip uninstall PyJWT` in the command shell and enter y for confirmation.
200+
3. If the listed version isn't a supported version, `pip uninstall PyJWT` in the command shell and enter y for confirmation.
200201
4. Install using `pip install 'PyJWT<2.0.0'`
201202

202203
If . you're submitting a user-created environment with your job, consider using the latest version of azureml-core in that environment. Versions >= 1.18.0 of azureml-core already pin PyJWT < 2.0.0. If you need to use a version of azureml-core < 1.18.0 in the environment you submit, make sure to specify PyJWT < 2.0.0 in your pip dependencies.
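
The manual PyJWT check in the troubleshooting steps above can also be done from Python. The following is a minimal sketch, not part of the article or of `azureml-core`: the helper names `is_supported_pyjwt` and `installed_pyjwt_version` are hypothetical, and it assumes Python 3.8 or later so that `importlib.metadata` is available.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def is_supported_pyjwt(version_string: str) -> bool:
    """Return True if the version is below 2.0.0 (hypothetical helper).

    Only the major component is inspected, so any 2.x release,
    including 2.x pre-releases, is treated as unsupported.
    """
    major = int(version_string.split(".")[0])
    return major < 2


def installed_pyjwt_version() -> Optional[str]:
    """Return the installed PyJWT version, or None if PyJWT is absent."""
    try:
        return version("PyJWT")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_pyjwt_version()
    if found is None:
        print("PyJWT is not installed in this environment.")
    elif is_supported_pyjwt(found):
        print(f"PyJWT {found} is within the supported range (< 2.0.0).")
    else:
        # Mirrors steps 3 and 4 of the manual procedure.
        print(f"PyJWT {found} is unsupported; run: pip install 'PyJWT<2.0.0'")
```

Run the script inside the same conda environment where `azureml-core` is installed, since the result reflects only the interpreter that executes it.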
