articles/machine-learning/v1/how-to-set-up-training-targets.md
14 additions & 5 deletions
@@ -8,7 +8,7 @@ ms.author: sgilley
 ms.reviewer: sgilley
 ms.service: machine-learning
 ms.subservice: training
-ms.date: 10/21/2021
+ms.date: 02/21/2024
 ms.topic: how-to
 ms.custom: UpdateFrequency5,sdkv1
 ---
@@ -26,11 +26,12 @@ All you need to do is define the environment for each compute target within a **
 ## Prerequisites

 * If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning](https://azure.microsoft.com/free/) today.
-* The [Azure Machine Learning SDK for Python](/python/api/overview/azure/ml/install) (>= 1.13.0)
+* The [Azure Machine Learning SDK for Python (v1)](/python/api/overview/azure/ml/install) (>= 1.13.0)
 * An [Azure Machine Learning workspace](../how-to-manage-workspace.md), `ws`
 * A compute target, `my_compute_target`. [Create a compute target](../how-to-create-attach-compute-studio.md)

 ## What's a script run configuration?
+
 A [ScriptRunConfig](/python/api/azureml-core/azureml.core.scriptrunconfig) is used to configure the information necessary for submitting a training job as part of an experiment.

 You submit your training experiment with a ScriptRunConfig object. This object includes the:
@@ -60,6 +61,8 @@ Or you can:

 Create an [experiment](concept-azure-machine-learning-architecture.md#experiments) in your workspace. An experiment is a light-weight container that helps to organize job submissions and keep track of code.
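
A minimal sketch of the experiment-creation step, assuming the v1 SDK (`azureml-core`) is installed and `ws` is the workspace from the prerequisites. The helper name `create_experiment` and the default experiment name are illustrative and do not come from the article.

```python
def create_experiment(ws, experiment_name="my_experiment"):
    """Return an Experiment in workspace `ws` (hypothetical helper)."""
    # Imported lazily so this module loads even where the SDK is not installed.
    from azureml.core import Experiment

    # The experiment acts as a container for the jobs submitted under this name.
    return Experiment(workspace=ws, name=experiment_name)
```

Calling `create_experiment(ws)` needs a real workspace object, so it contacts the service only when the helper is invoked.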
@@ -97,6 +102,8 @@ For more information and details about environments, see [Create & use software

 If your compute target is your **local machine**, you are responsible for ensuring that all the necessary packages are available in the Python environment where the script runs. Use `python.user_managed_dependencies` to use your current Python environment (or the Python on the path you specify).
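
Because a user-managed local environment is not built for you, it can help to check up front that the packages your script imports are present. The following is a sketch of one way to do that with the standard library only. The helper name `missing_packages` and the package names are assumptions for illustration, not part of the SDK.

```python
import importlib.util


def missing_packages(required):
    """Return the top-level module names in `required` that cannot be found
    in the current Python environment (hypothetical helper)."""
    # find_spec returns None when a top-level module is not importable.
    return [name for name in required if importlib.util.find_spec(name) is None]


# Example: check the modules a training script might import.
print(missing_packages(["json", "numpy"]))
```

If the list comes back empty, setting `myenv.python.user_managed_dependencies = True` on your environment object tells Azure Machine Learning to use the current Python environment as-is.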
 Now that you have a compute target (`my_compute_target`, see [Prerequisites](#prerequisites)) and environment (`myenv`, see [Create an environment](#create-an-environment)), create a script job configuration that runs your training script (`train.py`) located in your `project_folder` directory:
 If you do not specify an environment, a default environment will be created for you.
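
As a sketch of that configuration, assuming `azureml-core` is installed and that `project_folder`, `my_compute_target`, and `myenv` are defined as described above, the call could look like the following. The wrapper function `make_script_run_config` is a hypothetical name used only to keep the example self-contained.

```python
def make_script_run_config(project_folder, my_compute_target, myenv):
    """Build a ScriptRunConfig for train.py (hypothetical wrapper)."""
    # Imported lazily so this module loads even where the SDK is not installed.
    from azureml.core import ScriptRunConfig

    return ScriptRunConfig(
        source_directory=project_folder,   # folder that contains train.py
        script="train.py",                 # entry script, relative to the folder
        compute_target=my_compute_target,  # where the job runs
        environment=myenv,                 # omit to get a default environment
    )
```

The resulting object is what you would pass to `experiment.submit(...)` to start the job.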
@@ -131,6 +137,7 @@ If you have command-line arguments you want to pass to your training script, you
 If you want to override the default maximum time allowed for the job, you can do so via the **`max_run_duration_seconds`** parameter. The system will attempt to automatically cancel the job if it takes longer than this value.

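
Since `max_run_duration_seconds` takes a value in seconds, one way to keep the limit readable is to express it as a `timedelta` and convert it. This is a sketch under that assumption. The helper name `to_run_duration_seconds` is illustrative and is not part of the SDK.

```python
from datetime import timedelta


def to_run_duration_seconds(limit: timedelta) -> int:
    """Convert a time limit into whole seconds (hypothetical helper)."""
    seconds = int(limit.total_seconds())
    if seconds <= 0:
        raise ValueError("the run duration limit must be positive")
    return seconds


# A two-hour limit expressed in seconds.
two_hours = to_run_duration_seconds(timedelta(hours=2))
print(two_hours)
```

The value can then be passed as `max_run_duration_seconds=two_hours` when you construct the ScriptRunConfig.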
 ### Specify a distributed job configuration
+
 If you want to run a [distributed training](../how-to-train-distributed-gpu.md) job, provide the distributed job-specific config to the **`distributed_job_config`** parameter. Supported config types include [MpiConfiguration](/python/api/azureml-core/azureml.core.runconfig.mpiconfiguration), [TensorflowConfiguration](/python/api/azureml-core/azureml.core.runconfig.tensorflowconfiguration), and [PyTorchConfiguration](/python/api/azureml-core/azureml.core.runconfig.pytorchconfiguration).

 For more information and examples on running distributed Horovod, TensorFlow and PyTorch jobs, see:
@@ -139,6 +146,8 @@ For more information and examples on running distributed Horovod, TensorFlow and