Skip to content

Commit 6e743de

Browse files
authored
Merge pull request #190239 from lgayhardt/azml-pytorch-0222
Pytorch freshness part 2
2 parents 6f8101b + 51b60c8 commit 6e743de

File tree

1 file changed

+15
-4
lines changed

1 file changed

+15
-4
lines changed

articles/machine-learning/how-to-train-pytorch.md

Lines changed: 15 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ ms.service: machine-learning
77
ms.subservice: core
88
ms.author: minxia
99
author: mx-iao
10-
ms.date: 01/14/2020
10+
ms.date: 02/28/2022
1111
ms.topic: how-to
1212

1313
#Customer intent: As a Python PyTorch developer, I need to combine open-source with a cloud platform to train, evaluate, and deploy my deep learning models at scale.
@@ -17,7 +17,7 @@ ms.topic: how-to
1717

1818
In this article, learn how to run your [PyTorch](https://pytorch.org/) training scripts at enterprise scale using Azure Machine Learning.
1919

20-
The example scripts in this article are used to classify chicken and turkey images to build a deep learning neural network (DNN) based on [PyTorch's transfer learning tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html). Transfer learning is a technique that applies knowledge gained from solving one problem to a different but related problem. Transfer learning shortens the training process by requiring less data, time, and compute resources than training from scratch. See the [deep learning vs machine learning](./concept-deep-learning-vs-machine-learning.md#what-is-transfer-learning) article to learn more about transfer learning.
20+
The example scripts in this article are used to classify chicken and turkey images to build a deep learning neural network (DNN) based on [PyTorch's transfer learning tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html). Transfer learning is a technique that applies knowledge gained from solving one problem to a different but related problem. Transfer learning shortens the training process by requiring less data, time, and compute resources than training from scratch. To learn more about transfer learning, see the [deep learning vs machine learning](./concept-deep-learning-vs-machine-learning.md#what-is-transfer-learning) article.
2121

2222
Whether you're training a deep learning PyTorch model from the ground-up or you're bringing an existing model into the cloud, you can use Azure Machine Learning to scale out open-source training jobs using elastic cloud compute resources. You can build, deploy, version, and monitor production-grade models with Azure Machine Learning.
2323

@@ -88,8 +88,11 @@ shutil.copy('pytorch_train.py', project_folder)
8888
Create a compute target for your PyTorch job to run on. In this example, create a GPU-enabled Azure Machine Learning compute cluster.
8989

9090
```Python
91+
92+
# Choose a name for your CPU cluster
9193
cluster_name = "gpu-cluster"
9294

95+
# Verify that cluster does not exist already
9396
try:
9497
compute_target = ComputeTarget(workspace=ws, name=cluster_name)
9598
print('Found existing compute target')
@@ -98,11 +101,15 @@ except ComputeTargetException:
98101
compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
99102
max_nodes=4)
100103

104+
# Create the cluster with the specified name and configuration
101105
compute_target = ComputeTarget.create(ws, cluster_name, compute_config)
102106

107+
# Wait for the cluster to complete, show the output log
103108
compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
104109
```
105110

111+
If you instead want to create a CPU cluster, provide a different VM size to the vm_size parameter, such as STANDARD_D2_V2.
112+
106113
[!INCLUDE [low-pri-note](../../includes/machine-learning-low-pri-vm.md)]
107114

108115
For more information on compute targets, see the [what is a compute target](concept-compute-target.md) article.
@@ -113,7 +120,7 @@ To define the [Azure ML Environment](concept-environments.md) that encapsulates
113120

114121
#### Use a curated environment
115122

116-
Azure ML provides prebuilt, curated environments if you don't want to define your own environment. Azure ML has several CPU and GPU curated environments for PyTorch corresponding to different versions of PyTorch. For more info, see [here](resource-curated-environments.md).
123+
Azure ML provides prebuilt, [curated environments](resource-curated-environments.md) if you don't want to define your own environment. There are several CPU and GPU curated environments for PyTorch corresponding to different versions of PyTorch.
117124

118125
If you want to use a curated environment, you can run the following command instead:
119126

@@ -123,16 +130,19 @@ pytorch_env = Environment.get(workspace=ws, name=curated_env_name)
123130
```
124131

125132
To see the packages included in the curated environment, you can write out the conda dependencies to disk:
133+
126134
```python
127135
pytorch_env.save_to_directory(path=curated_env_name)
128136
```
129137

130138
Make sure the curated environment includes all the dependencies required by your training script. If not, you'll have to modify the environment to include the missing dependencies. If the environment is modified, you'll have to give it a new name, as the 'AzureML' prefix is reserved for curated environments. If you modified the conda dependencies YAML file, you can create a new environment from it with a new name, for example:
139+
131140
```python
132141
pytorch_env = Environment.from_conda_specification(name='pytorch-1.6-gpu', file_path='./conda_dependencies.yml')
133142
```
134143

135144
If you had instead modified the curated environment object directly, you can clone that environment with a new name:
145+
136146
```python
137147
pytorch_env = pytorch_env.clone(new_name='pytorch-1.6-gpu')
138148
```
@@ -148,6 +158,7 @@ channels:
148158
- conda-forge
149159
dependencies:
150160
- python=3.6.2
161+
- pip=21.3.1
151162
- pip:
152163
- azureml-defaults
153164
- torch==1.6.0
@@ -177,7 +188,7 @@ For more information on creating and using environments, see [Create and use sof
177188

178189
### Create a ScriptRunConfig
179190

180-
Create a [ScriptRunConfig](/python/api/azureml-core/azureml.core.scriptrunconfig) object to specify the configuration details of your training job, including your training script, environment to use, and the compute target to run on. Any arguments to your training script will be passed via command line if specified in the `arguments` parameter.
191+
Create a [ScriptRunConfig](/python/api/azureml-core/azureml.core.scriptrunconfig) object to specify the configuration details of your training job, including your training script, environment to use, and the compute target to run on. Any arguments to your training script will be passed via command line if specified in the `arguments` parameter. The following code will configure a single-node PyTorch job.
181192

182193
```python
183194
from azureml.core import ScriptRunConfig

0 commit comments

Comments
 (0)