
Commit 22144ae

freshness how-to-train-pytorch
1 parent d9bc93b commit 22144ae

1 file changed: +27 -29 lines changed

articles/machine-learning/how-to-train-pytorch.md

Lines changed: 27 additions & 29 deletions
@@ -8,7 +8,7 @@ ms.subservice: training
 ms.author: sgilley
 author: sdgilley
 ms.reviewer: balapv
-ms.date: 01/26/2024
+ms.date: 09/17/2024
 ms.topic: how-to
 ms.custom: sdkv2, update-code1
 #Customer intent: As a Python PyTorch developer, I need to combine open-source with a cloud platform to train, evaluate, and deploy my deep learning models at scale.
@@ -18,11 +18,11 @@ ms.custom: sdkv2, update-code1

 [!INCLUDE [sdk v2](includes/machine-learning-sdk-v2.md)]

-In this article, you'll learn to train, hyperparameter tune, and deploy a [PyTorch](https://pytorch.org/) model using the Azure Machine Learning Python SDK v2.
+In this article, you learn to train, hyperparameter tune, and deploy a [PyTorch](https://pytorch.org/) model using the Azure Machine Learning Python SDK v2.

-You'll use example scripts to classify chicken and turkey images to build a deep learning neural network (DNN) based on [PyTorch's transfer learning tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html). Transfer learning is a technique that applies knowledge gained from solving one problem to a different but related problem. Transfer learning shortens the training process by requiring less data, time, and compute resources than training from scratch. To learn more about transfer learning, see [Deep learning vs. machine learning](./concept-deep-learning-vs-machine-learning.md#what-is-transfer-learning).
+You use example scripts to classify chicken and turkey images to build a deep learning neural network (DNN) based on [PyTorch's transfer learning tutorial](https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html). Transfer learning is a technique that applies knowledge gained from solving one problem to a different but related problem. Transfer learning shortens the training process by requiring less data, time, and compute resources than training from scratch. To learn more about transfer learning, see [Deep learning vs. machine learning](./concept-deep-learning-vs-machine-learning.md#what-is-transfer-learning).

-Whether you're training a deep learning PyTorch model from the ground-up or you're bringing an existing model into the cloud, you can use Azure Machine Learning to scale out open-source training jobs using elastic cloud compute resources. You can build, deploy, version, and monitor production-grade models with Azure Machine Learning.
+Whether you're training a deep learning PyTorch model from the ground-up or you're bringing an existing model into the cloud, use Azure Machine Learning to scale out open-source training jobs using elastic cloud compute resources. You can build, deploy, version, and monitor production-grade models with Azure Machine Learning.

 ## Prerequisites

@@ -37,8 +37,6 @@ Whether you're training a deep learning PyTorch model from the ground-up or you'

 You can also find a completed [Jupyter notebook version](https://github.com/Azure/azureml-examples/blob/main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb) of this guide on the GitHub samples page.

-[!INCLUDE [gpu quota](includes/machine-learning-gpu-quota-prereq.md)]
-
 ## Set up the job

 This section sets up the job for training by loading the required Python packages, connecting to a workspace, creating a compute resource to run a command job, and creating an environment to run the job.
@@ -51,7 +49,7 @@ We're using `DefaultAzureCredential` to get access to the workspace. This creden

 If `DefaultAzureCredential` doesn't work for you, see [azure.identity package](/python/api/azure-identity/azure.identity) or [Set up authentication](how-to-setup-authentication.md?tabs=sdk) for more available credentials.

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=credential)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=credential)]

 If you prefer to use a browser to sign in and authenticate, you should uncomment the following code and use it instead.

@@ -70,7 +68,7 @@ Next, get a handle to the workspace by providing your subscription ID, resource
 2. Select your workspace name to show your resource group and subscription ID.
 3. Copy the values for your resource group and subscription ID into the code.

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=ml_client)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=ml_client)]

 The result of running this script is a workspace handle that you can use to manage other resources and jobs.

@@ -83,19 +81,19 @@ Azure Machine Learning needs a compute resource to run a job. This resource can

 In the following example script, we provision a Linux [compute cluster](./how-to-create-attach-compute-cluster.md?tabs=python). You can see the [Azure Machine Learning pricing](https://azure.microsoft.com/pricing/details/machine-learning/) page for the full list of VM sizes and prices. Since we need a GPU cluster for this example, let's pick a `STANDARD_NC6` model and create an Azure Machine Learning compute.

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=gpu_compute_target)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=gpu_compute_target)]

 ### Create a job environment

 To run an Azure Machine Learning job, you need an environment. An Azure Machine Learning [environment](concept-environments.md) encapsulates the dependencies (such as software runtime and libraries) needed to run your machine learning training script on your compute resource. This environment is similar to a Python environment on your local machine.

-Azure Machine Learning allows you to either use a curated (or ready-made) environment or create a custom environment using a Docker image or a Conda configuration. In this article, you reuse the curated Azure Machine Learning environment `AzureML-pytorch-1.9-ubuntu18.04-py37-cuda11-gpu`. Use the latest version of this environment using the `@latest` directive.
+Azure Machine Learning allows you to either use a curated (or ready-made) environment or create a custom environment using a Docker image or a Conda configuration. In this article, you reuse the curated Azure Machine Learning environment `AzureML-acpt-pytorch-2.2-cuda12.1`. Use the latest version of this environment using the `@latest` directive.

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=curated_env_name)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=curated_env_name)]

 ## Configure and submit your training job

-In this section, we begin by introducing the data for training. We then cover how to run a training job, using a training script that we've provided. You'll learn to build the training job by configuring the command for running the training script. Then, you'll submit the training job to run in Azure Machine Learning.
+In this section, we begin by introducing the data for training. We then cover how to run a training job, using a training script that we've provided. You learn to build the training job by configuring the command for running the training script. Then, you submit the training job to run in Azure Machine Learning.

 ### Obtain the training data

@@ -115,9 +113,9 @@ An Azure Machine Learning `command` is a resource that specifies all the details

 #### Configure the command

-You'll use the general purpose `command` to run the training script and perform your desired tasks. Create a `command` object to specify the configuration details of your training job.
+You use the general purpose `command` to run the training script and perform your desired tasks. Create a `command` object to specify the configuration details of your training job.

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=job)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=job)]

 - The inputs for this command include the number of epochs, learning rate, momentum, and output directory.
 - For the parameter values:
@@ -131,7 +129,7 @@ You'll use the general purpose `command` to run the training script and perform

 It's now time to submit the job to run in Azure Machine Learning. This time, you use `create_or_update` on `ml_client.jobs`.

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=create_job)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=create_job)]

 Once completed, the job registers a model in your workspace (as a result of training) and outputs a link for viewing the job in Azure Machine Learning studio.

@@ -156,7 +154,7 @@ To tune the model's hyperparameters, define the parameter space in which to sear

 Since the training script uses a learning rate schedule to decay the learning rate every several epochs, you can tune the initial learning rate and the momentum parameters.

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=job_for_sweep)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=job_for_sweep)]

 Then, you can configure sweep on the command job, using some sweep-specific parameters, such as the primary metric to watch and the sampling algorithm to use.

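A context line in this hunk says the training script decays the learning rate every several epochs. That is why only the initial learning rate and momentum are tuned. Here is a minimal sketch of such a step-decay schedule. It assumes the `StepLR`-style rule from PyTorch's transfer learning tutorial, in which the rate is multiplied by `gamma` once every `step_size` epochs. The helper name `step_decay_lr` is hypothetical. The defaults `step_size=7` and `gamma=0.1` come from that tutorial and are not confirmed by this diff.

```python
def step_decay_lr(initial_lr: float, epoch: int, step_size: int = 7, gamma: float = 0.1) -> float:
    """Return the learning rate for a 0-based epoch under step decay.

    Hypothetical helper: the closed form of a StepLR-style schedule,
    lr = initial_lr * gamma ** (epoch // step_size).
    Defaults are assumed from PyTorch's transfer learning tutorial.
    """
    if step_size <= 0:
        raise ValueError("step_size must be a positive number of epochs")
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    # The rate drops by a factor of gamma once every step_size epochs.
    return initial_lr * gamma ** (epoch // step_size)
```

For example, `step_decay_lr(0.01, 6)` is still `0.01`, and `step_decay_lr(0.01, 7)` is about `0.001`. Every later epoch scales with the starting value, so sweeping the initial rate effectively sweeps the whole schedule.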
@@ -165,19 +163,19 @@ In the following code, we use random sampling to try different configuration set
 We also define an early termination policy, the `BanditPolicy`, to terminate poorly performing runs early.
 The `BanditPolicy` terminates any run that doesn't fall within the slack factor of our primary evaluation metric. You apply this policy every epoch (since we report our `best_val_acc` metric every epoch and `evaluation_interval`=1). Notice we delay the first policy evaluation until after the first 10 epochs (`delay_evaluation`=10).

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=sweep_job)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=sweep_job)]

 Now, you can submit this job as before. This time, you're running a sweep job that sweeps over your train job.

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=create_sweep_job)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=create_sweep_job)]

 You can monitor the job by using the studio user interface link that's presented during the job run.

 ## Find the best model

 Once all the runs complete, you can find the run that produced the model with the highest accuracy.

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=model)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=model)]

 ## Deploy the model as an online endpoint

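The `BanditPolicy` context lines in this hunk describe the early-termination rule in words. Below is a simplified, hypothetical re-implementation for a metric that is maximized, such as `best_val_acc`. It assumes the slack-factor rule documented for Azure Machine Learning: a run is stopped when its best metric is below `best / (1 + slack_factor)`, where `best` is the leading run's value. This is not the service's code. The function name `runs_to_terminate`, the `history` layout, and the choice to evaluate only after `delay_evaluation` intervals have passed are all assumptions.

```python
from typing import Dict, List


def runs_to_terminate(
    history: Dict[str, List[float]],
    interval: int,
    slack_factor: float = 0.1,
    evaluation_interval: int = 1,
    delay_evaluation: int = 10,
) -> List[str]:
    """Return the IDs of runs that fall outside the slack of the leading run.

    Simplified, hypothetical sketch of a bandit-style rule for a maximized metric.
    history maps each run ID to the metric it reported at intervals 1, 2, 3, ...
    (for example, best_val_acc once per epoch).
    """
    # No evaluation during the delay window or between scheduled evaluations.
    if interval <= delay_evaluation or interval % evaluation_interval != 0:
        return []

    # Best value each run has reached up to this interval; runs that have not
    # reported this many values yet are left out of the comparison.
    best_so_far = {
        run_id: max(values[:interval])
        for run_id, values in history.items()
        if len(values) >= interval
    }
    if not best_so_far:
        return []

    # Cutoff derived from the leading run: best / (1 + slack_factor).
    cutoff = max(best_so_far.values()) / (1 + slack_factor)
    return sorted(run_id for run_id, best in best_so_far.items() if best < cutoff)
```

Suppose the leading run is at 0.8 and `slack_factor=0.2`. The cutoff is then 0.8 / 1.2, about 0.667. At interval 11, a run stuck at 0.6 is flagged and a run at 0.7 survives. At interval 10, nothing is evaluated yet because of the delay.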
@@ -193,13 +191,13 @@ For more information about deployment, see [Deploy and score a machine learning

 As a first step to deploying your model, you need to create your online endpoint. The endpoint name must be unique in the entire Azure region. For this article, you create a unique name using a universally unique identifier (UUID).

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=online_endpoint_name)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=online_endpoint_name)]

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=endpoint)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=endpoint)]

 After you create the endpoint, you can retrieve it as follows:

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=get_endpoint)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=get_endpoint)]

 ### Deploy the model to the endpoint

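A context line in this hunk says the endpoint name is made unique with a UUID. Here is a minimal sketch of that idea. It uses a hypothetical prefix, `pytorch-birds-ep`. It also assumes an endpoint-name rule of 3 to 32 characters that starts with a letter and contains only letters, digits, and hyphens. Verify that rule against the current Azure Machine Learning endpoint limits before relying on it. Truncating the UUID keeps the name short, but it makes a collision unlikely rather than impossible.

```python
import re
import uuid

# Assumed naming rule: a letter, then letters/digits/hyphens, 3-32 characters total.
ENDPOINT_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9-]{2,31}")


def make_endpoint_name(prefix: str = "pytorch-birds-ep") -> str:
    """Return `prefix` plus an 8-character hex fragment of a random UUID.

    The default prefix is hypothetical; choose one that identifies your project.
    """
    name = f"{prefix}-{uuid.uuid4().hex[:8]}"
    # Fail early if the generated name breaks the assumed naming rule.
    if not ENDPOINT_NAME_PATTERN.fullmatch(name):
        raise ValueError(f"{name!r} does not satisfy the assumed naming rule")
    return name
```

Calling `make_endpoint_name()` returns a 25-character name such as `pytorch-birds-ep-` followed by eight random hex digits. A prefix that starts with a digit, or one that is long enough to push the name past 32 characters, raises `ValueError` instead.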
@@ -213,7 +211,7 @@ The code to deploy the model to the endpoint:
 - Scores the model, using the *score.py* file.
 - Uses the curated environment (that you specified earlier) to perform inferencing.

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=blue_deployment)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=blue_deployment)]

 > [!NOTE]
 > Expect this deployment to take a bit of time to finish.
@@ -224,30 +222,30 @@ Now that you deployed the model to the endpoint, you can predict the output of t

 To test the endpoint, let's use a sample image for prediction. First, let's display the image.

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=display_image)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=display_image)]

 Create a function to format and resize the image.

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=process_image)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=process_image)]

 Format the image and convert it to a JSON file.

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=test_json)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=test_json)]

 You can then invoke the endpoint with this JSON and print the result.

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=test_deployment)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=test_deployment)]

 ### Clean up resources

 If you don't need the endpoint anymore, delete it to stop using resource. Make sure no other deployments are using the endpoint before you delete it.

-[!notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=delete_endpoint)]
+[!Notebook-python[](~/azureml-examples-main/sdk/python/jobs/single-step/pytorch/train-hyperparameter-tune-deploy-with-pytorch/train-hyperparameter-tune-deploy-with-pytorch.ipynb?name=delete_endpoint)]

 > [!NOTE]
 > Expect this cleanup to take a bit of time to finish.

-## Next steps
+## Related content

 In this article, you trained and registered a deep learning neural network using PyTorch on Azure Machine Learning. You also deployed the model to an online endpoint. See these other articles to learn more about Azure Machine Learning.
