
Commit 1f4928d

Merge pull request #204675 from Man-MSFT/mafong/getting-started
Azure ML: Update Python Day 1 articles to use V2 SDK
2 parents 4a070d9 + 920c055 commit 1f4928d

File tree

3 files changed (+266, -257 lines)

articles/machine-learning/tutorial-1st-experiment-bring-data.md

Lines changed: 92 additions & 118 deletions
@@ -9,32 +9,33 @@ ms.topic: tutorial
author: aminsaied
ms.author: amsaied
ms.reviewer: sgilley
-ms.date: 12/21/2021
+ms.date: 07/10/2022
ms.custom: tracking-python, contperf-fy21q3, FY21Q4-aml-seo-hack, contperf-fy21q4, sdkv1, event-tier1-build-2022
---

# Tutorial: Upload data and train a model (part 3 of 3)

-[!INCLUDE [sdk v1](../../includes/machine-learning-sdk-v1.md)]
+[!INCLUDE [sdk v2](../../includes/machine-learning-sdk-v2.md)]

This tutorial shows you how to upload and use your own data to train machine learning models in Azure Machine Learning. This tutorial is *part 3 of a three-part tutorial series*.

In [Part 2: Train a model](tutorial-1st-experiment-sdk-train.md), you trained a model in the cloud, using sample data from `PyTorch`. You also downloaded that data through the `torchvision.datasets.CIFAR10` method in the PyTorch API. In this tutorial, you'll use the downloaded data to learn the workflow for working with your own data in Azure Machine Learning.

In this tutorial, you:

> [!div class="checklist"]
+>
> * Upload data to Azure.
> * Create a control script.
-> * Understand the new Azure Machine Learning concepts (passing parameters, datasets, datastores).
+> * Understand the new Azure Machine Learning concepts (passing parameters, data inputs).
> * Submit and run your training script.
> * View your code output in the cloud.

## Prerequisites

You'll need the data that was downloaded in the previous tutorial. Make sure you have completed these steps:

1. [Create the training script](tutorial-1st-experiment-sdk-train.md#create-training-scripts).
1. [Test locally](tutorial-1st-experiment-sdk-train.md#test-local).

## Adjust the training script
@@ -43,21 +44,21 @@ By now you have your training script (get-started/src/train.py) running in Azure
Our training script is currently set to download the CIFAR10 dataset on each run. The following Python code has been adjusted to read the data from a directory.

->[!NOTE]
+> [!NOTE]
> The use of `argparse` parameterizes the script.

1. Open *train.py* and replace it with this code:

```python
import os
import argparse
import torch
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from model import Net
-from azureml.core import Run
-run = Run.get_context()
+import mlflow

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
@@ -126,13 +127,13 @@ Our training script is currently set to download the CIFAR10 dataset on each run
            running_loss += loss.item()
            if i % 2000 == 1999:
                loss = running_loss / 2000
-               run.log('loss', loss)  # log loss metric to AML
+               mlflow.log_metric('loss', loss)
                print(f'epoch={epoch + 1}, batch={i + 1:5}: loss {loss:.2f}')
                running_loss = 0.0
    print('Finished Training')
```

1. **Save** the file. Close the tab if you wish.

### Understanding the code changes

@@ -158,165 +159,139 @@ optimizer = optim.SGD(
)
```

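Putting those pieces together, the following is a condensed, hypothetical sketch of the pattern the updated *train.py* follows: `argparse` supplies the data path and hyperparameters, and MLflow records the loss metric. Inside an Azure Machine Learning job, MLflow tracking typically points at the job automatically when the environment includes the `mlflow` and `azureml-mlflow` packages; the default values and loss numbers below are placeholders, not the tutorial's.

```python
# Hypothetical, condensed sketch of the updated train.py's pattern (not the
# tutorial's full script): argparse for parameters, MLflow for metric logging.
import argparse
import mlflow

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--data_path", type=str, help="mounted folder containing the CIFAR10 files")
    parser.add_argument("--learning_rate", type=float, default=0.001)  # placeholder default
    parser.add_argument("--momentum", type=float, default=0.9)         # placeholder default
    args = parser.parse_args()

    print("===== DATA =====")
    print("DATA PATH:", args.data_path)

    # Stand-in for the real training loop; inside an Azure ML job these
    # metrics show up under the job's Metrics tab.
    for loss in [2.2, 1.9, 1.7]:
        mlflow.log_metric("loss", loss)
    print("Finished Training")
```
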
## <a name="upload"></a> Upload the data to Azure

To run this script in Azure Machine Learning, you need to make your training data available in Azure. Your Azure Machine Learning workspace comes equipped with a _default_ datastore. This is an Azure Blob Storage account where you can store your training data.

->[!NOTE]
-> Azure Machine Learning allows you to connect other cloud-based datastores that store your data. For more details, see the [datastores documentation](./concept-data.md).
-
-1. Create a new Python control script in the **get-started** folder (make sure it is in **get-started**, *not* in the **/src** folder). Name the script *upload-data.py* and copy this code into the file:
-
-```python
-# upload-data.py
-from azureml.core import Workspace
-from azureml.core import Dataset
-from azureml.data.datapath import DataPath
-
-ws = Workspace.from_config()
-datastore = ws.get_default_datastore()
-Dataset.File.upload_directory(src_dir='data',
-                              target=DataPath(datastore, "datasets/cifar10")
-                              )
-```
-
-The `target_path` value specifies the path on the datastore where the CIFAR10 data will be uploaded.
-
->[!TIP]
-> While you're using Azure Machine Learning to upload the data, you can use [Azure Storage Explorer](https://azure.microsoft.com/features/storage-explorer/) to upload ad hoc files. If you need an ETL tool, you can use [Azure Data Factory](../data-factory/introduction.md) to ingest your data into Azure.
-
-2. Select **Save and run script in terminal** to run the *upload-data.py* script.
-
-You should see the following standard output:
-
-```txt
-Uploading ./data\cifar-10-batches-py\data_batch_2
-Uploaded ./data\cifar-10-batches-py\data_batch_2, 4 files out of an estimated total of 9
-.
-.
-Uploading ./data\cifar-10-batches-py\data_batch_5
-Uploaded ./data\cifar-10-batches-py\data_batch_5, 9 files out of an estimated total of 9
-Uploaded 9 files
-```
+> [!NOTE]
+> Azure Machine Learning allows you to connect other cloud-based storage services that hold your data. For more details, see the [data documentation](./concept-data.md).
+
+There is no additional step needed to upload the data; the control script will define and upload the CIFAR10 training data.

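If you'd rather keep the data in your workspace as a named, versioned asset (so later jobs can reference it instead of re-uploading the local folder), a sketch of that optional step with the v2 SDK follows. The asset name and workspace placeholders are assumptions; the control script below imports `Data` and `AssetTypes`, which are the classes this uses, but the tutorial itself doesn't require this step.

```python
# Optional sketch, not required for this tutorial: register ./data as a
# versioned data asset. The asset name and workspace details are placeholders.
from azure.ai.ml import MLClient
from azure.ai.ml.constants import AssetTypes
from azure.ai.ml.entities import Data
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

cifar_data = Data(
    name="cifar10-local",    # hypothetical asset name
    type=AssetTypes.URI_FOLDER,
    path="./data",           # same folder the control script uses
    description="CIFAR10 data downloaded in part 2 of the tutorial",
)
ml_client.data.create_or_update(cifar_data)

# A job input could then reference it by name and version, for example:
# Input(type=AssetTypes.URI_FOLDER, path="azureml:cifar10-local:1")
```
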
## <a name="control-script"></a> Create a control script

As you've done previously, create a new Python control script called *run-pytorch-data.py* in the **get-started** folder:

```python
# run-pytorch-data.py
+from azure.ai.ml import MLClient, command, Input
+from azure.identity import DefaultAzureCredential
+from azure.ai.ml.entities import Environment
+from azure.ai.ml import command, Input
+from azure.ai.ml.entities import Data
+from azure.ai.ml.constants import AssetTypes
from azureml.core import Workspace
-from azureml.core import Experiment
-from azureml.core import Environment
-from azureml.core import ScriptRunConfig
-from azureml.core import Dataset

if __name__ == "__main__":
+    # get details of the current Azure ML workspace
    ws = Workspace.from_config()
-    datastore = ws.get_default_datastore()
-    dataset = Dataset.File.from_files(path=(datastore, 'datasets/cifar10'))
-
-    experiment = Experiment(workspace=ws, name='day1-experiment-data')
-
-    config = ScriptRunConfig(
-        source_directory='./src',
-        script='train.py',
-        compute_target='cpu-cluster',
-        arguments=[
-            '--data_path', dataset.as_named_input('input').as_mount(),
-            '--learning_rate', 0.003,
-            '--momentum', 0.92],
-    )

-    # set up pytorch environment
-    env = Environment.from_conda_specification(
-        name='pytorch-env',
-        file_path='pytorch-env.yml'
+    # default authentication flow for Azure applications
+    default_azure_credential = DefaultAzureCredential()
+    subscription_id = ws.subscription_id
+    resource_group = ws.resource_group
+    workspace = ws.name
+
+    # client class to interact with Azure ML services and resources, e.g. workspaces, jobs, models and so on.
+    ml_client = MLClient(
+        default_azure_credential,
+        subscription_id,
+        resource_group,
+        workspace)
+
+    # the key here should match the key passed to the command
+    my_job_inputs = {
+        "data_path": Input(type=AssetTypes.URI_FOLDER, path="./data")
+    }
+
+    env_name = "pytorch-env"
+    env_docker_image = Environment(
+        image="pytorch/pytorch:latest",
+        name=env_name,
+        conda_file="pytorch-env.yml",
+    )
+    ml_client.environments.create_or_update(env_docker_image)
+
+    # target name of compute where job will be executed
+    computeName="cpu-cluster"
+    job = command(
+        code="./src",
+        # the parameter will match the training script argument name
+        # inputs.data_path key should match the dictionary key
+        command="python train.py --data_path ${{inputs.data_path}}",
+        inputs=my_job_inputs,
+        environment=f"{env_name}@latest",
+        compute=computeName,
+        display_name="day1-experiment-data",
    )
-    config.run_config.environment = env

-    run = experiment.submit(config)
-    aml_url = run.get_portal_url()
-    print("Submitted to compute cluster. Click link below")
-    print("")
-    print(aml_url)
+    returned_job = ml_client.create_or_update(job)
+    aml_url = returned_job.studio_url
+    print("Monitor your job at", aml_url)
```

249236
### Understand the code changes
250237

251-
The control script is similar to the one from [part 3 of this series](tutorial-1st-experiment-sdk-train.md), with the following new lines:
238+
The control script is similar to the one from [part 2 of this series](tutorial-1st-experiment-sdk-train.md), with the following new lines:
252239

253240
:::row:::
254241
:::column span="":::
255-
`dataset = Dataset.File.from_files( ... )`
242+
`my_job_inputs = { "data_path": Input(type=AssetTypes.URI_FOLDER, path="./data")}`
256243
:::column-end:::
257244
:::column span="2":::
258-
A [dataset](/python/api/azureml-core/azureml.core.dataset.dataset) is used to reference the data you uploaded to Azure Blob Storage. Datasets are an abstraction layer on top of your data that are designed to improve reliability and trustworthiness.
245+
An [Input](/python/api/azure-ai-ml/azure.ai.ml.input) is used to reference inputs to your job. These can encompass data, either uploaded as part of the job or references to previously registered data assets. URI\*FOLDER tells that the reference points to a folder of data. The data will be mounted by default to the compute for the job.
259246
:::column-end:::
260247
:::row-end:::
261248
:::row:::
262249
:::column span="":::
263-
`config = ScriptRunConfig(...)`
250+
`command="python train.py --data_path ${{inputs.data_path}}"`
264251
:::column-end:::
265252
:::column span="2":::
266-
[ScriptRunConfig](/python/api/azureml-core/azureml.core.scriptrunconfig) is modified to include a list of arguments that will be passed into `train.py`. The `dataset.as_named_input('input').as_mount()` argument means the specified directory will be _mounted_ to the compute target.
253+
`--data_path` matches the argument defined in the updated training script. `${{inputs.data_path}}` passes the input defined by the input dictionary, and the keys must match.
267254
:::column-end:::
268255
:::row-end:::
269256

270-
## <a name="submit-to-cloud"></a> Submit the run to Azure Machine Learning
257+
## <a name="submit-to-cloud"></a> Submit the job to Azure Machine Learning
271258

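To make the key-matching rule concrete, here is a small hypothetical variant (not the tutorial's control script) that adds a second, literal input: every `${{inputs.<key>}}` placeholder in the command string must correspond to a key in the `inputs` dictionary.

```python
# Hypothetical variant for illustration only: a data input plus a literal
# hyperparameter input. Each ${{inputs.<key>}} placeholder must match a key below.
from azure.ai.ml import Input, command
from azure.ai.ml.constants import AssetTypes

job = command(
    code="./src",
    command=(
        "python train.py "
        "--data_path ${{inputs.data_path}} "
        "--learning_rate ${{inputs.learning_rate}}"
    ),
    inputs={
        "data_path": Input(type=AssetTypes.URI_FOLDER, path="./data"),
        "learning_rate": 0.003,  # literal inputs are substituted as plain values
    },
    environment="pytorch-env@latest",
    compute="cpu-cluster",
    display_name="day1-experiment-data",
)
# Submitting is unchanged: ml_client.create_or_update(job)
```
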
272-
Select **Save and run script in terminal** to run the *run-pytorch-data.py* script. This run will train the model on the compute cluster using the data you uploaded.
259+
Select **Save and run script in terminal** to run the *run-pytorch-data.py* script. This job will train the model on the compute cluster using the data you uploaded.
273260

274261
This code will print a URL to the experiment in the Azure Machine Learning studio. If you go to that link, you'll be able to see your code running.
275262

276263
[!INCLUDE [amlinclude-info](../../includes/machine-learning-py38-ignore.md)]
277264

278-
279265
### <a name="inspect-log"></a> Inspect the log file
280266

281267
In the studio, go to the experiment job (by selecting the previous URL output) followed by **Outputs + logs**. Select the `std_log.txt` file. Scroll down through the log file until you see the following output:
282268

283269
```txt
284-
Processing 'input'.
285-
Processing dataset FileDataset
286-
{
287-
"source": [
288-
"('workspaceblobstore', 'datasets/cifar10')"
289-
],
290-
"definition": [
291-
"GetDatastoreFiles"
292-
],
293-
"registration": {
294-
"id": "XXXXX",
295-
"name": null,
296-
"version": null,
297-
"workspace": "Workspace.create(name='XXXX', subscription_id='XXXX', resource_group='X')"
298-
}
299-
}
300-
Mounting input to /tmp/tmp9kituvp3.
301-
Mounted input to /tmp/tmp9kituvp3 as folder.
302-
Exit __enter__ of DatasetContextManager
303-
Entering Job History Context Manager.
304-
Current directory: /mnt/batch/tasks/shared/LS_root/jobs/dsvm-aml/azureml/tutorial-session-3_1600171983_763c5381/mounts/workspaceblobstore/azureml/tutorial-session-3_1600171983_763c5381
305-
Preparing to call script [ train.py ] with arguments: ['--data_path', '$input', '--learning_rate', '0.003', '--momentum', '0.92']
306-
After variable expansion, calling script [ train.py ] with arguments: ['--data_path', '/tmp/tmp9kituvp3', '--learning_rate', '0.003', '--momentum', '0.92']
307-
308-
Script type = None
309270
===== DATA =====
310-
DATA PATH: /tmp/tmp9kituvp3
271+
DATA PATH: /mnt/azureml/cr/j/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/cap/data-capability/wd/INPUT_data_path
311272
LIST FILES IN DATA PATH...
312-
['cifar-10-batches-py', 'cifar-10-python.tar.gz']
273+
['.amlignore', 'cifar-10-batches-py', 'cifar-10-python.tar.gz']
274+
================
275+
epoch=1, batch= 2000: loss 2.20
276+
epoch=1, batch= 4000: loss 1.90
277+
epoch=1, batch= 6000: loss 1.70
278+
epoch=1, batch= 8000: loss 1.58
279+
epoch=1, batch=10000: loss 1.54
280+
epoch=1, batch=12000: loss 1.48
281+
epoch=2, batch= 2000: loss 1.41
282+
epoch=2, batch= 4000: loss 1.38
283+
epoch=2, batch= 6000: loss 1.33
284+
epoch=2, batch= 8000: loss 1.30
285+
epoch=2, batch=10000: loss 1.29
286+
epoch=2, batch=12000: loss 1.25
287+
Finished Training
288+
313289
```
314290

315291
Notice:
316292

317-
- Azure Machine Learning has mounted Blob Storage to the compute cluster automatically for you.
318-
- The ``dataset.as_named_input('input').as_mount()`` used in the control script resolves to the mount point.
319-
293+
- Azure Machine Learning has mounted Blob Storage to the compute cluster automatically for you, passing the mount point into `--data_path`. Compared to the previous job, there is no on the fly data download.
294+
- The `inputs=my_job_inputs` used in the control script resolves to the mount point.
320295

321296
## Clean up resources
322297

@@ -331,7 +306,6 @@ If you're not going to use it now, stop the compute instance:
1. Select the compute instance in the list.
1. On the top toolbar, select **Stop**.

### Delete all resources

[!INCLUDE [aml-delete-resource-group](../../includes/aml-delete-resource-group.md)]
