In this tutorial, you run your first Python script in the cloud with Azure Machine Learning. This tutorial is *part 1 of a two-part tutorial series*.
22
22
23
-
This tutorial avoids the complexity of training a machine learning model. You will run a "Hello World" Python script in the cloud. You will learn how a control script is used to configure and create a run in Azure Machine Learning.
23
+
This tutorial avoids the complexity of training a machine learning model. You'll run a "Hello World" Python script in the cloud. You'll learn how a control script is used to configure and create a run in Azure Machine Learning.
24
24
25
25
In this tutorial, you will:
26
26
@@ -38,7 +38,7 @@ In this tutorial, you will:
38
38
39
39
## Create and run a Python script
40
40
41
-
This tutorial will use the compute instance as your development computer. First create a few folders and the script:
41
+
This tutorial uses the compute instance as your development computer. First create a few folders and the script:
42
42
43
43
1. Sign in to the [Azure Machine Learning studio](https://ml.azure.com) and select your workspace if prompted.
44
44
1. On the left, select **Notebooks**.
@@ -47,9 +47,9 @@ This tutorial will use the compute instance as your development computer. First
47
47
1. Name the folder **get-started**.
48
48
1. To the right of the folder name, use the **...** to create another folder under **get-started**.
49
49
:::image type="content" source="../media/tutorial-1st-experiment-hello-world/create-sub-folder.png" alt-text="Screenshot shows create a subfolder menu.":::
50
-
1. Name the new folder **src**. Use the **Edit location** link if the file location is not correct.
50
+
1. Name the new folder **src**. Use the **Edit location** link if the file location isn't correct.
51
51
1. To the right of the **src** folder, use the **...** to create a new file in the **src** folder.
52
-
1. Name your file *hello.py*. Switch the **File type** to *Python (*.py)*.
52
+
1. Name your file *hello.py*. Switch the **File type** to *Python (*.py)*.
53
53
54
54
Copy this code into your file:
55
55
@@ -65,7 +65,7 @@ Your project folder structure will now look like:
65
65
66
66
### Test your script
67
67
68
-
You can run your code locally, which in this case means on the compute instance. Running code locally has the benefit of interactive debugging of code.
68
+
You can run your code locally, which in this case means on the compute instance. Running code locally has the benefit of interactive debugging of code.
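
As a rough illustration of what "running locally" amounts to, the sketch below runs a script with the current Python interpreter and captures its output. The helper name `run_script_locally` and the one-line body written to *hello.py* are assumptions made for this example. The toolbar button in the studio does the equivalent for you.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_script_locally(script_path: Path) -> str:
    """Run a Python script with the current interpreter and return its stdout."""
    result = subprocess.run(
        [sys.executable, str(script_path)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Build a throwaway copy of the tutorial layout (get-started/src/hello.py).
# The one-line script body is an assumption about what hello.py contains.
with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "get-started" / "src" / "hello.py"
    script.parent.mkdir(parents=True)
    script.write_text('print("Hello world!")\n')
    output = run_script_locally(script)

print(output, end="")
```

Because the script runs in a local process, any exception surfaces immediately in the terminal, which is what makes local runs convenient for debugging.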
69
69
70
70
If you have previously stopped your compute instance, start it now with the **Start compute** tool to the right of the compute dropdown. Wait about a minute for state to change to *Running*.
71
71
@@ -75,13 +75,13 @@ Select **Save and run script in terminal** to run the script.
75
75
76
76
:::image type="content" source="../media/tutorial-1st-experiment-hello-world/save-run-in-terminal.png" alt-text="Screenshot shows save and run script in terminal tool in the toolbar":::
77
77
78
-
You'll see the output of the script in the terminal window that opens. Close the tab and select **Terminate** to close the session.
78
+
You see the output of the script in the terminal window that opens. Close the tab and select **Terminate** to close the session.
79
79
80
80
## Create a control script
81
81
82
-
A *control script* allows you to run your `hello.py` script on different compute resources. You use the control script to control how and where your machine learning code is run.
82
+
A *control script* allows you to run your `hello.py` script on different compute resources. You use the control script to control how and where your machine learning code is run.
83
83
84
-
Select the **...** at the end of **get-started** folder to create a new file. Create a Python file called *run-hello.py* and copy/paste the following code into that file:
84
+
Select the **...** at the end of the **get-started** folder to create a new file. Create a Python file called *run-hello.py* and copy/paste the following code into that file:
85
85
86
86
```python
87
87
# get-started/run-hello.py
@@ -125,23 +125,23 @@ Here's a description of how the control script works:
125
125
`config = ScriptRunConfig( ... )`
126
126
:::column-end:::
127
127
:::column span="2":::
128
-
[ScriptRunConfig](/python/api/azureml-core/azureml.core.scriptrunconfig) wraps your `hello.py` code and passes it to your workspace. As the name suggests, you can use this class to _configure_ how you want your _script_ to _run_ in Azure Machine Learning. It also specifies what compute target the script will run on. In this code, the target is the compute cluster that you created in the [setup tutorial](../quickstart-create-resources.md).
128
+
[ScriptRunConfig](/python/api/azureml-core/azureml.core.scriptrunconfig) wraps your `hello.py` code and passes it to your workspace. As the name suggests, you can use this class to _configure_ how you want your _script_ to _run_ in Azure Machine Learning. It also specifies what compute target the script runs on. In this code, the target is the compute cluster that you created in the [setup tutorial](../quickstart-create-resources.md).
129
129
:::column-end:::
130
130
:::row-end:::
131
131
:::row:::
132
132
:::column span="":::
133
133
`run = experiment.submit(config)`
134
134
:::column-end:::
135
135
:::column span="2":::
136
-
Submits your script. This submission is called a [run](/python/api/azureml-core/azureml.core.run%28class%29). In v2, it has been renamed to a job. A run/job encapsulates a single execution of your code. Use a job to monitor the script progress, capture the output, analyze the results, visualize metrics, and more.
136
+
Submits your script. This submission is called a [run](/python/api/azureml-core/azureml.core.run%28class%29). In v2, it has been renamed to a job. A run/job encapsulates a single execution of your code. Use a job to monitor the script progress, capture the output, analyze the results, visualize metrics, and more.
137
137
:::column-end:::
138
138
:::row-end:::
139
139
:::row:::
140
140
:::column span="":::
141
141
`aml_url = run.get_portal_url()`
142
142
:::column-end:::
143
143
:::column span="2":::
144
-
The `run` object provides a handle on the execution of your code. Monitor its progress from the Azure Machine Learning studio with the URL that's printed from the Python script.
144
+
The `run` object provides a handle on the execution of your code. Monitor its progress from the Azure Machine Learning studio with the URL that prints from the Python script.
145
145
:::column-end:::
146
146
:::row-end:::
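
The calls described in the rows above fit together roughly as in the following sketch, which assumes the `azureml-core` (SDK v1) package. The function name `submit_hello` and the default experiment and compute target names are placeholders chosen for this example, not values taken from the tutorial. The import is deferred so the function can be defined on a machine without the SDK installed.

```python
def submit_hello(experiment_name: str = "hello-experiment",
                 compute_target: str = "cpu-cluster") -> str:
    """Submit src/hello.py as a run and return the studio URL.

    Both default names are placeholders (assumptions); replace them with
    the experiment name you want and the compute cluster you created.
    """
    # Imported here so the function can be defined without the SDK installed.
    from azureml.core import Experiment, ScriptRunConfig, Workspace

    ws = Workspace.from_config()  # reads the workspace details from config.json
    experiment = Experiment(workspace=ws, name=experiment_name)

    # Wrap the script and say where it should run.
    config = ScriptRunConfig(
        source_directory="./src",
        script="hello.py",
        compute_target=compute_target,
    )

    run = experiment.submit(config)  # starts the run (a "job" in v2)
    return run.get_portal_url()  # link for monitoring the run in the studio
```

Calling `print(submit_hello())` from the **get-started** folder would submit the script and print the monitoring link, provided the workspace configuration and the named compute cluster exist.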
147
147
@@ -150,33 +150,33 @@ Here's a description of how the control script works:
150
150
151
151
1. Select **Save and run script in terminal** to run your control script, which in turn runs `hello.py` on the compute cluster that you created in the [setup tutorial](../quickstart-create-resources.md).
152
152
153
-
1. In the terminal, you may be asked to sign in to authenticate. Copy the code and follow the link to complete this step.
153
+
1. In the terminal, you may be asked to sign in to authenticate. Copy the code and follow the link to complete this step.
154
154
155
-
1. Once you're authenticated, you'll see a link in the terminal. Select the link to view the job.
155
+
1. Once you're authenticated, you see a link in the terminal. Select the link to view the job.
156
156
157
157
> [!NOTE]
158
158
> You may see some warnings starting with *Failure while loading azureml_run_type_providers...*. You can ignore these warnings. Use the link at the bottom of these warnings to view your output.
159
159
160
160
## View the output
161
161
162
-
1. In the page that opens, you'll see the job status.
162
+
1. In the page that opens, you see the job status.
163
163
1. When the status of the job is **Completed**, select **Outputs + logs** at the top of the page.
164
164
1. Select **std_log.txt** to view the output of your job.
165
165
166
166
## Monitor your code in the cloud in the studio
167
167
168
-
The output from your script will contain a link to the studio that looks something like this:
168
+
The output from your script contains a link to the studio that looks something like this:
Follow the link. At first, you'll see a status of **Queued** or **Preparing**. The very first run will take 5-10 minutes to complete. This is because the following occurs:
171
+
Follow the link. At first, you see a status of **Queued** or **Preparing**. The first run takes 5-10 minutes to complete. This is because the following occurs:
172
172
173
173
* A docker image is built in the cloud
174
174
* The compute cluster is resized from 0 to 1 node
175
175
* The docker image is downloaded to the compute.
176
176
177
-
Subsequent jobs are much quicker (~15 seconds) as the docker image is cached on the compute. You can test this by resubmitting the code below after the first job has completed.
177
+
Subsequent jobs are quicker (~15 seconds) as the docker image is cached on the compute. You can test this by resubmitting the code below after the first job has completed.
178
178
179
-
Wait about 10 minutes. You'll see a message that the job has completed. Then use **Refresh** to see the status change to *Completed*. Once the job completes, go to the **Outputs + logs** tab. There you can see a `std_log.txt` file that looks like this:
179
+
Wait about 10 minutes. You see a message that the job has completed. Then use **Refresh** to see the status change to *Completed*. Once the job completes, go to the **Outputs + logs** tab. There you can see a `std_log.txt` file that looks like this:
This tutorial shows you how to train a machine learning model in Azure Machine Learning. This tutorial is *part 2 of a two-part tutorial series*.
21
+
This tutorial shows you how to train a machine learning model in Azure Machine Learning. This tutorial is *part 2 of a two-part tutorial series*.
22
22
23
-
In [Part 1: Run "Hello world!"](tutorial-1st-experiment-hello-world.md) of the series, you learned how to use a control script to run a job in the cloud.
23
+
In [Part 1: Run "Hello world!"](tutorial-1st-experiment-hello-world.md) of the series, you learned how to use a control script to run a job in the cloud.
24
24
25
-
In this tutorial, you take the next step by submitting a script that trains a machine learning model. This example will help you understand how Azure Machine Learning eases consistent behavior between local debugging and remote runs.
25
+
In this tutorial, you take the next step by submitting a script that trains a machine learning model. This example helps you understand how Azure Machine Learning keeps behavior consistent between local debugging and remote runs.
26
26
27
27
In this tutorial, you:
28
28
@@ -42,9 +42,9 @@ In this tutorial, you:
42
42
43
43
## Create training scripts
44
44
45
-
First you define the neural network architecture in a *model.py* file. All your training code will go into the `src` subdirectory, including *model.py*.
45
+
First you define the neural network architecture in a *model.py* file. All your training code goes into the `src` subdirectory, including *model.py*.
46
46
47
-
The training code is taken from [this introductory example](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) from PyTorch. Note that the Azure Machine Learning concepts apply to any machine learning code, not just PyTorch.
47
+
The training code is taken from [this introductory example](https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html) from PyTorch. The Azure Machine Learning concepts apply to any machine learning code, not just PyTorch.
48
48
49
49
1. Create a *model.py* file in the **src** subfolder. Copy this code into the file:
50
50
@@ -72,7 +72,7 @@ The training code is taken from [this introductory example](https://pytorch.org/
72
72
x = self.fc3(x)
73
73
return x
74
74
```
75
-
1. On the toolbar, select **Save** to save the file. Close the tab if you wish.
75
+
1. On the toolbar, select **Save** to save the file. Close the tab if you wish.
76
76
77
77
1. Next, define the training script, also in the **src** subfolder. This script downloads the CIFAR10 dataset by using PyTorch `torchvision.datasets` APIs, sets up the network defined in *model.py*, and trains it for two epochs by using standard SGD and cross-entropy loss.
78
78
@@ -143,15 +143,15 @@ The training code is taken from [this introductory example](https://pytorch.org/
143
143
144
144
Select **Save and run script in terminal** to run the *train.py* script directly on the compute instance.
145
145
146
-
After the script completes, select **Refresh** above the file folders. You'll see the new data folder called **get-started/data** Expand this folder to view the downloaded data.
146
+
After the script completes, select **Refresh** above the file folders. You see the new data folder called **get-started/data**. Expand this folder to view the downloaded data.
147
147
148
148
:::image type="content" source="../media/tutorial-1st-experiment-hello-world/directory-with-data.png" alt-text="Screenshot of folders shows new data folder created by running the file locally.":::
149
149
150
150
## Create a Python environment
151
151
152
-
Azure Machine Learning provides the concept of an [environment](/python/api/azureml-core/azureml.core.environment.environment) to represent a reproducible, versioned Python environment for running experiments. It's easy to create an environment from a local Conda or pip environment.
152
+
Azure Machine Learning provides the concept of an [environment](/python/api/azureml-core/azureml.core.environment.environment) to represent a reproducible, versioned Python environment for running experiments. It's easy to create an environment from a local Conda or pip environment.
153
153
154
-
First you'll create a file with the package dependencies.
154
+
First you create a file with the package dependencies.
155
155
156
156
1. Create a new file in the **get-started** folder called `pytorch-env.yml`:
157
157
@@ -165,7 +165,7 @@ First you'll create a file with the package dependencies.
165
165
- pytorch
166
166
- torchvision
167
167
```
168
-
1. On the toolbar, select **Save** to save the file. Close the tab if you wish.
168
+
1. On the toolbar, select **Save** to save the file. Close the tab if you wish.
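
Once the dependency file exists, an environment can be built from it. The following is a minimal sketch, assuming the `azureml-core` (SDK v1) package. The function name `build_pytorch_environment` and the environment name `pytorch-env` are hypothetical choices for this example. The import is deferred so the function can be defined without the SDK installed.

```python
def build_pytorch_environment(conda_file: str = "pytorch-env.yml",
                              name: str = "pytorch-env"):
    """Create an Azure Machine Learning environment from a Conda file.

    The environment name is a placeholder (assumption); the file path
    defaults to the dependency file created in the previous step.
    """
    # Imported here so the function can be defined without the SDK installed.
    from azureml.core import Environment

    # Reads the Conda specification and returns an Environment object.
    return Environment.from_conda_specification(name=name, file_path=conda_file)
```

The returned object can then be passed to `ScriptRunConfig` through its `environment` parameter, so the remote run uses the same packages as the file describes.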
169
169
170
170
## Create the control script
171
171
@@ -226,14 +226,14 @@ if __name__ == "__main__":
226
226
227
227
1. Select **Save and run script in terminal** to run the *run-pytorch.py* script.
228
228
229
-
1. You'll see a link in the terminal window that opens. Select the link to view the job.
229
+
1. You see a link in the terminal window that opens. Select the link to view the job.
230
230
231
231
> [!NOTE]
232
232
> You may see some warnings starting with *Failure while loading azureml_run_type_providers...*. You can ignore these warnings. Use the link at the bottom of these warnings to view your output.
233
233
234
234
### View the output
235
235
236
-
1. In the page that opens, you'll see the job status. The first time you run this script, Azure Machine Learning will build a new Docker image from your PyTorch environment. The whole job might take around 10 minutes to complete. This image will be reused in future jobs to make them run much quicker.
236
+
1. In the page that opens, you see the job status. The first time you run this script, Azure Machine Learning builds a new Docker image from your PyTorch environment. The whole job might take around 10 minutes to complete. This image will be reused in future jobs to make them run much quicker.
237
237
1. You can view Docker build logs in the Azure Machine Learning studio. Select the **Outputs + logs** tab, and then select **20_image_build_log.txt**.
238
238
1. When the status of the job is **Completed**, select **Outputs + logs**.
239
239
1. Select **std_log.txt** to view the output of your job.
@@ -253,7 +253,7 @@ Finished Training
253
253
254
254
If you see an error `Your total snapshot size exceeds the limit`, the **data** folder is located in the `source_directory` value used in `ScriptRunConfig`.
255
255
256
-
Select the **...** at the end of the folder, then select **Move** to move **data** to the **get-started** folder.
256
+
Select the **...** at the end of the folder, then select **Move** to move **data** to the **get-started** folder.
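
To see why a misplaced **data** folder triggers this error, it can help to measure how much the source directory would upload. The sketch below is a stdlib-only illustration. The helper `directory_size_bytes` and the file names and sizes are made up for the demonstration and aren't part of the Azure Machine Learning SDK.

```python
import tempfile
from pathlib import Path


def directory_size_bytes(root: Path) -> int:
    """Return the total size in bytes of all regular files under root."""
    return sum(p.stat().st_size for p in root.rglob("*") if p.is_file())


# Demonstrate on a throwaway layout: src/ holds a small script, while a
# misplaced src/data/ folder holds a comparatively large file.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    (src / "data").mkdir(parents=True)
    (src / "train.py").write_bytes(b"x" * 100)
    (src / "data" / "batch.bin").write_bytes(b"x" * 10_000)

    size_with_data = directory_size_bytes(src)
    size_data_only = directory_size_bytes(src / "data")

print(size_with_data, size_data_only)
```

In this toy layout nearly all of the bytes come from the data folder, which is why moving **data** out of the source directory shrinks the snapshot.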
257
257
258
258
259
259
## Log training metrics
@@ -367,9 +367,9 @@ Make sure you save this file before you submit the run.
367
367
368
368
### Submit the run to Azure Machine Learning
369
369
370
-
Select the tab for the *run-pytorch.py* script, then select **Save and run script in terminal** to re-run the *run-pytorch.py* script. Make sure you've saved your changes to `pytorch-env.yml` first.
370
+
Select the tab for the *run-pytorch.py* script, then select **Save and run script in terminal** to rerun the *run-pytorch.py* script. Make sure you save your changes to `pytorch-env.yml` first.
371
371
372
-
This time when you visit the studio, go to the **Metrics** tab where you can now see live updates on the model training loss! It may take a 1 to 2 minutes before the training begins.
372
+
This time when you visit the studio, go to the **Metrics** tab where you can now see live updates on the model training loss! It may take 1 to 2 minutes before the training begins.
373
373
374
374
:::image type="content" source="../media/tutorial-1st-experiment-sdk-train/logging-metrics.png" alt-text="Training loss graph on the Metrics tab.":::