Commit e6e9e7e

Merge pull request #78688 from nibaccam/tutorials
Image Tutorial: bump and code updates
2 parents 54c00e6 + 13f1be0 commit e6e9e7e

2 files changed: +29 −40 lines changed

articles/machine-learning/service/tutorial-deploy-models-with-aml.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -32,7 +32,7 @@ In this part of the tutorial, you use Azure Machine Learning service for the fol
 Container Instances is a great solution for testing and understanding the workflow. For scalable production deployments, consider using Azure Kubernetes Service. For more information, see [how to deploy and where](how-to-deploy-and-where.md).
 
 >[!NOTE]
-> Code in this article was tested with Azure Machine Learning SDK version 1.0.8.
+> Code in this article was tested with Azure Machine Learning SDK version 1.0.41.
 
 ## Prerequisites
 Skip to [Set the development environment](#start) to read through the notebook steps.
````
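
This file's only change is the version bump in the note, from SDK 1.0.8 to 1.0.41; the training tutorial below gets the same bump. To confirm which version is installed in your own environment before running the notebooks, a minimal check (assuming the `azureml-sdk` package is installed) is:

```python
# print the installed Azure Machine Learning SDK version
import azureml.core
print(azureml.core.VERSION)  # the updated articles were tested against 1.0.41
```

An older installation can be brought up to date with `pip install --upgrade azureml-sdk`.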

articles/machine-learning/service/tutorial-train-models-with-aml.md

Lines changed: 28 additions & 39 deletions
(In the hunks below, `-`/`+` pairs that look identical differ only in trailing whitespace, which this commit trims.)
````diff
@@ -18,7 +18,7 @@ ms.custom: seodec18
 
 In this tutorial, you train a machine learning model on remote compute resources. You'll use the training and deployment workflow for Azure Machine Learning service (preview) in a Python Jupyter notebook. You can then use the notebook as a template to train your own machine learning model with your own data. This tutorial is **part one of a two-part tutorial series**.
 
-This tutorial trains a simple logistic regression by using the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset and [scikit-learn](https://scikit-learn.org) with Azure Machine Learning service. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28 x 28 pixels, representing a number from zero to nine. The goal is to create a multiclass classifier to identify the digit a given image represents.
+This tutorial trains a simple logistic regression by using the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset and [scikit-learn](https://scikit-learn.org) with Azure Machine Learning service. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28 x 28 pixels, representing a number from zero to nine. The goal is to create a multiclass classifier to identify the digit a given image represents.
 
 Learn how to take the following actions:
 
@@ -28,12 +28,12 @@ Learn how to take the following actions:
 > * Train a simple logistic regression model on a remote cluster.
 > * Review training results and register the best model.
 
-You learn how to select a model and deploy it in [part two of this tutorial](tutorial-deploy-models-with-aml.md).
+You learn how to select a model and deploy it in [part two of this tutorial](tutorial-deploy-models-with-aml.md).
 
 If you don’t have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning service](https://aka.ms/AMLFree) today.
 
 >[!NOTE]
-> Code in this article was tested with Azure Machine Learning SDK version 1.0.8.
+> Code in this article was tested with Azure Machine Learning SDK version 1.0.41.
 
 ## Prerequisites
 
@@ -47,8 +47,8 @@ Skip to [Set up your development environment](#start) to read through the notebook steps.
 * The configuration file for the workspace in the same directory as the notebook
 
 Get all these prerequisites from either of the sections below.
-
-* Use a [cloud notebook server in your workspace](#azure)
+
+* Use a [cloud notebook server in your workspace](#azure)
 * Use [your own notebook server](#server)
 
 ### <a name="azure"></a>Use a cloud notebook server in your workspace
@@ -59,7 +59,6 @@ It's easy to get started with your own cloud-based notebook server. The [Azure M
 
 * After you launch the notebook webpage, open the **tutorials/img-classification-part1-training.ipynb** notebook.
 
-
 ### <a name="server"></a>Use your own Jupyter notebook server
 
 [!INCLUDE [aml-your-server](../../../includes/aml-your-server.md)]
@@ -103,7 +102,7 @@ print(ws.name, ws.location, ws.resource_group, ws.location, sep = '\t')
 
 ### Create an experiment
 
-Create an experiment to track the runs in your workspace. A workspace can have multiple experiments:
+Create an experiment to track the runs in your workspace. A workspace can have multiple experiments:
 
 ```python
 experiment_name = 'sklearn-mnist'
````
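
The hunk above ends partway through the experiment-creation cell. For context, in the published notebook the cell continues along these lines (a sketch, not part of this diff; `Experiment` is the `azureml.core` class the tutorial uses):

```python
from azureml.core import Experiment

experiment_name = 'sklearn-mnist'
exp = Experiment(workspace=ws, name=experiment_name)  # ws is the workspace loaded earlier
```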
````diff
@@ -118,7 +117,6 @@ By using Azure Machine Learning Compute, a managed service, data scientists can
 
 **Creation of the compute takes about five minutes.** If the compute is already in the workspace, the code uses it and skips the creation process.
 
-
 ```python
 from azureml.core.compute import AmlCompute
 from azureml.core.compute import ComputeTarget
@@ -140,21 +138,21 @@ if compute_name in ws.compute_targets:
 else:
     print('creating a new compute target...')
     provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,
-                                                                min_nodes = compute_min_nodes,
+                                                                min_nodes = compute_min_nodes,
                                                                 max_nodes = compute_max_nodes)
 
     # create the cluster
     compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
-
-    # can poll for a minimum number of nodes and for a specific timeout.
+
+    # can poll for a minimum number of nodes and for a specific timeout.
     # if no min node count is provided it will use the scale settings for the cluster
     compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
-
+
     # For a more detailed view of current AmlCompute status, use get_status()
     print(compute_target.get_status().serialize())
 ```
 
-You now have the necessary packages and compute resources to train a model in the cloud.
+You now have the necessary packages and compute resources to train a model in the cloud.
 
 ## Explore data
 
@@ -168,7 +166,6 @@ Before you train a model, you need to understand the data that you use to train
 
 Download the MNIST dataset and save the files into a `data` directory locally. Images and labels for both training and testing are downloaded:
 
-
 ```python
 import urllib.request
 import os
@@ -181,15 +178,14 @@ urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-u
 urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'test-images.gz'))
 urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'test-labels.gz'))
 ```
+
 You will see output similar to this:
 ```('./data/test-labels.gz', <http.client.HTTPMessage at 0x7f40864c77b8>)```
 
 ### Display some sample images
 
 Load the compressed files into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them. This step requires a `load_data` function that's included in a `utils.py` file. This file is included in the sample folder. Make sure it's placed in the same folder as this notebook. The `load_data` function simply parses the compressed files into numpy arrays:
 
-
-
 ```python
 # make sure utils.py is in the same directory as this code
 from utils import load_data
````
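
The `load_data` helper itself never appears in the diff. For readers without the sample folder handy, here is a minimal sketch of what such a parser does; this is illustrative only, and the `utils.py` shipped with the samples is the authoritative version. MNIST's IDX files are big-endian: a magic number, the item count, and (for images) row and column counts, followed by raw `uint8` data:

```python
import gzip
import struct

import numpy as np

def load_data(path, label=False):
    """Parse a gzipped MNIST IDX file into a numpy array."""
    with gzip.open(path, 'rb') as f:
        if label:
            # label file header: magic number and item count, then one byte per label
            magic, n = struct.unpack('>II', f.read(8))
            return np.frombuffer(f.read(), dtype=np.uint8).reshape(n, 1)
        # image file header: magic number, image count, rows, columns, then pixel bytes
        magic, n, rows, cols = struct.unpack('>IIII', f.read(16))
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(n, rows * cols)
```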
````diff
@@ -232,16 +228,16 @@ print(ds.datastore_type, ds.account_name, ds.container_name)
 
 ds.upload(src_dir=data_folder, target_path='mnist', overwrite=True, show_progress=True)
 ```
-You now have everything you need to start training a model.
 
+You now have everything you need to start training a model.
 
 ## Train on a remote cluster
 
 For this task, submit the job to the remote training cluster you set up earlier. To submit a job, you:
 * Create a directory
 * Create a training script
 * Create an estimator object
-* Submit the job
+* Submit the job
 
 ### Create a directory
 
@@ -291,7 +287,7 @@ print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = '\n')
 run = Run.get_context()
 
 print('Train a logistic regression model with regularization rate of', args.reg)
-clf = LogisticRegression(C=1.0/args.reg, random_state=42)
+clf = LogisticRegression(C=1.0/args.reg, solver="liblinear", multi_class="auto", random_state=42)
 clf.fit(X_train, y_train)
 
 print('Predict the test set')
````
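
The new `solver="liblinear"` and `multi_class="auto"` arguments pin behavior that newer scikit-learn releases warn about changing (the default solver moved from `liblinear` to `lbfgs` in scikit-learn 0.22). A quick local sanity check of the same construction, using scikit-learn's small bundled digits dataset rather than MNIST (illustrative only):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# same construction as train.py above, with a regularization rate of 0.5
clf = LogisticRegression(C=1.0/0.5, solver="liblinear", multi_class="auto", random_state=42)
clf.fit(X_train, y_train)
print('test accuracy:', clf.score(X_test, y_test))
```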
````diff
@@ -323,36 +319,32 @@ Notice how the script gets data and saves models:
 shutil.copy('utils.py', script_folder)
 ```
 
-
 ### Create an estimator
 
-An estimator object is used to submit the run. Create your estimator by running the following code to define these items:
+An [SKLearn estimator](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.sklearn.sklearn?view=azure-ml-py) object is used to submit the run. Create your estimator by running the following code to define these items:
 
 * The name of the estimator object, `est`.
-* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution.
+* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution.
 * The compute target. In this case, you use the Azure Machine Learning compute cluster you created.
 * The training script name, **train.py**.
-* Parameters required from the training script.
-* Python packages needed for training.
+* Parameters required from the training script.
 
 In this tutorial, this target is AmlCompute. All files in the script folder are uploaded to the cluster nodes for execution. The **data_folder** is set to use the datastore, `ds.path('mnist').as_mount()`:
 
 ```python
-from azureml.train.estimator import Estimator
+from azureml.train.sklearn import SKLearn
 
 script_params = {
     '--data-folder': ds.path('mnist').as_mount(),
-    '--regularization': 0.8
+    '--regularization': 0.5
 }
 
-est = Estimator(source_directory=script_folder,
+est = SKLearn(source_directory=script_folder,
                 script_params=script_params,
                 compute_target=compute_target,
-                entry_script='train.py',
-                conda_packages=['scikit-learn'])
+                entry_script='train.py')
 ```
 
-
 ### Submit the job to the cluster
 
 Run the experiment by submitting the estimator object:
````
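
The submission cell itself is untouched by this commit, so it falls outside the hunk. For context, it is a one-liner of roughly this shape (a sketch, assuming the experiment object created earlier is named `exp`):

```python
# submit the estimator to the experiment; returns immediately with a Run handle
run = exp.submit(config=est)
```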
````diff
@@ -370,7 +362,7 @@ In total, the first run takes **about 10 minutes**. But for subsequent runs, as
 
 What happens while you wait:
 
-- **Image creation**: A Docker image is created that matches the Python environment specified by the estimator. The image is uploaded to the workspace. Image creation and uploading takes **about five minutes**.
+- **Image creation**: A Docker image is created that matches the Python environment specified by the estimator. The image is uploaded to the workspace. Image creation and uploading takes **about five minutes**.
 
 This stage happens once for each Python environment because the container is cached for subsequent runs. During image creation, logs are streamed to the run history. You can monitor the image creation progress by using these logs.
 
@@ -380,29 +372,26 @@ What happens while you wait:
 
 - **Post-processing**: The **./outputs** directory of the run is copied over to the run history in your workspace, so you can access these results.
 
-
-You can check the progress of a running job in several ways. This tutorial uses a Jupyter widget and a `wait_for_completion` method.
+You can check the progress of a running job in several ways. This tutorial uses a Jupyter widget and a `wait_for_completion` method.
 
 ### Jupyter widget
 
 Watch the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10 to 15 seconds until the job finishes:
 
-
 ```python
 from azureml.widgets import RunDetails
 RunDetails(run).show()
 ```
 
-This still snapshot is the widget shown at the end of training:
+The widget will look like the following at the end of training:
 
 ![Notebook widget](./media/tutorial-train-models-with-aml/widget.png)
 
 If you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run).
 
 ### Get log results upon completion
 
-Model training and monitoring happen in the background. Wait until the model has finished training before you run more code. Use `wait_for_completion` to show when the model training is finished:
-
+Model training and monitoring happen in the background. Wait until the model has finished training before you run more code. Use `wait_for_completion` to show when the model training is finished:
 
 ```python
 run.wait_for_completion(show_output=False) # specify True for a verbose log
````
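
On the cancellation link mentioned above: the SDK also exposes cancellation directly on the run object (a minimal sketch using the `Run.cancel` method):

```python
run.cancel()             # request cancellation of a queued or running job
print(run.get_status())  # poll the run's current status
```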
````diff
@@ -415,6 +404,7 @@ You now have a model trained on a remote cluster. Retrieve the accuracy of the model
 ```python
 print(run.get_metrics())
 ```
+
 The output shows the remote model has accuracy of 0.9204:
 
 `{'regularization rate': 0.8, 'accuracy': 0.9204}`
@@ -434,7 +424,7 @@ print(run.get_file_names())
 
 Register the model in the workspace, so that you or other collaborators can later query, examine, and deploy this model:
 ```python
-# register model
+# register model
 model = run.register_model(model_name='sklearn_mnist', model_path='outputs/sklearn_mnist_model.pkl')
 print(model.name, model.id, model.version, sep = '\t')
 ```
@@ -445,7 +435,6 @@ print(model.name, model.id, model.version, sep = '\t')
 
 You can also delete just the Azure Machine Learning Compute cluster. However, autoscale is turned on, and the cluster minimum is zero. So this particular resource won't incur additional compute charges when not in use:
 
-
 ```python
 # optionally, delete the Azure Machine Learning Compute cluster
 compute_target.delete()
````
