Commit 0b9d3a6

Merge pull request #58190 from sdgilley/sdg-m1

sync with notebook

2 parents ac8b795 + 2057a2c

File tree: 1 file changed (+35 −33 lines)


articles/machine-learning/service/tutorial-train-models-with-aml.md

Lines changed: 35 additions & 33 deletions
@@ -9,7 +9,7 @@ ms.topic: tutorial
 author: hning86
 ms.author: haining
 ms.reviewer: sgilley
-ms.date: 09/24/2018
+ms.date: 11/16/2018
 #Customer intent: As a professional data scientist, I can build an image classification model with Azure Machine Learning using Python in a Jupyter notebook.
 ---

@@ -39,7 +39,7 @@ For your convenience, this tutorial is available as a [Jupyter notebook](https:/
 [!INCLUDE [aml-clone-in-azure-notebook](../../../includes/aml-clone-in-azure-notebook.md)]

 >[!NOTE]
-> This tutorial was tested with Azure Machine Learning SDK version 0.168
+> This tutorial was tested with Azure Machine Learning SDK version 0.1.74


 ## Set up your development environment

@@ -90,41 +90,44 @@ exp = Experiment(workspace=ws, name=experiment_name)

 ### Create remote compute target

-Azure Batch AI is a managed service that enables data scientists to train machine learning models on clusters of Azure virtual machines, including VMs with GPU support. In this tutorial, you create an Azure Batch AI cluster as your training environment. This code creates a cluster for you if it does not already exist in your workspace.
+Azure ML Managed Compute is a managed service that enables data scientists to train machine learning models on clusters of Azure virtual machines, including VMs with GPU support. In this tutorial, you create an Azure ML Managed Compute cluster as your training environment. This code creates a cluster for you if it does not already exist in your workspace.

 **Creation of the cluster takes approximately 5 minutes.** If the cluster is already in the workspace this code uses it and skips the creation process.


 ```python
-from azureml.core.compute import ComputeTarget, BatchAiCompute
-from azureml.core.compute_target import ComputeTargetException
+from azureml.core.compute import BatchAiCompute
+from azureml.core.compute import ComputeTarget
+import os

 # choose a name for your cluster
-batchai_cluster_name = "traincluster"
-
-try:
-    # look for the existing cluster by name
-    compute_target = ComputeTarget(workspace=ws, name=batchai_cluster_name)
-    if type(compute_target) is BatchAiCompute:
-        print('found compute target {}, just use it.'.format(batchai_cluster_name))
-    else:
-        print('{} exists but it is not a Batch AI cluster. Please choose a different name.'.format(batchai_cluster_name))
-except ComputeTargetException:
+batchai_cluster_name = os.environ.get("BATCHAI_CLUSTER_NAME", ws.name + "gpu")
+cluster_min_nodes = os.environ.get("BATCHAI_CLUSTER_MIN_NODES", 1)
+cluster_max_nodes = os.environ.get("BATCHAI_CLUSTER_MAX_NODES", 3)
+vm_size = os.environ.get("BATCHAI_CLUSTER_SKU", "STANDARD_NC6")
+autoscale_enabled = os.environ.get("BATCHAI_CLUSTER_AUTOSCALE_ENABLED", True)
+
+
+if batchai_cluster_name in ws.compute_targets:
+    compute_target = ws.compute_targets[batchai_cluster_name]
+    if compute_target and type(compute_target) is BatchAiCompute:
+        print('found compute target. just use it. ' + batchai_cluster_name)
+else:
     print('creating a new compute target...')
-    compute_config = BatchAiCompute.provisioning_configuration(vm_size="STANDARD_D2_V2", # small CPU-based VM
-                                                               #vm_priority='lowpriority', # optional
-                                                               autoscale_enabled=True,
-                                                               cluster_min_nodes=0,
-                                                               cluster_max_nodes=4)
+    provisioning_config = BatchAiCompute.provisioning_configuration(vm_size = vm_size, # NC6 is GPU-enabled
+                                                                    vm_priority = 'lowpriority', # optional
+                                                                    autoscale_enabled = autoscale_enabled,
+                                                                    cluster_min_nodes = cluster_min_nodes,
+                                                                    cluster_max_nodes = cluster_max_nodes)

     # create the cluster
-    compute_target = ComputeTarget.create(ws, batchai_cluster_name, compute_config)
+    compute_target = ComputeTarget.create(ws, batchai_cluster_name, provisioning_config)

     # can poll for a minimum number of nodes and for a specific timeout.
-    # if no min node count is provided it uses the scale settings for the cluster
+    # if no min node count is provided it will use the scale settings for the cluster
     compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)

-    # Use the 'status' property to get a detailed status for the current cluster.
+    # For a more detailed view of current BatchAI cluster status, use the 'status' property
     print(compute_target.status.serialize())
 ```

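The new version of this cell reads its cluster settings from environment variables with hard-coded fallbacks. A minimal standalone sketch of that pattern (no Azure connection is made; `"ws-name"` here is a stand-in for `ws.name`):

```python
import os

# Fall back to the defaults when the environment variables are unset.
# "ws-name" is a stand-in for the workspace name used in the notebook.
cluster_name = os.environ.get("BATCHAI_CLUSTER_NAME", "ws-name" + "gpu")
vm_size = os.environ.get("BATCHAI_CLUSTER_SKU", "STANDARD_NC6")

# os.environ.get returns strings when a variable is set, so numeric
# settings should be cast explicitly (the cell above passes them through
# unconverted, which only works while the defaults are used).
cluster_min_nodes = int(os.environ.get("BATCHAI_CLUSTER_MIN_NODES", 1))
cluster_max_nodes = int(os.environ.get("BATCHAI_CLUSTER_MAX_NODES", 3))

print(cluster_name, vm_size, cluster_min_nodes, cluster_max_nodes)
```

One caveat the diff inherits: once a variable such as `BATCHAI_CLUSTER_MIN_NODES` is actually set, its value arrives as the string `"1"` rather than the integer `1`, so the explicit `int(...)` cast above is the safer form.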
@@ -140,7 +143,7 @@ Before you train a model, you need to understand the data that you are using to

 ### Download the MNIST dataset

-Download the MNIST dataset and save the files into a `data` directory locally. Images and labels for both training and testing are downloaded. 
+Download the MNIST dataset and save the files into a `data` directory locally. Images and labels for both training and testing are downloaded.


 ```python
@@ -157,7 +160,7 @@ urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ub

 ### Display some sample images

-Load the compressed files into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them. Note this step requires a `load_data` function that's included in the `util.py` file. This file is included in the sample folder. Please make sure it is placed in the same folder as this notebook. The `load_data` function parses the compresse files into numpy arrays.
+Load the compressed files into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them. Note this step requires a `load_data` function that's included in a `util.py` file. This file is included in the sample folder. Please make sure it is placed in the same folder as this notebook. The `load_data` function simply parses the compressed files into numpy arrays.

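`util.py` itself is not part of this diff. As a hedged illustration only, a `load_data` helper for the gzipped MNIST IDX files typically looks something like the following (field layout taken from the public IDX format, not from the sample folder):

```python
import gzip
import struct

import numpy as np


def load_data(filename, label=False):
    """Parse a gzipped MNIST IDX file into a numpy array (illustrative sketch)."""
    with gzip.open(filename, 'rb') as f:
        if label:
            # Label files: magic number, item count, then one byte per label.
            magic, num = struct.unpack('>II', f.read(8))
            return np.frombuffer(f.read(), dtype=np.uint8).reshape(num, 1)
        # Image files: magic number, image count, rows, cols, then pixel bytes.
        magic, num, rows, cols = struct.unpack('>IIII', f.read(16))
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(num, rows * cols)
```

Each image row comes back flattened to `rows * cols` values, which is the shape the tutorial's plotting and training cells expect.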
@@ -206,9 +209,9 @@ ds.upload(src_dir='./data', target_path='mnist', overwrite=True, show_progress=T
 ```
 You now have everything you need to start training a model.

-## Train a model locally
+## Train a local model

-Train a simple logistic regression model from scikit-learn locally.
+Train a simple logistic regression model using scikit-learn locally.

 **Training locally can take a minute or two** depending on your computer configuration.

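The local-training cell itself is not shown in this hunk. A minimal sketch of fitting scikit-learn's logistic regression, using small synthetic stand-in data instead of the MNIST arrays the tutorial loads:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the X_train / y_train MNIST arrays loaded earlier:
# 200 samples, 64 features, labels determined by the first feature.
rng = np.random.RandomState(0)
X_train = rng.rand(200, 64)
y_train = (X_train[:, 0] > 0.5).astype(int)

clf = LogisticRegression()
clf.fit(X_train, y_train)
print('training accuracy:', clf.score(X_train, y_train))
```

Because the stand-in labels depend linearly on one feature, the classifier separates them easily; on real MNIST the same `fit`/`score` calls apply, just with the downloaded arrays.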
@@ -240,7 +243,7 @@ Now you can expand on this simple model by building a model with a different reg
 For this task, submit the job to the remote training cluster you set up earlier. To submit a job you:
 * Create a directory
 * Create a training script
-* Create an estimator
+* Create an estimator object
 * Submit the job

 ### Create a directory
@@ -312,11 +315,10 @@ joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')
 Notice how the script gets data and saves models:

 + The training script reads an argument to find the directory containing the data. When you submit the job later, you point to the datastore for this argument:
-`parser.add_argument('--data-folder', type = str, dest = 'data_folder', help = 'data directory mounting point')`
-
+`parser.add_argument('--data-folder', type=str, dest='data_folder', help='data directory mounting point')`

 + The training script saves your model into a directory named outputs. <br/>
-`joblib.dump(value = clf, filename = 'outputs/sklearn_mnist_model.pkl')`<br/>
+`joblib.dump(value=clf, filename='outputs/sklearn_mnist_model.pkl')`<br/>
 Anything written in this directory is automatically uploaded into your workspace. You'll access your model from this directory later in the tutorial.

 The file `utils.py` is referenced from the training script to load the dataset correctly. Copy this script into the script folder so that it can be accessed along with the training script on the remote resource.
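The two snippets called out above can be exercised outside the training script. A small sketch, with the standard-library `pickle` standing in for `joblib` and a hard-coded argument list standing in for `sys.argv`:

```python
import argparse
import os
import pickle

# Same argument definition as the training script's parser.
parser = argparse.ArgumentParser()
parser.add_argument('--data-folder', type=str, dest='data_folder',
                    help='data directory mounting point')
# The real script parses sys.argv; here we pass an example value directly.
args = parser.parse_args(['--data-folder', '/mnt/mnist'])
print(args.data_folder)

# Save a placeholder model into ./outputs, the directory Azure ML
# automatically uploads to the workspace after the run.
os.makedirs('outputs', exist_ok=True)
model = {'coef': [0.1, 0.2]}  # stand-in for the trained clf object
with open('outputs/sklearn_mnist_model.pkl', 'wb') as f:
    pickle.dump(model, f)
```

The `dest='data_folder'` argument is what makes the value available as `args.data_folder` despite the hyphenated flag name.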
@@ -339,7 +341,7 @@ An estimator object is used to submit the run. Create your estimator by running
 * Parameters required from the training script
 * Python packages needed for training

-In this tutorial, this target is the Batch AI cluster. All files in the project directory are uploaded into the cluster nodes for execution. The data_folder is set to use the datastore (`ds.as_mount()`).
+In this tutorial, this target is the Batch AI cluster. All files in the script folder are uploaded into the cluster nodes for execution. The data_folder is set to use the datastore (`ds.as_mount()`).

 ```python
 from azureml.train.estimator import Estimator
@@ -421,7 +423,7 @@ The output shows the remote model has an accuracy slightly higher than the local

 `{'regularization rate': 0.8, 'accuracy': 0.9204}`

-In the deployment tutorial you will explore this model in more detail.
+In the next tutorial you will explore this model in more detail.

 ## Register model
