Commit e6e9e7e

Merge pull request #78688 from nibaccam/tutorials
Image Tutorial: bump and code updates
2 parents 54c00e6 + 13f1be0 commit e6e9e7e

2 files changed: +29 −40 lines changed

articles/machine-learning/service/tutorial-deploy-models-with-aml.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -32,7 +32,7 @@ In this part of the tutorial, you use Azure Machine Learning service for the fol
 Container Instances is a great solution for testing and understanding the workflow. For scalable production deployments, consider using Azure Kubernetes Service. For more information, see [how to deploy and where](how-to-deploy-and-where.md).
 
 >[!NOTE]
-> Code in this article was tested with Azure Machine Learning SDK version 1.0.8.
+> Code in this article was tested with Azure Machine Learning SDK version 1.0.41.
 
 ## Prerequisites
 Skip to [Set the development environment](#start) to read through the notebook steps.
````
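
This file's only change is the version bump in the note, from SDK 1.0.8 to 1.0.41; the training tutorial below gets the same bump. To confirm which version is installed in your own environment before running the notebooks, a minimal check (assuming the `azureml-sdk` package is installed) is:

```python
# print the installed Azure Machine Learning SDK version
import azureml.core
print(azureml.core.VERSION)  # the updated articles were tested against 1.0.41
```

An older installation can be brought up to date with `pip install --upgrade azureml-sdk`.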

articles/machine-learning/service/tutorial-train-models-with-aml.md

Lines changed: 28 additions & 39 deletions
(In the hunks below, `-`/`+` pairs that look identical differ only in trailing whitespace, which this commit trims.)
````diff
@@ -18,7 +18,7 @@ ms.custom: seodec18
 
 In this tutorial, you train a machine learning model on remote compute resources. You'll use the training and deployment workflow for Azure Machine Learning service (preview) in a Python Jupyter notebook. You can then use the notebook as a template to train your own machine learning model with your own data. This tutorial is **part one of a two-part tutorial series**.
 
-This tutorial trains a simple logistic regression by using the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset and [scikit-learn](https://scikit-learn.org) with Azure Machine Learning service. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28 x 28 pixels, representing a number from zero to nine. The goal is to create a multiclass classifier to identify the digit a given image represents.
+This tutorial trains a simple logistic regression by using the [MNIST](http://yann.lecun.com/exdb/mnist/) dataset and [scikit-learn](https://scikit-learn.org) with Azure Machine Learning service. MNIST is a popular dataset consisting of 70,000 grayscale images. Each image is a handwritten digit of 28 x 28 pixels, representing a number from zero to nine. The goal is to create a multiclass classifier to identify the digit a given image represents.
 
 Learn how to take the following actions:
 
@@ -28,12 +28,12 @@ Learn how to take the following actions:
 > * Train a simple logistic regression model on a remote cluster.
 > * Review training results and register the best model.
 
-You learn how to select a model and deploy it in [part two of this tutorial](tutorial-deploy-models-with-aml.md).
+You learn how to select a model and deploy it in [part two of this tutorial](tutorial-deploy-models-with-aml.md).
 
 If you don’t have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning service](https://aka.ms/AMLFree) today.
 
 >[!NOTE]
-> Code in this article was tested with Azure Machine Learning SDK version 1.0.8.
+> Code in this article was tested with Azure Machine Learning SDK version 1.0.41.
 
 ## Prerequisites
 
@@ -47,8 +47,8 @@ Skip to [Set up your development environment](#start) to read through the notebook steps.
 * The configuration file for the workspace in the same directory as the notebook
 
 Get all these prerequisites from either of the sections below.
-
-* Use a [cloud notebook server in your workspace](#azure)
+
+* Use a [cloud notebook server in your workspace](#azure)
 * Use [your own notebook server](#server)
 
 ### <a name="azure"></a>Use a cloud notebook server in your workspace
@@ -59,7 +59,6 @@ It's easy to get started with your own cloud-based notebook server. The [Azure M
 
 * After you launch the notebook webpage, open the **tutorials/img-classification-part1-training.ipynb** notebook.
 
-
 ### <a name="server"></a>Use your own Jupyter notebook server
 
 [!INCLUDE [aml-your-server](../../../includes/aml-your-server.md)]
@@ -103,7 +102,7 @@ print(ws.name, ws.location, ws.resource_group, ws.location, sep = '\t')
 
 ### Create an experiment
 
-Create an experiment to track the runs in your workspace. A workspace can have multiple experiments:
+Create an experiment to track the runs in your workspace. A workspace can have multiple experiments:
 
 ```python
 experiment_name = 'sklearn-mnist'
````
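
The hunk above ends partway through the experiment-creation cell. For context, in the published notebook the cell continues along these lines (a sketch, not part of this diff; `Experiment` is the `azureml.core` class the tutorial uses):

```python
from azureml.core import Experiment

experiment_name = 'sklearn-mnist'
exp = Experiment(workspace=ws, name=experiment_name)  # ws is the workspace loaded earlier
```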
````diff
@@ -118,7 +117,6 @@ By using Azure Machine Learning Compute, a managed service, data scientists can
 
 **Creation of the compute takes about five minutes.** If the compute is already in the workspace, the code uses it and skips the creation process.
 
-
 ```python
 from azureml.core.compute import AmlCompute
 from azureml.core.compute import ComputeTarget
@@ -140,21 +138,21 @@ if compute_name in ws.compute_targets:
 else:
     print('creating a new compute target...')
     provisioning_config = AmlCompute.provisioning_configuration(vm_size = vm_size,
-                                                                min_nodes = compute_min_nodes,
+                                                                min_nodes = compute_min_nodes,
                                                                 max_nodes = compute_max_nodes)
 
     # create the cluster
     compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
-
-    # can poll for a minimum number of nodes and for a specific timeout.
+
+    # can poll for a minimum number of nodes and for a specific timeout.
     # if no min node count is provided it will use the scale settings for the cluster
     compute_target.wait_for_completion(show_output=True, min_node_count=None, timeout_in_minutes=20)
-
+
     # For a more detailed view of current AmlCompute status, use get_status()
     print(compute_target.get_status().serialize())
 ```
 
-You now have the necessary packages and compute resources to train a model in the cloud.
+You now have the necessary packages and compute resources to train a model in the cloud.
 
 ## Explore data
 
@@ -168,7 +166,6 @@ Before you train a model, you need to understand the data that you use to train
 
 Download the MNIST dataset and save the files into a `data` directory locally. Images and labels for both training and testing are downloaded:
 
-
 ```python
 import urllib.request
 import os
@@ -181,15 +178,14 @@ urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-u
 urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename=os.path.join(data_folder, 'test-images.gz'))
 urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename=os.path.join(data_folder, 'test-labels.gz'))
 ```
+
 You will see output similar to this:
 ```('./data/test-labels.gz', <http.client.HTTPMessage at 0x7f40864c77b8>)```
 
 ### Display some sample images
 
 Load the compressed files into `numpy` arrays. Then use `matplotlib` to plot 30 random images from the dataset with their labels above them. This step requires a `load_data` function that's included in a `utils.py` file. This file is included in the sample folder. Make sure it's placed in the same folder as this notebook. The `load_data` function simply parses the compressed files into numpy arrays:
 
-
-
 ```python
 # make sure utils.py is in the same directory as this code
 from utils import load_data
````
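
The `load_data` helper itself never appears in the diff. For readers without the sample folder handy, here is a minimal sketch of what such a parser does; this is illustrative only, and the `utils.py` shipped with the samples is the authoritative version. MNIST's IDX files are big-endian: a magic number, the item count, and (for images) row and column counts, followed by raw `uint8` data:

```python
import gzip
import struct

import numpy as np

def load_data(path, label=False):
    """Parse a gzipped MNIST IDX file into a numpy array."""
    with gzip.open(path, 'rb') as f:
        if label:
            # label file header: magic number and item count, then one byte per label
            magic, n = struct.unpack('>II', f.read(8))
            return np.frombuffer(f.read(), dtype=np.uint8).reshape(n, 1)
        # image file header: magic number, image count, rows, columns, then pixel bytes
        magic, n, rows, cols = struct.unpack('>IIII', f.read(16))
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(n, rows * cols)
```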
````diff
@@ -232,16 +228,16 @@ print(ds.datastore_type, ds.account_name, ds.container_name)
 
 ds.upload(src_dir=data_folder, target_path='mnist', overwrite=True, show_progress=True)
 ```
-You now have everything you need to start training a model.
 
+You now have everything you need to start training a model.
 
 ## Train on a remote cluster
 
 For this task, submit the job to the remote training cluster you set up earlier. To submit a job, you:
 * Create a directory
 * Create a training script
 * Create an estimator object
-* Submit the job
+* Submit the job
 
 ### Create a directory
 
@@ -291,7 +287,7 @@ print(X_train.shape, y_train.shape, X_test.shape, y_test.shape, sep = '\n')
 run = Run.get_context()
 
 print('Train a logistic regression model with regularization rate of', args.reg)
-clf = LogisticRegression(C=1.0/args.reg, random_state=42)
+clf = LogisticRegression(C=1.0/args.reg, solver="liblinear", multi_class="auto", random_state=42)
 clf.fit(X_train, y_train)
 
 print('Predict the test set')
````
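
The new `solver="liblinear"` and `multi_class="auto"` arguments pin behavior that newer scikit-learn releases warn about changing (the default solver moved from `liblinear` to `lbfgs` in scikit-learn 0.22). A quick local sanity check of the same construction, using scikit-learn's small bundled digits dataset rather than MNIST (illustrative only):

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# same construction as train.py above, with a regularization rate of 0.5
clf = LogisticRegression(C=1.0/0.5, solver="liblinear", multi_class="auto", random_state=42)
clf.fit(X_train, y_train)
print('test accuracy:', clf.score(X_test, y_test))
```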
````diff
@@ -323,36 +319,32 @@ Notice how the script gets data and saves models:
 shutil.copy('utils.py', script_folder)
 ```
 
-
 ### Create an estimator
 
-An estimator object is used to submit the run. Create your estimator by running the following code to define these items:
+An [SKLearn estimator](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.sklearn.sklearn?view=azure-ml-py) object is used to submit the run. Create your estimator by running the following code to define these items:
 
 * The name of the estimator object, `est`.
-* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution.
+* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution.
 * The compute target. In this case, you use the Azure Machine Learning compute cluster you created.
 * The training script name, **train.py**.
-* Parameters required from the training script.
-* Python packages needed for training.
+* Parameters required from the training script.
 
 In this tutorial, this target is AmlCompute. All files in the script folder are uploaded to the cluster nodes for execution. The **data_folder** is set to use the datastore, `ds.path('mnist').as_mount()`:
 
 ```python
-from azureml.train.estimator import Estimator
+from azureml.train.sklearn import SKLearn
 
 script_params = {
     '--data-folder': ds.path('mnist').as_mount(),
-    '--regularization': 0.8
+    '--regularization': 0.5
 }
 
-est = Estimator(source_directory=script_folder,
+est = SKLearn(source_directory=script_folder,
                 script_params=script_params,
                 compute_target=compute_target,
-                entry_script='train.py',
-                conda_packages=['scikit-learn'])
+                entry_script='train.py')
 ```
 
-
 ### Submit the job to the cluster
 
 Run the experiment by submitting the estimator object:
````
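
The submission cell itself is untouched by this commit, so it falls outside the hunk. For context, it is a one-liner of roughly this shape (a sketch, assuming the experiment object created earlier is named `exp`):

```python
# submit the estimator to the experiment; returns immediately with a Run handle
run = exp.submit(config=est)
```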
````diff
@@ -370,7 +362,7 @@ In total, the first run takes **about 10 minutes**. But for subsequent runs, as
 
 What happens while you wait:
 
-- **Image creation**: A Docker image is created that matches the Python environment specified by the estimator. The image is uploaded to the workspace. Image creation and uploading takes **about five minutes**.
+- **Image creation**: A Docker image is created that matches the Python environment specified by the estimator. The image is uploaded to the workspace. Image creation and uploading takes **about five minutes**.
 
 This stage happens once for each Python environment because the container is cached for subsequent runs. During image creation, logs are streamed to the run history. You can monitor the image creation progress by using these logs.
 
@@ -380,29 +372,26 @@ What happens while you wait:
 
 - **Post-processing**: The **./outputs** directory of the run is copied over to the run history in your workspace, so you can access these results.
 
-
-You can check the progress of a running job in several ways. This tutorial uses a Jupyter widget and a `wait_for_completion` method.
+You can check the progress of a running job in several ways. This tutorial uses a Jupyter widget and a `wait_for_completion` method.
 
 ### Jupyter widget
 
 Watch the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10 to 15 seconds until the job finishes:
 
-
 ```python
 from azureml.widgets import RunDetails
 RunDetails(run).show()
 ```
 
-This still snapshot is the widget shown at the end of training:
+The widget will look like the following at the end of training:
 
 ![Notebook widget](./media/tutorial-train-models-with-aml/widget.png)
 
 If you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run).
 
 ### Get log results upon completion
 
-Model training and monitoring happen in the background. Wait until the model has finished training before you run more code. Use `wait_for_completion` to show when the model training is finished:
-
+Model training and monitoring happen in the background. Wait until the model has finished training before you run more code. Use `wait_for_completion` to show when the model training is finished:
 
 ```python
 run.wait_for_completion(show_output=False) # specify True for a verbose log
````
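
On the cancellation link mentioned above: the SDK also exposes cancellation directly on the run object (a minimal sketch using the `Run.cancel` method):

```python
run.cancel()             # request cancellation of a queued or running job
print(run.get_status())  # poll the run's current status
```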
````diff
@@ -415,6 +404,7 @@ You now have a model trained on a remote cluster. Retrieve the accuracy of the model
 ```python
 print(run.get_metrics())
 ```
+
 The output shows the remote model has accuracy of 0.9204:
 
 `{'regularization rate': 0.8, 'accuracy': 0.9204}`
@@ -434,7 +424,7 @@ print(run.get_file_names())
 
 Register the model in the workspace, so that you or other collaborators can later query, examine, and deploy this model:
 ```python
-# register model
+# register model
 model = run.register_model(model_name='sklearn_mnist', model_path='outputs/sklearn_mnist_model.pkl')
 print(model.name, model.id, model.version, sep = '\t')
 ```
@@ -445,7 +435,6 @@ print(model.name, model.id, model.version, sep = '\t')
 
 You can also delete just the Azure Machine Learning Compute cluster. However, autoscale is turned on, and the cluster minimum is zero. So this particular resource won't incur additional compute charges when not in use:
 
-
 ```python
 # optionally, delete the Azure Machine Learning Compute cluster
 compute_target.delete()
````
