---
title: Train and register Chainer models
titleSuffix: Azure Machine Learning service
description: This article shows you how to train and register a Chainer model using Azure Machine Learning service.
services: machine-learning
ms.service: machine-learning
ms.subservice: core
ms.topic: conceptual
ms.author: sgilley
author: sdgilley
ms.date: 06/15/2019
---

# Train and register Chainer models at scale with Azure Machine Learning service

This article shows you how to train and register a Chainer model using Azure Machine Learning service. It uses the popular [MNIST dataset](http://yann.lecun.com/exdb/mnist/) to classify handwritten digits with a deep neural network (DNN) built with the [Chainer Python library](https://chainer.org) running on top of [NumPy](https://www.numpy.org/).

Chainer is a Python-based deep learning framework aimed at flexibility and intuition. It provides automatic differentiation APIs based on a define-by-run approach (dynamic computational graphs), as well as object-oriented high-level APIs for building and training neural networks. With Azure Machine Learning service, you can rapidly scale out training jobs using elastic cloud compute resources. You can also track your training runs, version models, deploy models, and much more.

Whether you're developing a Chainer model from the ground up or bringing an existing model into the cloud, Azure Machine Learning service can help you build production-ready models.

If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning service](https://aka.ms/AMLFree) today.

## Prerequisites

Run this code on either of these environments:

- Azure Machine Learning Notebook VM - no downloads or installation necessary

  - Complete the [cloud-based notebook quickstart](quickstart-run-cloud-notebook.md) to create a dedicated notebook server pre-loaded with the SDK and the sample repository.
  - In the samples folder on the notebook server, find a completed notebook and files in the **how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer** folder. The notebook includes expanded sections covering intelligent hyperparameter tuning, model deployment, and notebook widgets.

- Your own Jupyter Notebook server

  - [Install the Azure Machine Learning SDK for Python](setup-create-workspace.md#sdk)
  - [Create a workspace configuration file](setup-create-workspace.md#write-a-configuration-file)
  - Download the sample script file [chainer_mnist.py](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/chainer_mnist.py)
  - You can also find a completed [Jupyter Notebook version](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb) of this guide on the GitHub samples page. The notebook includes expanded sections covering intelligent hyperparameter tuning, model deployment, and notebook widgets.

## Set up the experiment

This section sets up the training experiment by loading the required Python packages, initializing a workspace, creating an experiment, and uploading the training data and training scripts.

### Import packages

First, import the azureml.core Python library and display the version number.

```Python
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)
```

### Initialize a workspace

The [Azure Machine Learning service workspace](concept-workspace.md) is the top-level resource for the service. It provides you with a centralized place to work with all the artifacts you create. In the Python SDK, you can access the workspace artifacts by creating a [`Workspace`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py) object.

Create a workspace object from the `config.json` file created in the [prerequisites section](#prerequisites).

```Python
from azureml.core import Workspace

ws = Workspace.from_config()
```

### Create a project directory

Create a directory that will contain all the code from your local machine that you need access to on the remote resource. This includes the training script and any additional files your training script depends on.

```Python
import os

project_folder = './chainer-mnist'
os.makedirs(project_folder, exist_ok=True)
```

### Prepare training script

In this tutorial, the training script **chainer_mnist.py** is already provided for you. In practice, you should be able to take any custom training script as is and run it with Azure ML without having to modify your code.

To use Azure ML's tracking and metrics capabilities, you will have to add a small amount of Azure ML code inside your training script. The training script **chainer_mnist.py** shows how to log some metrics to your Azure ML run. To do so, you access the Azure ML `Run` object within the script.

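For example, a training script can fetch the run context and log metrics with it. The following is a minimal sketch of that pattern; the metric names and values are illustrative and not the exact ones used in **chainer_mnist.py**.

```Python
# Minimal sketch: log metrics from inside a training script.
# The metric names and values below are illustrative only.
from azureml.core.run import Run

run = Run.get_context()          # handle to the run this script executes in

# ... train the model and evaluate it ...
test_accuracy = 0.98             # placeholder value for illustration

run.log('Batch size', 128)               # log a single value
run.log('Test accuracy', test_accuracy)  # metrics show up in the run history
```
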
Copy the training script **chainer_mnist.py** into your project directory.

```Python
import shutil

shutil.copy('chainer_mnist.py', project_folder)
```

### Create an experiment

Create an experiment to track the runs for this training job in your workspace. In this example, the experiment is called "chainer-mnist".

```Python
from azureml.core import Experiment

experiment_name = 'chainer-mnist'
experiment = Experiment(ws, name=experiment_name)
```

## Create or get a compute target

You will need a [compute target](concept-compute-target.md) for training your model. In this tutorial, you will use Azure ML managed compute (AmlCompute) for your remote training compute resource.

**Creation of AmlCompute takes approximately 5 minutes**. If an AmlCompute cluster with that name is already in your workspace, this code will skip the creation process.

```Python
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# choose a name for your cluster
cluster_name = "gpu-cluster"

try:
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=4)

    # create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# use get_status() to get a detailed status for the current cluster
print(compute_target.get_status().serialize())
```

For more information on compute targets, see the [what is a compute target](concept-compute-target.md) article.

## Create a Chainer estimator

The [Chainer estimator](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.chainer?view=azure-ml-py) provides a simple way of launching Chainer training jobs on your compute target.

The Chainer estimator is implemented through the generic [`Estimator`](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py) class, which can be used to support any framework. For more information about training models using the generic estimator, see [train models with Azure Machine Learning using estimator](how-to-train-ml-models.md).

```Python
from azureml.train.dnn import Chainer

script_params = {
    '--epochs': 10,
    '--batchsize': 128,
    '--output_dir': './outputs'
}

estimator = Chainer(source_directory=project_folder,
                    script_params=script_params,
                    compute_target=compute_target,
                    pip_packages=['numpy', 'pytest'],
                    entry_script='chainer_mnist.py',
                    use_gpu=True)
```

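The entries in `script_params` are passed to the training script as command-line arguments. As a rough sketch of what this looks like on the script side (the exact parsing code in **chainer_mnist.py** may differ), the script can read them with `argparse`:

```Python
# Minimal sketch: parse the arguments supplied through script_params.
# Argument names match the keys in script_params; defaults are illustrative.
import argparse

parser = argparse.ArgumentParser(description='Chainer MNIST example')
parser.add_argument('--epochs', type=int, default=10, help='number of training epochs')
parser.add_argument('--batchsize', type=int, default=128, help='mini-batch size')
parser.add_argument('--output_dir', type=str, default='./outputs', help='folder for model output')
args = parser.parse_args()
```
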
## Submit a run

The [Run object](https://docs.microsoft.com/python/api/azureml-core/azureml.core.run%28class%29?view=azure-ml-py) provides the interface to the run history while the job is running and after it has completed.

```Python
run = experiment.submit(estimator)
run.wait_for_completion(show_output=True)
```

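Optionally, if you're working in a Jupyter notebook and have the `azureml-widgets` package installed, you can monitor the run with a live-updating widget instead of (or in addition to) streaming the output. A minimal sketch:

```Python
# Optional: show run details and streamed logs in a Jupyter notebook widget.
# Assumes the azureml-widgets package is installed.
from azureml.widgets import RunDetails

RunDetails(run).show()
```
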
As the run is executed, it goes through the following stages:

- **Preparing**: A Docker image is created according to the Chainer estimator. The image is uploaded to the workspace's container registry and cached for later runs. Logs are also streamed to the run history and can be viewed to monitor progress.

- **Scaling**: The cluster attempts to scale up if it requires more nodes to execute the run than are currently available.

- **Running**: All scripts in the script folder are uploaded to the compute target, data stores are mounted or copied, and the entry_script is executed. Outputs from stdout and the ./logs folder are streamed to the run history and can be used to monitor the run.

- **Post-Processing**: The ./outputs folder of the run is copied over to the run history.

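After the run completes, you can register the trained model in your workspace so it's versioned and available for deployment. The following is a minimal sketch; it assumes the training script saved the trained model under the `./outputs` folder, and the file name `chainer_mnist.model` is a hypothetical placeholder — use the name your script actually writes.

```Python
# Minimal sketch: register the trained model from the run's outputs.
# 'outputs/chainer_mnist.model' is a placeholder path; adjust it to the
# file your training script actually saves.
model = run.register_model(model_name='chainer-mnist',
                           model_path='outputs/chainer_mnist.model')
print(model.name, model.id, model.version, sep='\t')
```
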
## Next steps

In this article, you trained a Chainer model on Azure Machine Learning service.

* To learn how to deploy a model, continue on to our [model deployment](how-to-deploy-and-where.md) article.

* [Tune hyperparameters](how-to-tune-hyperparameters.md)

* [Track run metrics during training](how-to-track-experiments.md)