
Commit 4491263

Merge pull request #79649 from sdgilley/sdg-master
new Chainer article
2 parents b4686a3 + 1f13045 commit 4491263


2 files changed (+186, -0 lines changed)

Lines changed: 184 additions & 0 deletions
@@ -0,0 +1,184 @@
---
title: Train and register Chainer models
titleSuffix: Azure Machine Learning service
description: This article shows you how to train and register a Chainer model using Azure Machine Learning service.
services: machine-learning
ms.service: machine-learning
ms.subservice: core
ms.topic: conceptual
ms.author: sgilley
author: sdgilley
ms.date: 06/15/2019
---

# Train and register Chainer models at scale with Azure Machine Learning service

This article shows you how to train and register a Chainer model using Azure Machine Learning service. The example classifies handwritten digits from the popular [MNIST dataset](http://yann.lecun.com/exdb/mnist/) with a deep neural network (DNN) built with the [Chainer Python library](https://Chainer.org), which runs on top of [numpy](https://www.numpy.org/).

Chainer is a flexible, Python-based deep learning framework that uses a define-by-run approach: the network is defined on the fly as the forward computation runs. With Azure Machine Learning service, you can rapidly scale out training jobs using elastic cloud compute resources. You can also track your training runs, version models, deploy models, and much more.

Whether you're developing a Chainer model from the ground up or bringing an existing model into the cloud, Azure Machine Learning service can help you build production-ready models.

If you don't have an Azure subscription, create a free account before you begin. Try the [free or paid version of Azure Machine Learning service](https://aka.ms/AMLFree) today.

## Prerequisites

Run this code in either of these environments:

- Azure Machine Learning Notebook VM: no downloads or installation necessary

    - Complete the [cloud-based notebook quickstart](quickstart-run-cloud-notebook.md) to create a dedicated notebook server preloaded with the SDK and the sample repository.
    - In the samples folder on the notebook server, find a completed notebook and files in the **how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer** folder. The notebook includes expanded sections covering intelligent hyperparameter tuning, model deployment, and notebook widgets.

- Your own Jupyter Notebook server

    - [Install the Azure Machine Learning SDK for Python](setup-create-workspace.md#sdk)
    - [Create a workspace configuration file](setup-create-workspace.md#write-a-configuration-file)
    - Download the sample script file [chainer_mnist.py](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/chainer_mnist.py)
    - You can also find a completed [Jupyter Notebook version](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-chainer/train-hyperparameter-tune-deploy-with-chainer.ipynb) of this guide on the GitHub samples page. The notebook includes expanded sections covering intelligent hyperparameter tuning, model deployment, and notebook widgets.

## Set up the experiment

This section sets up the training experiment by loading the required Python packages, initializing a workspace, creating a project directory, copying the training script into it, and creating an experiment.

### Import packages

First, import the `azureml.core` Python library and display the version number.

```Python
# Check core SDK version number
import azureml.core

print("SDK version:", azureml.core.VERSION)
```

### Initialize a workspace

The [Azure Machine Learning service workspace](concept-workspace.md) is the top-level resource for the service. It provides you with a centralized place to work with all the artifacts you create. In the Python SDK, you can access the workspace artifacts by creating a [`Workspace`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py) object.

Create a workspace object from the `config.json` file created in the [prerequisites section](#prerequisites).

```Python
from azureml.core.workspace import Workspace

# Load the workspace details (subscription, resource group, name) from config.json
ws = Workspace.from_config()
```
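
`Workspace.from_config()` reads the workspace details from `config.json`. If you need to create that file by hand, a minimal sketch follows. It assumes the file holds the `subscription_id`, `resource_group`, and `workspace_name` keys; the `write_workspace_config` helper is hypothetical, and the values you pass to it must be your own workspace details. If you already have a `Workspace` object, the SDK's `write_config()` method can write the file for you instead.

```python
import json
import os


def write_workspace_config(subscription_id, resource_group, workspace_name, path="."):
    """Write a config.json for Workspace.from_config() (hypothetical helper, sketch only)."""
    config = {
        "subscription_id": subscription_id,
        "resource_group": resource_group,
        "workspace_name": workspace_name,
    }
    # Create the target folder if needed, then write the JSON file into it
    os.makedirs(path, exist_ok=True)
    config_path = os.path.join(path, "config.json")
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
    return config_path
```

For example, `write_workspace_config("<subscription-id>", "<resource-group>", "<workspace-name>")` writes `config.json` to the current directory, where `from_config()` starts its search.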

### Create a project directory

Create a local directory to hold all the code that the remote compute resource needs. This includes the training script and any additional files the script depends on.

```Python
import os

# Local folder that holds the training script and its dependencies
project_folder = './chainer-mnist'
os.makedirs(project_folder, exist_ok=True)
```

### Prepare training script

In this tutorial, the training script **chainer_mnist.py** is already provided for you. In practice, you can take a custom training script as is and run it with Azure ML without modifying your code.

To use Azure ML's tracking and metrics capabilities, add a small amount of Azure ML code inside your training script. The training script **chainer_mnist.py** shows how to log some metrics to your Azure ML run by accessing the Azure ML `Run` object from within the script.

Copy the training script **chainer_mnist.py** into your project directory.

```Python
import shutil

shutil.copy('chainer_mnist.py', project_folder)
```

### Create an experiment

Create an experiment to group your training runs. In this example, the experiment is called "chainer-mnist".

```Python
from azureml.core import Experiment

experiment_name = 'chainer-mnist'
experiment = Experiment(ws, name=experiment_name)
```

## Create or get a compute target

You need a [compute target](concept-compute-target.md) for training your model. In this tutorial, you use Azure ML managed compute (AmlCompute) as your remote training compute resource.

**Creation of AmlCompute takes approximately 5 minutes**. If an AmlCompute cluster with that name already exists in your workspace, this code skips the creation process.

```Python
from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your cluster
cluster_name = "gpu-cluster"

try:
    # Reuse the cluster if it already exists in the workspace
    compute_target = ComputeTarget(workspace=ws, name=cluster_name)
    print('Found existing compute target.')
except ComputeTargetException:
    print('Creating a new compute target...')
    # STANDARD_NC6 is a GPU VM size; max_nodes caps how far the cluster can scale out
    compute_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_NC6',
                                                           max_nodes=4)

    # Create the cluster
    compute_target = ComputeTarget.create(ws, cluster_name, compute_config)

    compute_target.wait_for_completion(show_output=True)

# Use get_status() to get a detailed status for the current cluster
print(compute_target.get_status().serialize())
```

For more information on compute targets, see the [what is a compute target](concept-compute-target.md) article.

## Create a Chainer estimator

The [Chainer estimator](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.chainer?view=azure-ml-py) provides a simple way of launching Chainer training jobs on your compute target.

The Chainer estimator is implemented through the generic [`Estimator`](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py) class, which can be used with any framework. For more information about training models using the generic estimator, see [train models with Azure Machine Learning using estimator](how-to-train-ml-models.md).

```Python
from azureml.train.dnn import Chainer

# Command-line arguments passed to the entry script
script_params = {
    '--epochs': 10,
    '--batchsize': 128,
    '--output_dir': './outputs'
}

estimator = Chainer(source_directory=project_folder,
                    script_params=script_params,
                    compute_target=compute_target,
                    pip_packages=['numpy', 'pytest'],
                    entry_script='chainer_mnist.py',
                    # Run training on the GPU nodes of the cluster
                    use_gpu=True)
```
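
Each entry in `script_params` reaches the entry script as a command-line argument, so the script reads the values like any other arguments. The sketch below shows one way a training script could parse these three values with `argparse`. It makes no claim about how **chainer_mnist.py** is actually written; the `parse_args` helper and its default values are hypothetical.

```python
import argparse
import os


def parse_args(argv=None):
    """Parse the arguments supplied through script_params (hypothetical helper)."""
    parser = argparse.ArgumentParser(description='Chainer MNIST training (sketch)')
    parser.add_argument('--epochs', type=int, default=20,
                        help='number of passes over the training data')
    parser.add_argument('--batchsize', type=int, default=100,
                        help='number of images in each mini-batch')
    parser.add_argument('--output_dir', default='./outputs',
                        help='directory for the trained model and other outputs')
    return parser.parse_args(argv)


# These tokens mirror the script_params dictionary passed to the estimator
args = parse_args(['--epochs', '10', '--batchsize', '128', '--output_dir', './outputs'])

# Files the script saves under ./outputs are copied to the run history after the run
os.makedirs(args.output_dir, exist_ok=True)
print(args.epochs, args.batchsize, args.output_dir)
```

Writing results to the directory passed as `--output_dir` keeps the script portable: it behaves the same locally and on the cluster.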

## Submit a run

The [Run object](https://docs.microsoft.com/python/api/azureml-core/azureml.core.run%28class%29?view=azure-ml-py) provides the interface to the run history while the job is running and after it has completed.

```Python
# Submit the estimator to the experiment created earlier
run = experiment.submit(estimator)
# Stream the run's output until it finishes
run.wait_for_completion(show_output=True)
```

As the run executes, it goes through the following stages:

- **Preparing**: A Docker image is created according to the Chainer estimator. The image is uploaded to the workspace's container registry and cached for later runs. Logs are also streamed to the run history and can be viewed to monitor progress.

- **Scaling**: The AmlCompute cluster attempts to scale up if it requires more nodes to execute the run than are currently available.

- **Running**: All files in the project folder are uploaded to the compute target, data stores are mounted or copied, and the `entry_script` is executed. Output from stdout and the ./logs folder is streamed to the run history and can be used to monitor the run.

- **Post-Processing**: The ./outputs folder of the run is copied over to the run history.
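
`wait_for_completion()` blocks until the run ends. If you'd rather check on the run yourself, a minimal polling sketch follows. It assumes `run.get_status()` returns status strings and that `Completed`, `Failed`, and `Canceled` are the terminal states; the `wait_for_terminal_state` helper is hypothetical, and the exact strings your SDK version reports for the stages above may differ.

```python
import time

# Assumed terminal status strings; check the values your SDK version reports
TERMINAL_STATES = {'Completed', 'Failed', 'Canceled'}


def wait_for_terminal_state(get_status, poll_seconds=30, sleep=time.sleep):
    """Poll get_status() until a terminal state; return each distinct status seen."""
    history = []
    while True:
        status = get_status()
        # Record and print a status only when it changes
        if not history or history[-1] != status:
            history.append(status)
            print('Run status:', status)
        if status in TERMINAL_STATES:
            return history
        sleep(poll_seconds)
```

With a submitted run, you would call `wait_for_terminal_state(run.get_status)`. Passing the status function and `sleep` as parameters keeps the loop easy to test without a live run.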

## Next steps

In this article, you trained a Chainer model on Azure Machine Learning service.

* To learn how to deploy a model, continue to the [model deployment](how-to-deploy-and-where.md) article.

* [Tune hyperparameters](how-to-tune-hyperparameters.md)

* [Track run metrics during training](how-to-track-experiments.md)

articles/machine-learning/service/toc.yml

Lines changed: 2 additions & 0 deletions
@@ -182,6 +182,8 @@
   href: how-to-train-keras.md
 - name: Use PyTorch
   href: how-to-train-pytorch.md
+- name: Use Chainer
+  href: how-to-train-chainer.md
 - name: Tune hyperparameters
   displayName: parameter
   href: how-to-tune-hyperparameters.md
