Commit 24b207a

Merge pull request #79119 from PeterCLu/plu-amls-keras
Add keras article
2 parents da4080e + 502d868 commit 24b207a

2 files changed

+185
-2
lines changed

Lines changed: 181 additions & 0 deletions
@@ -0,0 +1,181 @@
---
title: Train and register Keras models running on TensorFlow
titleSuffix: Azure Machine Learning service
description: This article shows you how to train and register a Keras model running on TensorFlow using Azure Machine Learning service.
services: machine-learning
ms.service: machine-learning
ms.subservice: core
ms.topic: conceptual
ms.author: minxia
author: mx-iao
ms.date: 06/07/2019
ms.custom: seodec18
---

# Train and register Keras models at scale with Azure Machine Learning service

This article shows you how to train and register a Keras model built on TensorFlow using Azure Machine Learning service. It uses the popular [MNIST dataset](http://yann.lecun.com/exdb/mnist/) to classify handwritten digits with a deep neural network (DNN) built with the [Keras Python library](https://keras.io) running on top of [TensorFlow](https://www.tensorflow.org/overview).

Keras is a high-level neural network API capable of running on top of other popular DNN frameworks to simplify development. With Azure Machine Learning service, you can rapidly scale out training jobs using elastic cloud compute resources. You can also track your training runs, version models, deploy models, and much more.

Whether you're developing a Keras model from the ground up or you're bringing an existing model into the cloud, Azure Machine Learning service can help you build production-ready models.

## Prerequisites

- An Azure subscription. Try the [free or paid version of Azure Machine Learning service](https://aka.ms/AMLFree) today.
- [Install the Azure Machine Learning SDK for Python](setup-create-workspace.md#sdk)
- [Create a workspace configuration file](setup-create-workspace.md#write-a-configuration-file)
- [Download the sample script files](https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras) `keras_mnist.py` and `utils.py`

You can also find a completed [Jupyter Notebook version](https://github.com/Azure/MachineLearningNotebooks/blob/master/how-to-use-azureml/training-with-deep-learning/train-hyperparameter-tune-deploy-with-keras/train-hyperparameter-tune-deploy-with-keras.ipynb) of this guide on the GitHub samples page. The notebook includes expanded sections covering intelligent hyperparameter tuning, model deployment, and notebook widgets.

## Set up the experiment

This section sets up the training experiment by loading the required Python packages, initializing a workspace, creating an experiment, and uploading the training data and training scripts.

### Import packages

First, import the necessary Python libraries.

```Python
import os
import urllib.request  # urlretrieve lives in the urllib.request submodule
import shutil
import azureml

from azureml.core import Experiment
from azureml.core import Workspace, Run

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Estimator class used later to launch the training job
from azureml.train.dnn import TensorFlow
```

### Initialize a workspace

The [Azure Machine Learning service workspace](concept-workspace.md) is the top-level resource for the service. It provides you with a centralized place to work with all the artifacts you create. In the Python SDK, you can access the workspace artifacts by creating a [`workspace`](https://docs.microsoft.com/python/api/azureml-core/azureml.core.workspace.workspace?view=azure-ml-py) object.

Create a workspace object from the `config.json` file created in the [prerequisites section](#prerequisites).

```Python
ws = Workspace.from_config()
```
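
`Workspace.from_config()` reads the workspace coordinates from `config.json`. As a rough sketch of what that file holds, the following standard-library snippet writes a sample configuration with placeholder values and checks that the `subscription_id`, `resource_group`, and `workspace_name` keys are present. The `check_workspace_config` helper and all values are hypothetical and only illustrate the file layout; they are not part of the SDK.

```python
import json
import os
import tempfile

# Keys expected in config.json (placeholder values are used below).
REQUIRED_KEYS = ("subscription_id", "resource_group", "workspace_name")


def check_workspace_config(path):
    """Return the keys from REQUIRED_KEYS that are missing in the file."""
    with open(path) as f:
        config = json.load(f)
    return [key for key in REQUIRED_KEYS if key not in config]


# Hypothetical sample configuration; replace the placeholders with real values.
sample = {
    "subscription_id": "<your-subscription-id>",
    "resource_group": "<your-resource-group>",
    "workspace_name": "<your-workspace-name>",
}
config_path = os.path.join(tempfile.mkdtemp(), "config.json")
with open(config_path, "w") as f:
    json.dump(sample, f, indent=4)

missing = check_workspace_config(config_path)
print("Missing keys:", missing)
```

A check like this can help you catch a malformed configuration file before you call `Workspace.from_config()`.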
62+
63+
### Create an experiment
64+
65+
Create an experiment and a folder to hold your training scripts. In this example, create an experiment called "keras-mnist".
66+
67+
```Python
68+
script_folder = './keras-mnist'
69+
os.makedirs(script_folder, exist_ok=True)
70+
71+
exp = Experiment(workspace=ws, name='keras-mnist')
72+
```

### Upload dataset and scripts

The [datastore](how-to-access-data.md) is a place where data can be stored and accessed by mounting or copying the data to the compute target. Each workspace provides a default datastore. Upload the data and training scripts to the datastore so that they can be easily accessed during training.

1. Download the MNIST dataset locally.

    ```Python
    os.makedirs('./data/mnist', exist_ok=True)

    urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename='./data/mnist/train-images.gz')
    urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename='./data/mnist/train-labels.gz')
    urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename='./data/mnist/test-images.gz')
    urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename='./data/mnist/test-labels.gz')
    ```

1. Upload the MNIST dataset to the default datastore.

    ```Python
    ds = ws.get_default_datastore()
    ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)
    ```

1. Copy the Keras training script, `keras_mnist.py`, and the helper file, `utils.py`, into the script folder. The contents of this folder are uploaded to the compute target when you submit the run.

    ```Python
    shutil.copy('./keras_mnist.py', script_folder)
    shutil.copy('./utils.py', script_folder)
    ```
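
The downloaded files are gzip-compressed IDX files. The sketch below is not the helper from the sample's `utils.py`; it is only an illustration, using the standard library, of how an image file can be read, assuming the IDX layout described on the MNIST page: a big-endian header (magic number 2051, image count, rows, columns) followed by one unsigned byte per pixel. The `read_idx_images` name and the tiny synthetic file are hypothetical.

```python
import gzip
import os
import struct
import tempfile


def read_idx_images(path):
    """Read a gzip-compressed IDX image file into a list of flat pixel lists."""
    with gzip.open(path, "rb") as f:
        # Header: four big-endian unsigned 32-bit integers.
        magic, count, rows, cols = struct.unpack(">IIII", f.read(16))
        if magic != 2051:
            raise ValueError("not an IDX image file (magic={})".format(magic))
        size = rows * cols
        images = [list(f.read(size)) for _ in range(count)]
    return images, rows, cols


# Build a tiny synthetic file (2 images of 2x2 pixels) to exercise the reader.
demo_path = os.path.join(tempfile.mkdtemp(), "demo-images.gz")
with gzip.open(demo_path, "wb") as f:
    f.write(struct.pack(">IIII", 2051, 2, 2, 2))
    f.write(bytes([0, 255, 128, 64, 1, 2, 3, 4]))

images, rows, cols = read_idx_images(demo_path)
print(len(images), rows, cols)
```

Pointing the same function at `./data/mnist/train-images.gz` would be one way to sanity-check the download before uploading it.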

## Get the default compute target

Each workspace comes with two default compute targets: a GPU-based compute target and a CPU-based compute target. The default compute targets have autoscale set to 0, which means they aren't allocated until you use them. In this example, use the default GPU compute target.

```Python
compute_target = ws.get_default_compute_target(type="GPU")
```
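
If a default compute target isn't available in your workspace, one common alternative is to look up a named cluster and create it when it doesn't exist, using the `ComputeTarget`, `AmlCompute`, and `ComputeTargetException` classes imported earlier. The following is a sketch under stated assumptions: the helper name `get_or_create_gpu_cluster`, the cluster name `gpu-cluster`, the VM size `STANDARD_NC6`, and the node counts are all illustrative choices, not values from this article.

```python
def get_or_create_gpu_cluster(ws, cluster_name="gpu-cluster",
                              vm_size="STANDARD_NC6", max_nodes=4):
    """Return an existing compute target, or provision a new AmlCompute cluster."""
    # Imported here so that defining the function doesn't require the SDK.
    from azureml.core.compute import ComputeTarget, AmlCompute
    from azureml.core.compute_target import ComputeTargetException

    try:
        # Reuse the cluster if it already exists in the workspace.
        return ComputeTarget(workspace=ws, name=cluster_name)
    except ComputeTargetException:
        # min_nodes=0 lets the cluster scale down to zero nodes when idle.
        config = AmlCompute.provisioning_configuration(
            vm_size=vm_size, min_nodes=0, max_nodes=max_nodes)
        target = ComputeTarget.create(ws, cluster_name, config)
        target.wait_for_completion(show_output=True)
        return target
```

With this helper, `compute_target = get_or_create_gpu_cluster(ws)` could stand in for the default-target call.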

For more information on compute targets, see the [what is a compute target](concept-compute-target.md) article.

## Create a TensorFlow estimator and import Keras

The [TensorFlow estimator](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.tensorflow?view=azure-ml-py) provides a simple way of launching TensorFlow training jobs on a compute target. Since Keras runs on top of TensorFlow, you can use the TensorFlow estimator and install the Keras library using the `pip_packages` argument.

The TensorFlow estimator is implemented through the generic [`estimator`](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py) class, which can be used to support any framework. For more information about training models using the generic estimator, see [train models with Azure Machine Learning using estimator](how-to-train-ml-models.md).

```Python
# Arguments passed to the entry script on the command line
script_params = {
    '--data-folder': ds.path('mnist').as_mount(),
    '--batch-size': 50,
    '--first-layer-neurons': 300,
    '--second-layer-neurons': 100,
    '--learning-rate': 0.001
}

est = TensorFlow(source_directory=script_folder,
                 entry_script='keras_mnist.py',
                 script_params=script_params,
                 compute_target=compute_target,
                 pip_packages=['keras', 'matplotlib'],
                 use_gpu=True)
```
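
The keys in `script_params` are passed to the entry script as command-line arguments. The real `keras_mnist.py` is in the sample repository; purely as an illustration, a training script could read these arguments with `argparse` roughly as follows. The `dest` names, defaults, and the example mount path are assumptions made for this sketch.

```python
import argparse


def parse_args(argv=None):
    """Parse the hyperparameters that the estimator passes to the script."""
    parser = argparse.ArgumentParser(description="Keras MNIST training (sketch)")
    parser.add_argument("--data-folder", type=str, dest="data_folder",
                        help="folder where the MNIST files are mounted")
    parser.add_argument("--batch-size", type=int, dest="batch_size", default=50)
    parser.add_argument("--first-layer-neurons", type=int,
                        dest="first_layer_neurons", default=300)
    parser.add_argument("--second-layer-neurons", type=int,
                        dest="second_layer_neurons", default=100)
    parser.add_argument("--learning-rate", type=float,
                        dest="learning_rate", default=0.001)
    return parser.parse_args(argv)


# Example invocation with an explicit argument list (hypothetical mount path).
args = parse_args(["--data-folder", "/mnt/mnist",
                   "--batch-size", "50",
                   "--learning-rate", "0.001"])
print(args.data_folder, args.batch_size, args.learning_rate)
```

Because each value arrives as a string on the command line, giving every argument an explicit `type` keeps the numeric hyperparameters numeric inside the script.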

## Submit a run

The [Run object](https://docs.microsoft.com/python/api/azureml-core/azureml.core.run%28class%29?view=azure-ml-py) provides the interface to the run history while the job is running and after it has completed.

```Python
run = exp.submit(est)
run.wait_for_completion(show_output=True)
```

As the run is executed, it goes through the following stages:

- **Preparing**: A Docker image is created according to the TensorFlow estimator. The image is uploaded to the workspace's container registry and cached for later runs. Logs are also streamed to the run history and can be viewed to monitor progress.

- **Scaling**: The cluster attempts to scale up if the Batch AI cluster requires more nodes to execute the run than are currently available.

- **Running**: All scripts in the script folder are uploaded to the compute target, datastores are mounted or copied, and the `entry_script` is executed. Outputs from stdout and the `./logs` folder are streamed to the run history and can be used to monitor the run.

- **Post-Processing**: The `./outputs` folder of the run is copied over to the run history.
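
Because the `./outputs` folder is what gets copied to the run history, a training script needs to write anything it wants to keep into that folder. The following is a minimal sketch of that convention; the `save_artifact` helper, the file name, and the placeholder bytes are hypothetical and don't reflect the format the sample script writes.

```python
import os
import tempfile


def save_artifact(outputs_dir, relative_name, data):
    """Write bytes under outputs_dir, creating subfolders as needed."""
    target = os.path.join(outputs_dir, relative_name)
    os.makedirs(os.path.dirname(target), exist_ok=True)
    with open(target, "wb") as f:
        f.write(data)
    return target


# In a training script, outputs_dir would be './outputs'. A temporary folder
# is used here so the example doesn't write into the current directory.
outputs_dir = os.path.join(tempfile.mkdtemp(), "outputs")
saved_path = save_artifact(outputs_dir, "model/weights.bin", b"placeholder")
print(saved_path)
```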

## Register the model

Once you've trained the model, you can register it to your workspace. Model registration lets you store and version your models in your workspace to simplify [model management and deployment](concept-model-management-and-deployment.md).

```Python
model = run.register_model(model_name='keras-dnn-mnist', model_path='outputs/model')
```

You can also download a local copy of the model. This can be useful for doing additional model validation work locally. In the training script, `keras_mnist.py`, a TensorFlow saver object persists the model to a local folder (local to the compute target). You can use the Run object to download a copy from the run history.

```Python
# Create a model folder in the current directory
os.makedirs('./model', exist_ok=True)

# Download every file the run stored under outputs/model
for f in run.get_file_names():
    if f.startswith('outputs/model'):
        output_file_path = os.path.join('./model', f.split('/')[-1])
        print('Downloading from {} to {} ...'.format(f, output_file_path))
        run.download_file(name=f, output_file_path=output_file_path)
```
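
The filtering and path mapping in that loop can be checked without an actual run. The sketch below reproduces the logic as a pure function; `plan_downloads` and the example file names are made up for illustration and are not output from a real run.

```python
import posixpath


def plan_downloads(file_names, prefix="outputs/model", local_dir="./model"):
    """Map run file names under prefix to flat local paths."""
    plan = {}
    for name in file_names:
        if name.startswith(prefix):
            # Keep only the last path segment, as the download loop does.
            plan[name] = posixpath.join(local_dir, name.split("/")[-1])
    return plan


# Hypothetical file listing, standing in for run.get_file_names().
example_names = [
    "azureml-logs/driver_log.txt",
    "outputs/model/model.json",
    "outputs/model/model.h5",
]
plan = plan_downloads(example_names)
for remote, local in plan.items():
    print("{} -> {}".format(remote, local))
```

Note that flattening to the last path segment means two files with the same name in different subfolders would overwrite each other locally; keep the relative path if your model folder is nested.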

## Next steps

In this article, you trained and registered a Keras model on Azure Machine Learning service. To learn how to deploy a model, continue on to our model deployment article.

> [!div class="nextstepaction"]
> [How and where to deploy models](how-to-deploy-and-where.md)

articles/machine-learning/service/toc.yml

Lines changed: 4 additions & 2 deletions
@@ -177,10 +177,12 @@
  href: how-to-set-up-training-targets.md
  - name: Create estimators in training
  href: how-to-train-ml-models.md
- - name: Use PyTorch
- href: how-to-train-pytorch.md
  - name: Use TensorFlow
  href: how-to-train-tensorflow.md
+ - name: Use Keras
+ href: how-to-train-keras.md
+ - name: Use PyTorch
+ href: how-to-train-pytorch.md
  - name: Tune hyperparameters
  displayName: parameter
  href: how-to-tune-hyperparameters.md
