
Commit fe1c02c

Trevor Bye committed: updating to use new file dataset class

1 parent 42ac0e0 commit fe1c02c

File tree

1 file changed: +33 −33 lines changed

articles/machine-learning/service/how-to-train-keras.md

Lines changed: 33 additions & 33 deletions
@@ -52,13 +52,9 @@ First, import the necessary Python libraries.

 ```Python
 import os
-import urllib
-import shutil
 import azureml
-
 from azureml.core import Experiment
 from azureml.core import Workspace, Run
-
 from azureml.core.compute import ComputeTarget, AmlCompute
 from azureml.core.compute_target import ComputeTargetException
 ```
@@ -75,43 +71,36 @@ ws = Workspace.from_config()

 ### Create an experiment

-Create an experiment and a folder to hold your training scripts. In this example, create an experiment called "keras-mnist".
+Create an experiment called "keras-mnist" in your workspace.

 ```Python
-script_folder = './keras-mnist'
-os.makedirs(script_folder, exist_ok=True)
-
 exp = Experiment(workspace=ws, name='keras-mnist')
 ```

-### Upload dataset and scripts
-
-The [datastore](how-to-access-data.md) is a place where data can be stored and accessed by mounting or copying the data to the compute target. Each workspace provides a default datastore. Upload the data and training scripts to the datastore so that they can be easily accessed during training.
+### Create a file dataset

-1. Download the MNIST dataset locally.
+A `FileDataset` object references one or more files in your workspace datastore or at public URLs. The files can be of any format, and the class gives you the ability to download or mount the files to your compute. Creating a `FileDataset` creates a reference to the data source location; any transformations you applied to the data set are stored with it as well. The data remains in its existing location, so no extra storage cost is incurred. See the [how-to](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-create-register-datasets) guide on the `Dataset` package for more information.

-   ```Python
-   os.makedirs('./data/mnist', exist_ok=True)
+```python
+from azureml.core.dataset import Dataset

-   urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename = './data/mnist/train-images.gz')
-   urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename = './data/mnist/train-labels.gz')
-   urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')
-   urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')
-   ```
-
-1. Upload the MNIST dataset to the default datastore.
-
-   ```Python
-   ds = ws.get_default_datastore()
-   ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)
-   ```
+web_paths = [
+    'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
+    'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',
+    'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
+    'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'
+]
+dataset = Dataset.File.from_files(path=web_paths)
+```

-1. Upload the Keras training script, `keras_mnist.py`, and the helper file, `utils.py`.
+Use the `register()` method to register the data set to your workspace so it can be shared with others, reused across experiments, and referred to by name in your training script.

-   ```Python
-   shutil.copy('./keras_mnist.py', script_folder)
-   shutil.copy('./utils.py', script_folder)
-   ```
+```python
+dataset = dataset.register(workspace=ws,
+                           name='mnist dataset',
+                           description='training and test dataset',
+                           create_new_version=True)
+```
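The files this dataset references are the raw MNIST idx archives; parsing them is the job of the `utils.py` helper that `keras_mnist.py` relies on. A non-authoritative sketch of what such a loader might look like (the function name and the use of numpy are assumptions, not part of this commit):

```python
import gzip
import struct

import numpy as np


def load_idx_images(path):
    """Parse an idx3-ubyte image file (optionally gzipped) into an (N, rows*cols) float array.

    Hypothetical helper: illustrates the MNIST idx format, not the actual utils.py.
    """
    opener = gzip.open if path.endswith('.gz') else open
    with opener(path, 'rb') as f:
        # idx3 header: magic number, image count, rows, cols (big-endian uint32s)
        magic, n, rows, cols = struct.unpack('>IIII', f.read(16))
        if magic != 2051:
            raise ValueError('not an idx3-ubyte image file')
        data = np.frombuffer(f.read(), dtype=np.uint8)
    # flatten each image and scale pixel values to [0, 1]
    return data.reshape(n, rows * cols) / 255.0
```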

 ## Create a compute target

@@ -139,11 +128,22 @@ For more information on compute targets, see the [what is a compute target](conc

 The [TensorFlow estimator](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.tensorflow?view=azure-ml-py) provides a simple way of launching TensorFlow training jobs on a compute target. Since Keras runs on top of TensorFlow, you can use the TensorFlow estimator and import the Keras library using the `pip_packages` argument.

+First, get the data from the workspace datastore using the `Dataset` class.
+
+```python
+dataset = Dataset.get_by_name(ws, 'mnist dataset')
+
+# list the files referenced by mnist dataset
+dataset.to_path()
+```
+
 The TensorFlow estimator is implemented through the generic [`estimator`](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py) class, which can be used to support any framework. Additionally, create a dictionary `script_params` that contains the DNN hyperparameter settings. For more information about training models using the generic estimator, see [train models with Azure Machine Learning using estimator](how-to-train-ml-models.md).

-```Python
+```python
+from azureml.train.dnn import TensorFlow
+
 script_params = {
-    '--data-folder': ds.path('mnist').as_mount(),
+    '--data-folder': dataset.as_named_input('mnist').as_mount(),
     '--batch-size': 50,
     '--first-layer-neurons': 300,
     '--second-layer-neurons': 100,
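The `script_params` keys above arrive as command-line arguments inside `keras_mnist.py`. A minimal sketch of how the training script might consume them with `argparse` (the `dest` names and defaults are hypothetical, not taken from the commit):

```python
import argparse


def parse_args(argv=None):
    # Argument names mirror the script_params keys passed to the estimator.
    parser = argparse.ArgumentParser()
    parser.add_argument('--data-folder', type=str, dest='data_folder',
                        help='mount point of the input file dataset')
    parser.add_argument('--batch-size', type=int, dest='batch_size', default=50)
    parser.add_argument('--first-layer-neurons', type=int, dest='n_hidden_1', default=300)
    parser.add_argument('--second-layer-neurons', type=int, dest='n_hidden_2', default=100)
    return parser.parse_args(argv)
```

Because `--data-folder` is bound to `dataset.as_named_input('mnist').as_mount()`, the value the script receives at runtime is a local mount path from which the MNIST files can be read.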
