Commit 1db4e81

Merge pull request #88799 from trevorbye/master
updating to use new file dataset class
2 parents 749256d + fa2bae2 commit 1db4e81

2 files changed: +35 −35 lines changed


articles/machine-learning/service/how-to-train-chainer.md

Lines changed: 2 additions & 2 deletions
@@ -1,7 +1,7 @@
 ---
 title: Train deep learning neural network with Chainer
 titleSuffix: Azure Machine Learning
-description: Learn how to run your PyTorch training scripts at enterprise scale using Azure Machine Learning's Chainer estimator class. The example script classifis handwritten digit images to build a deep learning neural network using the Chainer Python library running on top of numpy.
+description: Learn how to run your PyTorch training scripts at enterprise scale using Azure Machine Learning's Chainer estimator class. The example script classifies handwritten digit images to build a deep learning neural network using the Chainer Python library running on top of numpy.
 services: machine-learning
 ms.service: machine-learning
 ms.subservice: core
@@ -81,7 +81,7 @@ In this tutorial, the training script **chainer_mnist.py** is already provided f
 
 To use Azure ML's tracking and metrics capabilities, add a small amount of Azure ML code inside your training script. The training script **chainer_mnist.py** shows how to log some metrics to your Azure ML run using the `Run` object within the script.
 
-The provided training script uses example data from the chainer `datasets.mnist.get_mnist` function. For your own data, you may need to use steps such as [Upload dataset and scripts](how-to-train-keras.md#upload-dataset-and-scripts) to make data available during training.
+The provided training script uses example data from the chainer `datasets.mnist.get_mnist` function. For your own data, you may need to use steps such as [Upload dataset and scripts](how-to-train-keras.md) to make data available during training.
 
 Copy the training script **chainer_mnist.py** into your project directory.

articles/machine-learning/service/how-to-train-keras.md

Lines changed: 33 additions & 33 deletions
@@ -52,13 +52,9 @@ First, import the necessary Python libraries.
 
 ```Python
 import os
-import urllib
-import shutil
 import azureml
-
 from azureml.core import Experiment
 from azureml.core import Workspace, Run
-
 from azureml.core.compute import ComputeTarget, AmlCompute
 from azureml.core.compute_target import ComputeTargetException
 ```
@@ -75,43 +71,36 @@ ws = Workspace.from_config()
 
 ### Create an experiment
 
-Create an experiment and a folder to hold your training scripts. In this example, create an experiment called "keras-mnist".
+Create an experiment called "keras-mnist" in your workspace.
 
 ```Python
-script_folder = './keras-mnist'
-os.makedirs(script_folder, exist_ok=True)
-
 exp = Experiment(workspace=ws, name='keras-mnist')
 ```
 
-### Upload dataset and scripts
-
-The [datastore](how-to-access-data.md) is a place where data can be stored and accessed by mounting or copying the data to the compute target. Each workspace provides a default datastore. Upload the data and training scripts to the datastore so that they can be easily accessed during training.
+### Create a file dataset
 
-1. Download the MNIST dataset locally.
+A `FileDataset` object references one or multiple files in your workspace datastore or public urls. The files can be of any format, and the class provides you with the ability to download or mount the files to your compute. By creating a `FileDataset`, you create a reference to the data source location. If you applied any transformations to the data set, they will be stored in the data set as well. The data remains in its existing location, so no extra storage cost is incurred. See the [how-to](https://docs.microsoft.com/azure/machine-learning/service/how-to-create-register-datasets) guide on the `Dataset` package for more information.
 
-    ```Python
-    os.makedirs('./data/mnist', exist_ok=True)
+```python
+from azureml.core.dataset import Dataset
 
-    urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz', filename = './data/mnist/train-images.gz')
-    urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz', filename = './data/mnist/train-labels.gz')
-    urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz', filename = './data/mnist/test-images.gz')
-    urllib.request.urlretrieve('http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz', filename = './data/mnist/test-labels.gz')
-    ```
-
-1. Upload the MNIST dataset to the default datastore.
-
-    ```Python
-    ds = ws.get_default_datastore()
-    ds.upload(src_dir='./data/mnist', target_path='mnist', overwrite=True, show_progress=True)
-    ```
+web_paths = [
+    'http://yann.lecun.com/exdb/mnist/train-images-idx3-ubyte.gz',
+    'http://yann.lecun.com/exdb/mnist/train-labels-idx1-ubyte.gz',
+    'http://yann.lecun.com/exdb/mnist/t10k-images-idx3-ubyte.gz',
+    'http://yann.lecun.com/exdb/mnist/t10k-labels-idx1-ubyte.gz'
+]
+dataset = Dataset.File.from_files(path=web_paths)
+```
 
-1. Upload the Keras training script, `keras_mnist.py`, and the helper file, `utils.py`.
+Use the `register()` method to register the data set to your workspace so they can be shared with others, reused across various experiments, and referred to by name in your training script.
 
-    ```Python
-    shutil.copy('./keras_mnist.py', script_folder)
-    shutil.copy('./utils.py', script_folder)
-    ```
+```python
+dataset = dataset.register(workspace=ws,
+                           name='mnist dataset',
+                           description='training and test dataset',
+                           create_new_version=True)
+```
 
 ## Create a compute target
 
@@ -139,11 +128,22 @@ For more information on compute targets, see the [what is a compute target](conc
 
 The [TensorFlow estimator](https://docs.microsoft.com/python/api/azureml-train-core/azureml.train.dnn.tensorflow?view=azure-ml-py) provides a simple way of launching TensorFlow training jobs on compute target. Since Keras runs on top of TensorFlow, you can use the TensorFlow estimator and import the Keras library using the `pip_packages` argument.
 
+First get the data from the workspace datastore using the `Dataset` class.
+
+```python
+dataset = Dataset.get_by_name(ws, 'mnist dataset')
+
+# list the files referenced by mnist dataset
+dataset.to_path()
+```
+
 The TensorFlow estimator is implemented through the generic [`estimator`](https://docs.microsoft.com//python/api/azureml-train-core/azureml.train.estimator.estimator?view=azure-ml-py) class, which can be used to support any framework. Additionally, create a dictionary `script_params` that contains the DNN hyperparameter settings. For more information about training models using the generic estimator, see [train models with Azure Machine Learning using estimator](how-to-train-ml-models.md)
 
-```Python
+```python
+from azureml.train.dnn import TensorFlow
+
 script_params = {
-    '--data-folder': ds.path('mnist').as_mount(),
+    '--data-folder': dataset.as_named_input('mnist').as_mount(),
     '--batch-size': 50,
     '--first-layer-neurons': 300,
     '--second-layer-neurons': 100,

0 commit comments
