
Commit c1fc802

Polish the tutorial of model contribution (#2122)

* Modify the title of building models with structured data using ElasticDL
* Polish the tutorial of model contribution
* Add a newline at the end
* Fix by comments
1 parent 2e0f274 commit c1fc802

1 file changed: +11, -137 lines

docs/tutorials/model_building.md renamed to docs/tutorials/model_contribution.md

Lines changed: 11 additions & 137 deletions
@@ -1,4 +1,4 @@
-# ElasticDL Model Building
+# ElasticDL Model Contribution
 
 To submit an ElasticDL job, a user needs to provide a model file, such as
 [`mnist_functional_api.py`](https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/mnist_functional_api/mnist_functional_api.py)
@@ -71,7 +71,7 @@ model = MnistModel()
 ### dataset_fn
 
 ```python
-dataset_fn(dataset, training)
+dataset_fn(dataset, mode)
 ```
 
 `dataset_fn` is a function that takes a RecordIO `dataset` as input,
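This hunk changes the second argument of `dataset_fn` from the boolean `training` to `mode`. A boolean can only separate training from everything else, while the tutorial's data covers training, evaluation, and prediction, and prediction data carries no label. The following is a minimal plain-Python sketch of a mode-aware `dataset_fn` under stated assumptions: `Mode`, `parse_record`, and the dict record layout are hypothetical stand-ins, and a list replaces the `tf.data` RecordIO dataset that the real function receives.

```python
import random
from enum import Enum


class Mode(Enum):
    # Hypothetical stand-in for the framework's mode constants.
    TRAINING = "training"
    EVALUATION = "evaluation"
    PREDICTION = "prediction"


def parse_record(record):
    # Hypothetical parser: each record is a dict with "features" and an optional "label".
    return record["features"], record.get("label")


def dataset_fn(dataset, mode):
    """Sketch of a mode-aware dataset_fn over a plain list of records."""
    parsed = [parse_record(r) for r in dataset]
    if mode == Mode.PREDICTION:
        # Prediction data has no labels, so return features only.
        return [features for features, _ in parsed]
    if mode == Mode.TRAINING:
        # Shuffle only for training; evaluation keeps the original order.
        parsed = random.sample(parsed, len(parsed))
    return parsed
```

Shuffling only in training and dropping labels only in prediction are illustrative choices for this sketch, not requirements stated by ElasticDL.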
@@ -128,23 +128,23 @@ def dataset_fn(dataset, mode):
 ### loss
 
 ```python
-loss(labels, output)
+loss(labels, predictions)
 ```
 
 `loss` is the loss function used in ElasticDL training.
 
 Arguments:
 
 - labels: `labels` from [`dataset_fn`](#dataset_fn).
-- output: [model](#model)'s output.
+- predictions: [model](#model)'s output.
 
 Example:
 
 ```python
-def loss(labels, output):
+def loss(labels, predictions):
     return tf.reduce_mean(
         input_tensor=tf.nn.sparse_softmax_cross_entropy_with_logits(
-            logits=output, labels=labels.flatten()
+            logits=predictions, labels=labels.flatten()
         )
     )
 ```
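The updated `loss` example only renames `output` to `predictions`. The math is unchanged. As a cross-check of what `tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(...))` computes for integer labels and logit predictions, here is a plain-Python sketch. The function name is illustrative and is not a TensorFlow or ElasticDL API. Each example contributes `-log(softmax(logits)[label])`, and the batch loss is the mean of those values.

```python
import math


def sparse_softmax_cross_entropy_mean(labels, predictions):
    """Mean over the batch of -log(softmax(logits)[label]).

    labels: integer class ids, one per example (already flattened).
    predictions: rows of logits, one row per example.
    """
    total = 0.0
    for label, logits in zip(labels, predictions):
        # Subtract the max logit before exponentiating for numerical stability.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # -log softmax(logits)[label] == logsumexp(logits) - logits[label]
        total += log_sum_exp - logits[label]
    return total / len(labels)
```

For a row of two equal logits, the per-example loss is `log 2`, about 0.693. This gives a quick sanity check.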
@@ -179,59 +179,15 @@ TensorFlow API.
 Example:
 
 ```python
-def eval_metrics_fn(predictions, labels):
+def eval_metrics_fn():
     return {
-        "accuracy": tf.reduce_mean(
-            input_tensor=tf.cast(
-                tf.equal(
-                    tf.argmax(input=predictions, axis=1), labels.flatten()
-                ),
-                tf.float32,
-            )
+        "accuracy": lambda labels, predictions: tf.equal(
+            tf.argmax(predictions, 1, output_type=tf.int32),
+            tf.cast(tf.reshape(labels, [-1]), tf.int32),
         )
     }
 ```
 
-### prepare_data_for_a_single_file
-
-```python
-prepare_data_for_a_single_file(filename)
-```
-
-`prepare_data_for_a_single_file` is to read a single file and do whatever
-user-defined logic to prepare the data (e.g, IO from the user's file system,
-feature engineering), and return the serialized data. The function can be used
-to process data for training, evaluation and prediction. The only difference
-between prediction data with training/evaluation data is that the 'label' in
-prediction data should be empty. Users should be able to determine if the data
-file contains label (e.g, via the different formats of filename) and implement
-the logic to prepare the data accordingly.
-
-Example:
-
-```python
-def prepare_data_for_a_single_file(filename):
-    '''
-    An image classification dataset that images belonging to the same category
-    located in the same directory.
-    '''
-    label = int(filename.split('/')[-2])
-    image = PIL.Image.open(filename)
-    numpy_image = np.array(image)
-    example_dict = {
-        "image": tf.train.Feature(
-            float_list=tf.train.FloatList(value=numpy_image.flatten())
-        ),
-        "label": tf.train.Feature(
-            int64_list=tf.train.Int64List(value=[label])
-        ),
-    }
-    example = tf.train.Example(
-        features=tf.train.Features(feature=example_dict)
-    )
-    return example.SerializeToString()
-```
-
 ## Model Building Examples
 
 - [MNIST model using Keras functional API](https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/mnist_functional_api/mnist_functional_api.py)
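After this change, `eval_metrics_fn` takes no arguments and returns a dict of callables. Each callable maps `(labels, predictions)` to per-example values. The new example returns the `tf.equal` comparison instead of a `tf.reduce_mean` over a single batch. A plausible reason for this design is that the caller can then accumulate each metric across all evaluation batches. The following is a plain-Python sketch of that contract. Lists stand in for tensors, a list-based argmax stands in for `tf.argmax`, and the `evaluate` helper is hypothetical. It is not ElasticDL's actual evaluation code.

```python
def eval_metrics_fn():
    # Plain-Python mirror of the new example: each metric maps
    # (labels, predictions) to per-example values, not a batch scalar.
    return {
        "accuracy": lambda labels, predictions: [
            max(range(len(row)), key=row.__getitem__) == int(label)
            for label, row in zip(labels, predictions)
        ]
    }


def evaluate(metrics_fn, batches):
    """Hypothetical evaluator: averages per-example values over all batches.

    Assumes at least one example in total.
    """
    metrics = metrics_fn()
    totals = {name: 0.0 for name in metrics}
    counts = {name: 0 for name in metrics}
    for labels, predictions in batches:
        for name, metric in metrics.items():
            values = metric(labels, predictions)
            totals[name] += sum(values)  # True counts as 1, False as 0
            counts[name] += len(values)
    return {name: totals[name] / counts[name] for name in metrics}


if __name__ == "__main__":
    batches = [
        ([0, 1], [[0.9, 0.1], [0.8, 0.2]]),  # 1 of 2 correct
        ([1], [[0.3, 0.7]]),                 # 1 of 1 correct
    ]
    print(evaluate(eval_metrics_fn, batches))  # {'accuracy': 0.6666666666666666}
```

In the usage example, accumulating per-example values gives 2/3. Averaging the two per-batch means (0.5 and 1.0) would report 0.75 instead. That difference when batches have unequal sizes is one motivation for returning unreduced values.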
@@ -242,86 +198,4 @@ def prepare_data_for_a_single_file(filename):
 
 - [CIFAR10 model using Keras modelsubclassing](https://github.com/sql-machine-learning/elasticdl/blob/develop/model_zoo/cifar10_subclass/cifar10_subclass.py)
 
-## Run and Debug Locally in VS Code
-
-It is more convenient to locally run and debug the defined model than
-submitting a job with the model to k8s cluster. The following example shows how
-to run and debug
-the DNN model using iris dataset.
-
-### Locally Run
-
-The command to locally run the DNN model using iris dataset saved in a CSV file.
-
-```shell
-python -m elasticdl.python.elasticdl.client train \
-    --model_zoo=/{REPO_DIR}/elasticdl/model_zoo \
-    --model_def=odps_iris_dnn_model.odps_iris_dnn_model.custom_model \
-    --training_data=/{DATA_DIR}/iris.csv \
-    --validation_data=/{DATA_DIR}/iris.csv \
-    --data_reader_params="columns=['sepal.length', 'sepal.width', \
-        'petal.length', 'petal.width', 'variety']; sep=','" \
-    --num_epochs=2 \
-    --minibatch_size=64 \
-    --num_minibatches_per_task=20 \
-    --distribution_strategy=Local \
-    --job_name=test-odps-iris \
-    --evaluation_steps=20 \
-    --output=iris_dnn_model
-```
-
-### Debug Model in VS Code
-
-We can add the command to the configurations in the `launch.json` file to debug
-the model in VS Code. The
-[tutorial](https://code.visualstudio.com/docs/python/debugging) show how to
-configure the `launch.json` file. For example, the configuration to debug the
-DNN model is
-
-```json
-{
-    // Use IntelliSense to learn about possible attributes.
-    // Hover to view descriptions of existing attributes.
-    // For more information, visit: https://go.microsoft.com/fwlink/?linkid=830387
-    "version": "0.2.0",
-    "configurations": [
-        {
-            "name": "Python: Current File",
-            "type": "python",
-            "request": "launch",
-            "program": "${file}",
-            "console": "integratedTerminal",
-            "module": "elasticdl.python.elasticdl.client",
-            "args": ["train",
-                "--model_zoo",
-                "/{REPO_DIR}/elasticdl/model_zoo",
-                "--model_def",
-                "odps_iris_dnn_model.odps_iris_dnn_model.custom_model",
-                "--training_data",
-                "/{DATA_DIR}/iris.csv",
-                "--num_epochs",
-                "2",
-                "--minibatch_size",
-                "64",
-                "--num_minibatches_per_task",
-                "20",
-                "--distribution_strategy",
-                "Local",
-                "--num_workers",
-                "2",
-                "--checkpoint_steps",
-                "10",
-                "--evaluation_steps",
-                "20",
-                "--job_name",
-                "test-odps-iris",
-                "--data_reader_params",
-                "columns=['sepal.length',
-                'sepal.width',
-                'petal.length',
-                'petal.width',
-                'variety']; sep=','"
-            ]
-        }
-    ]
-}
+- [Preprocess structured data for Keras model](https://github.com/sql-machine-learning/elasticdl/blob/develop/docs/tutorials/preprocessing_tutorial.md)
