@@ -264,7 +264,19 @@ If you want to continue to train the model, simply re-run the above command beca
264264
265265Just add ``` --strategy=gpus ```
266266
267- ## 10. Training EfficientDets on TPUs.
267+ ## 10. Train on multi-node GPUs.
268+ The following scripts start a training task on 2 nodes.
269+
270+ Start the chief training node.
271+ ```
272+ python -m tf2.train --strategy=multi-gpus --worker=server_address1:12345,server_address2:23456 --worker_index=0 --mode=train --train_file_pattern=tfrecord/pascal*.tfrecord --model_name=efficientdet-d0 --model_dir=/tmp/efficientdet-d0 --batch_size=64 --num_examples_per_epoch=5717 --num_epochs=50 --hparams=voc_config.yaml
273+ ```
274+ Start the other training node.
275+ ```
276+ python -m tf2.train --strategy=multi-gpus --worker=server_address1:12345,server_address2:23456 --worker_index=1 --mode=train --train_file_pattern=tfrecord/pascal*.tfrecord --model_name=efficientdet-d0 --model_dir=/tmp/efficientdet-d0_1 --batch_size=64 --num_examples_per_epoch=5717 --num_epochs=50 --hparams=voc_config.yaml
277+ ```
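
For orientation, TensorFlow's multi-worker strategies usually discover the cluster through the `TF_CONFIG` environment variable. The sketch below shows how a `--worker` list and a `--worker_index` like the ones above could be turned into that variable. It is a minimal illustration, not the code in `tf2.train`: the helper names `build_tf_config` and `export_tf_config` are hypothetical, and whether the script wires its flags this way is an assumption.

```python
import json
import os


def build_tf_config(workers: str, worker_index: int) -> dict:
    """Build a TF_CONFIG-style dict from a comma-separated host:port list.

    Hypothetical helper: the real flag handling in tf2.train may differ.
    """
    hosts = [w.strip() for w in workers.split(",") if w.strip()]
    if not 0 <= worker_index < len(hosts):
        raise ValueError(
            f"worker_index {worker_index} is out of range for {len(hosts)} workers"
        )
    return {
        "cluster": {"worker": hosts},
        "task": {"type": "worker", "index": worker_index},
    }


def export_tf_config(config: dict) -> None:
    """Publish the cluster description where tf.distribute looks for it."""
    os.environ["TF_CONFIG"] = json.dumps(config)


if __name__ == "__main__":
    # Same worker list on every node; only the index changes per node.
    cfg = build_tf_config("server_address1:12345,server_address2:23456", 0)
    export_tf_config(cfg)
    print(os.environ["TF_CONFIG"])
```

Every node passes the same worker list and only the index differs. When the cluster has no explicit `chief` entry, TensorFlow treats the worker with index 0 as the chief, which matches the "chief" and "other" nodes in the commands above.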
278+
279+ ## 11. Training EfficientDets on TPUs.
268280
269281To train this model on Cloud TPU, you will need:
270282
@@ -286,7 +298,7 @@ For more instructions about training on TPUs, please refer to the following tuto
286298
287299 * EfficientNet tutorial: https://cloud.google.com/tpu/docs/tutorials/efficientnet
288300
289- ## 11 . Reducing Memory Usage when Training EfficientDets on GPU.
301+ ## 12 . Reducing Memory Usage when Training EfficientDets on GPU.
290302
291303EfficientDets use a lot of GPU memory for a few reasons:
292304
@@ -306,7 +318,7 @@ If set to True, keras model uses ```tf.recompute_grad``` to achieve gradient che
306318Testing shows that:
307319* It allows training a d7x network with a batch size of 2 on an 11 GB (1080 Ti) GPU
308320
309- ## 12 . Visualize TF-Records.
321+ ## 13 . Visualize TF-Records.
310322
311323You can visualize tf-records with the following commands:
312324
@@ -331,7 +343,7 @@ python dataset/inspect_tfrecords.py --file_pattern dataset/sample.record\
331343* save_samples_dir: save dir.
332344* eval: flag for eval data.
333345
334- ## 13 . Export to ONNX
346+ ## 14 . Export to ONNX
335347(1) Install tf2onnx
336348```
337349pip install tf2onnx
@@ -352,7 +364,7 @@ nms_configs:
352364python -m tf2onnx.convert --saved-model=<saved model directory> --output=<onnx filename> --opset=11
353365```
354366
355- ## 14 . Debug
367+ ## 15 . Debug
356368Just add ``` --debug ``` after the command; you can then use pdb to debug the model with eager execution and deterministic operations.
357369
358370NOTE: this is not an official Google product.