
Commit 32ccbb5

Fix gradient not averaged when parallel training. (#1104)
* Fix gradient not averaged when parallel training.
* Correct throughput metrics and explain CPU runtime in the parallel-training tutorial.
1 parent a5bdd14 commit 32ccbb5

File tree

2 files changed: +14 −8 lines

deepmd/train/trainer.py

Lines changed: 4 additions & 4 deletions
@@ -384,10 +384,10 @@ def _build_training(self):
             optimizer = self.run_opt._HVD.DistributedOptimizer(optimizer)
         else:
             optimizer = tf.train.AdamOptimizer(learning_rate = self.learning_rate)
-        grads = tf.gradients(self.l2_l, trainable_variables)
-        apply_op = optimizer.apply_gradients (zip (grads, trainable_variables),
-                                              global_step=self.global_step,
-                                              name='train_step')
+        apply_op = optimizer.minimize(loss=self.l2_l,
+                                      global_step=self.global_step,
+                                      var_list=trainable_variables,
+                                      name='train_step')
         train_ops = [apply_op] + self._extra_train_ops
         self.train_op = tf.group(*train_ops)
         log.info("built training")
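
Why this change restores gradient averaging: Horovod's `DistributedOptimizer` performs the cross-rank allreduce inside its `compute_gradients()` method. The old code computed gradients directly with `tf.gradients()` and only passed them to the wrapped optimizer's `apply_gradients()`, so in parallel training each rank applied its own local gradients and they were never averaged. `optimizer.minimize()` routes gradient computation through the wrapper, so the averaging happens before the update. A minimal sketch of the two patterns, assuming TF1-style graph mode and `horovod.tensorflow`; the toy variable and loss below are illustrative, not taken from `trainer.py`:

```python
import tensorflow.compat.v1 as tf
import horovod.tensorflow as hvd

tf.disable_eager_execution()
hvd.init()

# Toy model: one trainable variable and a quadratic loss.
w = tf.Variable(1.0, name="w")
loss = tf.reduce_sum(tf.square(w - 3.0))
variables = tf.trainable_variables()
global_step = tf.train.get_or_create_global_step()

opt = tf.train.AdamOptimizer(learning_rate=1e-3)
opt = hvd.DistributedOptimizer(opt)  # the allreduce lives in compute_gradients()

# Buggy pattern: tf.gradients() bypasses the wrapper, so each rank applies
# only its local gradients and nothing is averaged across ranks.
#   grads = tf.gradients(loss, variables)
#   train_op = opt.apply_gradients(zip(grads, variables), global_step=global_step)

# Fixed pattern: minimize() calls the wrapped compute_gradients(), which
# averages gradients across ranks before apply_gradients() runs.
train_op = opt.minimize(loss, global_step=global_step, var_list=variables)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(hvd.broadcast_global_variables(0))  # keep all ranks in sync at start
    sess.run(train_op)
```

The explicit form `opt.compute_gradients(loss, var_list=variables)` followed by `opt.apply_gradients(...)` would also keep the averaging, since it is the wrapped `compute_gradients()` that performs the allreduce; only the direct `tf.gradients()` call skips it.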

doc/train/parallel-training.md

Lines changed: 10 additions & 4 deletions
@@ -5,13 +5,19 @@ Currently, parallel training is enabled in a sychoronized way with help of [Horo
 Testing `examples/water/se_e2_a` on a 8-GPU host, linear acceleration can be observed with increasing number of cards.
 | Num of GPU cards | Seconds every 100 samples | Samples per second | Speed up |
 | -- | -- | -- | -- |
-| 1 | 1.6116 | 62.05 | 1.00 |
-| 2 | 1.6310 | 61.31 | 1.98 |
-| 4 | 1.6168 | 61.85 | 3.99 |
-| 8 | 1.6212 | 61.68 | 7.95 |
+| 1 | 1.4515 | 68.89 | 1.00 |
+| 2 | 1.5962 | 62.65*2 | 1.82 |
+| 4 | 1.7635 | 56.71*4 | 3.29 |
+| 8 | 1.7267 | 57.91*8 | 6.72 |
 
 To experience this powerful feature, please intall Horovod and [mpi4py](https://github.com/mpi4py/mpi4py) first. For better performance on GPU, please follow tuning steps in [Horovod on GPU](https://github.com/horovod/horovod/blob/master/docs/gpus.rst).
 ```bash
+# With GPU, prefer NCCL as communicator.
+HOROVOD_WITHOUT_GLOO=1 HOROVOD_WITH_TENSORFLOW=1 HOROVOD_GPU_OPERATIONS=NCCL HOROVOD_NCCL_HOME=/path/to/nccl pip3 install horovod mpi4py
+```
+
+If your work in CPU environment, please prepare runtime as below:
+```bash
 # By default, MPI is used as communicator.
 HOROVOD_WITHOUT_GLOO=1 HOROVOD_WITH_TENSORFLOW=1 pip install horovod mpi4py
 ```
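
A note on reading the corrected table: "Seconds every 100 samples" is the per-worker timing, so per-card throughput is 100 divided by that time, the "Samples per second" column reports that value times the number of cards, and "Speed up" is the aggregate throughput relative to the single-card run. A quick sanity check in plain Python, using only the values from the new table:

```python
# Recompute the corrected throughput columns from the per-card timings.
timings = {1: 1.4515, 2: 1.5962, 4: 1.7635, 8: 1.7267}  # seconds per 100 samples, per card

baseline = 100.0 / timings[1]  # single-card throughput, ~68.89 samples/s
for cards, seconds in timings.items():
    per_card = 100.0 / seconds    # samples/s on each card
    aggregate = per_card * cards  # total samples/s across all cards
    speedup = aggregate / baseline
    print(f"{cards} card(s): {per_card:.2f}*{cards} samples/s, speed up {speedup:.2f}")
# Prints 68.89 / 62.65*2 / 56.71*4 / 57.91*8 with speed ups 1.00 / 1.82 / 3.29 / 6.72,
# matching the updated rows.
```

Writing the multiplier out makes it explicit that the speed-up is measured against aggregate throughput, and that the corrected measurements scale close to, but not exactly, linearly on this host.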

0 commit comments
