Commit 49e885b

update
1 parent 15c3a8e commit 49e885b


doc/fluid/design/dist_train/async_update.md

Lines changed: 5 additions & 4 deletions
@@ -5,9 +5,10 @@
 For the typical synchronous distributed training, some significant steps are as follows:
 
 1. A Trainer will compute the gradients and SEND them to the Parameter Server(PServer) nodes.
-1. After the PServer node received gradients came from all the Trainers,
-it would apply the gradient to the respective variables, and using an optimize algorithms(SGD,
-Momentment...) to update the parameters.
+1. After the PServer node has received the gradients from all the Trainers, it will aggregate the
+gradient variables for the same parameter into one gradient variable and then apply the aggregated
+gradient to the respective parameter, finally using an optimization algorithm (SGD, Momentum, ...)
+to update the parameters.
 1. The Trainer would wait for the PServers finished the optimize stage, and GET the parameters from PServer,
 so all the Trainers would get the same parameters.
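The rewritten step above is the heart of the synchronous protocol: gradient variables sent by all Trainers for the same parameter are aggregated into one gradient variable, the aggregated gradient is applied with an optimization algorithm, and only then do the Trainers GET the result. A minimal Python/NumPy sketch of that PServer-side step, assuming a hypothetical `pserver_update` helper and plain SGD (neither is part of the Fluid code base):

```python
import numpy as np

def pserver_update(param, trainer_grads, lr=0.01):
    """Illustrative synchronous update on the PServer for one parameter."""
    # 1. Aggregate the gradient variables for the same parameter
    #    into one gradient variable (element-wise mean here).
    aggregated = np.mean(trainer_grads, axis=0)
    # 2. Apply the aggregated gradient with an optimization algorithm
    #    (plain SGD; Momentum etc. would also keep optimizer state).
    param = param - lr * aggregated
    # 3. Trainers then GET this updated parameter, so every Trainer
    #    starts the next mini-batch from the same value.
    return param

# Example: three Trainers report gradients for a 2-element parameter.
w = np.array([0.5, -0.2])
grads = [np.array([0.1, 0.0]), np.array([0.3, -0.1]), np.array([0.2, 0.1])]
w = pserver_update(w, grads)
```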

@@ -38,7 +39,7 @@ mini-batch.
 ### Trainer
 
 - For the multiple devices distributed training, we need to aggregate the gradient
-variables which placed on different devices firstly, and then schedule a `SendVars` Operator to
+variables which placed on different devices firstly and then schedule a `SendVars` Operator to
 send the gradient variables to the multiple PServer instances.
 - Schedule `FetchVars` operator to fetch the latest parameter from PServer before running
 the forward ops.
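The two Trainer bullets above describe a three-step flow: aggregate the gradient variables produced on the local devices, send them to the PServer instances (`SendVars`), and fetch the latest parameters (`FetchVars`) before running the forward ops. A minimal sketch of that flow, with hypothetical `send_fn`/`fetch_fn` callables standing in for the real `SendVars`/`FetchVars` operators:

```python
import numpy as np

def trainer_step(device_grads, send_fn, fetch_fn):
    """Illustrative trainer-side flow for one mini-batch.

    device_grads: dict mapping parameter name -> list of gradient
                  arrays, one per local device.
    """
    # 1. Aggregate the gradient variables placed on different devices first.
    aggregated = {name: np.sum(grads, axis=0)
                  for name, grads in device_grads.items()}
    # 2. Send the aggregated gradient variables to the PServer
    #    instances (the role of the `SendVars` operator).
    send_fn(aggregated)
    # 3. Fetch the latest parameters before the next forward ops
    #    (the role of the `FetchVars` operator).
    return fetch_fn(list(aggregated.keys()))

# Example with stand-in transport functions:
params = {"w": np.zeros(2)}
send = lambda grads: None                          # pretend RPC to PServers
fetch = lambda names: {n: params[n] for n in names}
new_params = trainer_step({"w": [np.array([0.1, 0.2]),
                                 np.array([0.3, 0.4])]}, send, fetch)
```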
