doc/fluid/design/dist_train: 1 file changed, +5 −4 lines
@@ -5,9 +5,10 @@
 For the typical synchronous distributed training, some significant steps are as follows:

 1. A Trainer will compute the gradients and SEND them to the Parameter Server(PServer) nodes.
-1. After the PServer node received gradients came from all the Trainers,
-   it would apply the gradient to the respective variables, and using an optimize algorithms(SGD,
-   Momentment...) to update the parameters.
+1. After the PServer node has received the gradients from all the Trainers, it will aggregate the
+   gradient variables for the same parameter into one gradient variable, then apply the aggregated
+   gradient to the respective parameter, and finally use an optimization algorithm (SGD,
+   Momentum, ...) to update the parameters.
 1. The Trainer would wait for the PServers finished the optimize stage, and GET the parameters from PServer,
    so all the Trainers would get the same parameters.
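To make the synchronous update cycle described in the hunk above concrete, here is a minimal, framework-free Python sketch. The names (`PServer`, `update`) are illustrative stand-ins rather than Fluid's actual operators, and averaging is only one common way to aggregate the per-Trainer gradients (a plain sum is the other).

```python
# Minimal sketch of the synchronous PServer update cycle described above.
# All names here are illustrative; this is not Fluid's actual API.

class PServer:
    def __init__(self, params, lr=0.01):
        self.params = dict(params)  # parameter name -> value
        self.lr = lr

    def update(self, trainer_grads):
        """trainer_grads: list of {param_name: grad} dicts, one per Trainer."""
        for name in self.params:
            # Aggregate the gradient variables for the same parameter
            # into one gradient variable (mean over Trainers here).
            aggregated = sum(g[name] for g in trainer_grads) / len(trainer_grads)
            # Apply the aggregated gradient with a plain SGD step.
            self.params[name] -= self.lr * aggregated
        return self.params


# One synchronous step: every Trainer SENDs its gradients, the PServer
# aggregates and optimizes, then every Trainer GETs the same parameters.
pserver = PServer({"w": 1.0, "b": 0.0})
grads_from_trainers = [{"w": 0.2, "b": 0.1}, {"w": 0.4, "b": 0.3}]
new_params = pserver.update(grads_from_trainers)
print(new_params)  # both Trainers would now fetch these identical values
```

Because every Trainer waits for the same optimize stage to finish before fetching, all of them start the next mini-batch with identical parameters.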
@@ -38,7 +39,7 @@ mini-batch.
 ### Trainer

 - For the multiple devices distributed training, we need to aggregate the gradient
-  variables which placed on different devices firstly, and then schedule a `SendVars` Operator to
+  variables which are placed on different devices first and then schedule a `SendVars` Operator to
   send the gradient variables to the multiple PServer instances.
 - Schedule `FetchVars` operator to fetch the latest parameter from PServer before running
   the forward ops.
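As a rough illustration of the Trainer-side steps listed above, the sketch below first sums the gradient variables computed on different devices and then stands in for the `SendVars` and `FetchVars` operators with plain Python functions; the function names, endpoints, and dummy return values are all hypothetical, not Fluid's real interface.

```python
# Illustrative sketch of the Trainer-side steps in the updated section.
# `aggregate_local_gradients`, `send_vars`, and `fetch_vars` are hypothetical
# stand-ins for the `SendVars` / `FetchVars` operators, not Fluid's real API.

def aggregate_local_gradients(per_device_grads):
    """Sum the gradient variables that were computed on different devices."""
    aggregated = {}
    for device_grads in per_device_grads:
        for name, grad in device_grads.items():
            aggregated[name] = aggregated.get(name, 0.0) + grad
    return aggregated

def send_vars(grads, pserver_endpoints):
    """Placeholder for the SendVars operator: ship gradients to each PServer."""
    for endpoint in pserver_endpoints:
        print(f"SEND {list(grads)} -> {endpoint}")

def fetch_vars(param_names, pserver_endpoints):
    """Placeholder for the FetchVars operator: pull the latest parameters."""
    print(f"FETCH {param_names} <- {pserver_endpoints}")
    return {name: 0.0 for name in param_names}  # dummy values

# One training iteration on a multi-device Trainer:
per_device_grads = [{"w": 0.1, "b": 0.05}, {"w": 0.3, "b": 0.15}]  # e.g. GPU:0, GPU:1
endpoints = ["127.0.0.1:6174", "127.0.0.1:6175"]  # hypothetical PServer addresses
send_vars(aggregate_local_gradients(per_device_grads), endpoints)
params = fetch_vars(["w", "b"], endpoints)  # fetched before running the forward ops
```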