doc/fluid/design/dist_train: 1 file changed, +5 −4 lines
@@ -5,9 +5,10 @@
 For the typical synchronous distributed training, some significant steps are as follows:

 1. A Trainer will compute the gradients and SEND them to the Parameter Server(PServer) nodes.
-1. After the PServer node received gradients came from all the Trainers,
-   it would apply the gradient to the respective variables, and using an optimize algorithms(SGD,
-   Momentment...) to update the parameters.
+1. After the PServer node has received the gradients from all the Trainers, it will aggregate the
+   gradient variables for the same parameter into one gradient variable, then apply the aggregated
+   gradient to the respective parameter, and finally use an optimization algorithm (SGD,
+   Momentum, ...) to update the parameters.
 1. The Trainer would wait for the PServers finished the optimize stage, and GET the parameters from PServer,
    so all the Trainers would get the same parameters.
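To make the synchronous update cycle described in the hunk above concrete, here is a minimal, framework-free Python sketch. The names (`PServer`, `update`) are illustrative stand-ins rather than Fluid's actual operators, and averaging is only one common way to aggregate the per-Trainer gradients (a plain sum is the other).

```python
# Minimal sketch of the synchronous PServer update cycle described above.
# All names here are illustrative; this is not Fluid's actual API.

class PServer:
    def __init__(self, params, lr=0.01):
        self.params = dict(params)  # parameter name -> value
        self.lr = lr

    def update(self, trainer_grads):
        """trainer_grads: list of {param_name: grad} dicts, one per Trainer."""
        for name in self.params:
            # Aggregate the gradient variables for the same parameter
            # into one gradient variable (mean over Trainers here).
            aggregated = sum(g[name] for g in trainer_grads) / len(trainer_grads)
            # Apply the aggregated gradient with a plain SGD step.
            self.params[name] -= self.lr * aggregated
        return self.params


# One synchronous step: every Trainer SENDs its gradients, the PServer
# aggregates and optimizes, then every Trainer GETs the same parameters.
pserver = PServer({"w": 1.0, "b": 0.0})
grads_from_trainers = [{"w": 0.2, "b": 0.1}, {"w": 0.4, "b": 0.3}]
new_params = pserver.update(grads_from_trainers)
print(new_params)  # both Trainers would now fetch these identical values
```

Because every Trainer waits for the same optimize stage to finish before fetching, all of them start the next mini-batch with identical parameters.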
@@ -38,7 +39,7 @@ mini-batch.
 ### Trainer

 - For the multiple devices distributed training, we need to aggregate the gradient
-  variables which placed on different devices firstly, and then schedule a `SendVars` Operator to
+  variables which are placed on different devices first and then schedule a `SendVars` Operator to
   send the gradient variables to the multiple PServer instances.
 - Schedule `FetchVars` operator to fetch the latest parameter from PServer before running
   the forward ops.
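As a rough illustration of the Trainer-side steps listed above, the sketch below first sums the gradient variables computed on different devices and then stands in for the `SendVars` and `FetchVars` operators with plain Python functions; the function names, endpoints, and dummy return values are all hypothetical, not Fluid's real interface.

```python
# Illustrative sketch of the Trainer-side steps in the updated section.
# `aggregate_local_gradients`, `send_vars`, and `fetch_vars` are hypothetical
# stand-ins for the `SendVars` / `FetchVars` operators, not Fluid's real API.

def aggregate_local_gradients(per_device_grads):
    """Sum the gradient variables that were computed on different devices."""
    aggregated = {}
    for device_grads in per_device_grads:
        for name, grad in device_grads.items():
            aggregated[name] = aggregated.get(name, 0.0) + grad
    return aggregated

def send_vars(grads, pserver_endpoints):
    """Placeholder for the SendVars operator: ship gradients to each PServer."""
    for endpoint in pserver_endpoints:
        print(f"SEND {list(grads)} -> {endpoint}")

def fetch_vars(param_names, pserver_endpoints):
    """Placeholder for the FetchVars operator: pull the latest parameters."""
    print(f"FETCH {param_names} <- {pserver_endpoints}")
    return {name: 0.0 for name in param_names}  # dummy values

# One training iteration on a multi-device Trainer:
per_device_grads = [{"w": 0.1, "b": 0.05}, {"w": 0.3, "b": 0.15}]  # e.g. GPU:0, GPU:1
endpoints = ["127.0.0.1:6174", "127.0.0.1:6175"]  # hypothetical PServer addresses
send_vars(aggregate_local_gradients(per_device_grads), endpoints)
params = fetch_vars(["w", "b"], endpoints)  # fetched before running the forward ops
```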