
Commit df339c0

Fix mistakes in dgc document. (#16731)
1 parent: b07584d

File tree

1 file changed: +7 −7 lines changed


python/paddle/fluid/optimizer.py

Lines changed: 7 additions & 7 deletions
@@ -628,16 +628,16 @@ class DGCMomentumOptimizer(MomentumOptimizer):
 
         Original paper is https://arxiv.org/abs/1712.01887
 
-        DGC reduce the communication bandwidth by sending only the important gradients (sparse update):\
+        DGC reduces the communication bandwidth by sending only the important gradients (sparse update):\
         only gradients larger than a threshold are transmitted.
 
-        To avoid losing information, DGC accumulate the rest of the gradients locally.
+        To avoid losing information, DGC accumulates the rest of the gradients locally.
 
         Eventually, these gradients become large enough to be transmitted.
 
-        Thus, DGC send the large gradients immediately but eventually send all of the gradients over time.
+        Thus, DGC sends the large gradients immediately but eventually send all of the gradients over time.
 
-        To ensure no loss of accuracy, DGC employs momentum correc-tionandlocal gradient clipping on top of the gradient sparsification to maintain model performance.
+        To ensure no loss of accuracy, DGC employs momentum correction and local gradient clipping on top of the gradient sparsification to maintain model performance.
 
         DGC also uses momentum factor masking and warmup training to overcome the staleness problem caused by reduced communication.
 
@@ -652,17 +652,17 @@ class DGCMomentumOptimizer(MomentumOptimizer):
         learning_rate (float|Variable): the learning rate used to update parameters. \
             Can be a float value or a Variable with one float value as data element.
         momentum (float): Momentum factor.
-        rampup_begin_step (int): The begining step from which gradient compression is implemented.
+        rampup_begin_step (int): The beginning step from which gradient compression is implemented.
         rampup_step (int): How long it use the sparsity periods. Default is 1.
             for example: If the sparsity is [0.75, 0.9375, 0.984375, 0.996, 0.999], and the rampup_step is 5, \
             it will use 0.75 at 0 step, and 0.9375 at 1 step, and so on. And when reach sparsity array ends, \
             it will use 0.999 then and after.
         sparsity (list[float]): Get top important element from gradient tensor, the ratio is (1 - current sparsity).
         use_nesterov (bool): Enables Nesterov momentum. True means use nesterov.
         local_grad_clip_norm (float): Clip norm value if needed.
-        num_trainers: The number of training node.
+        num_trainers: The number of training nodes.
         regularization: A Regularizer, such as fluid.regularizer.L2DecayRegularizer.
-        name: A optional name prefix.
+        name: An optional name prefix.
 
         Examples:
             .. code-block:: python
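
The sparse-update mechanism this docstring describes can be made concrete with a short sketch. Below is a minimal NumPy illustration of top-k gradient selection with local accumulation of the rest; the function name `dgc_sparse_update` and the buffer `local_acc` are assumptions for illustration, not PaddlePaddle's actual DGC kernel, which additionally applies the momentum correction and local gradient clipping mentioned above.

    import numpy as np

    def dgc_sparse_update(grad, local_acc, sparsity):
        # Accumulate this step's gradient into the locally kept residual.
        local_acc += grad
        # Number of "important" elements to transmit: (1 - sparsity) of all.
        k = max(1, int(local_acc.size * (1.0 - sparsity)))
        # Threshold at the magnitude of the k-th largest accumulated entry.
        threshold = np.partition(np.abs(local_acc).ravel(), -k)[-k]
        mask = np.abs(local_acc) >= threshold
        # Transmit only the large entries; everything else stays local
        # and will eventually grow large enough to be sent.
        sent = np.where(mask, local_acc, 0.0)
        local_acc[mask] = 0.0
        return sent, local_acc

With sparsity=0.999, only the top 0.1% of accumulated entries are sent each step, matching the "(1 - current sparsity)" ratio in the sparsity parameter description.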
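The ramp-up behaviour in the rampup_step description can likewise be sketched. The function below is one illustrative reading of that schedule, assuming compression starts at rampup_begin_step and the sparsity list is walked through evenly over rampup_step steps; the optimizer's real bookkeeping may differ.

    def current_sparsity(step, rampup_begin_step, rampup_step, sparsity):
        # Before compression begins, send everything (illustrative choice).
        if step < rampup_begin_step:
            return 0.0
        # Advance through the sparsity list over the ramp-up period, then
        # stay at the final value (0.999 in the docstring's example).
        idx = (step - rampup_begin_step) * len(sparsity) // rampup_step
        return sparsity[min(idx, len(sparsity) - 1)]

    # Reproduces the docstring's example with rampup_step=5:
    s = [0.75, 0.9375, 0.984375, 0.996, 0.999]
    current_sparsity(0, 0, 5, s)  # 0.75 at step 0
    current_sparsity(1, 0, 5, s)  # 0.9375 at step 1
    current_sparsity(7, 0, 5, s)  # 0.999 once the list is exhausted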
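Since the docstring's own Examples block is truncated in this view, here is a hedged construction sketch using only the parameters documented in the diff; the toy network and all argument values are illustrative, not taken from the commit.

    import paddle.fluid as fluid

    # A toy regression network so the optimizer has a loss to minimize.
    x = fluid.layers.data(name='x', shape=[13], dtype='float32')
    y = fluid.layers.data(name='y', shape=[1], dtype='float32')
    y_predict = fluid.layers.fc(input=x, size=1)
    avg_cost = fluid.layers.mean(
        fluid.layers.square_error_cost(input=y_predict, label=y))

    # Illustrative values: rampup_begin_step delays compression, and the
    # sparsity list is ramped through as described in the docstring.
    optimizer = fluid.optimizer.DGCMomentumOptimizer(
        learning_rate=0.0001,
        momentum=0.9,
        rampup_begin_step=1252,
        rampup_step=5,
        sparsity=[0.75, 0.9375, 0.984375, 0.996, 0.999])
    optimizer.minimize(avg_cost)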
