@@ -73,10 +73,8 @@ Also, most of the captures are taken from ``Ranger21`` paper.
 Adaptive Gradient Clipping (AGC)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-| This idea originally proposed in ``NFNet (Normalized-Free Network)``
-  paper.
-| AGC (Adaptive Gradient Clipping) clips gradients based on the
-  ``unit-wise ratio of gradient norms to parameter norms ``.
+| This idea was originally proposed in the ``NFNet (Normalized-Free Network)`` paper.
+| AGC (Adaptive Gradient Clipping) clips gradients based on the ``unit-wise ratio of gradient norms to parameter norms``.
 
 - code :
   `github <https://github.com/deepmind/deepmind-research/tree/master/nfnets >`__
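For readers following the rendered docs, a rough sketch of the clipping rule above may help. It only illustrates unit-wise clipping and is not the library's actual ``agc`` API; the helper name ``adaptive_grad_clip_`` and the defaults ``clip_factor=1e-2`` and ``eps=1e-3`` are assumptions.

.. code-block:: python

    import torch


    def unit_norm(x: torch.Tensor) -> torch.Tensor:
        # L2 norm per output unit: over all dims except the first for >1-D tensors
        if x.ndim <= 1:
            return x.norm(2)
        return x.norm(2, dim=tuple(range(1, x.ndim)), keepdim=True)


    @torch.no_grad()
    def adaptive_grad_clip_(param: torch.Tensor, clip_factor: float = 1e-2, eps: float = 1e-3) -> None:
        # assumed defaults; clips the gradient in place, unit by unit
        if param.grad is None:
            return
        p_norm = unit_norm(param).clamp(min=eps)   # guard tiny / zero parameters
        g_norm = unit_norm(param.grad)
        max_norm = p_norm * clip_factor
        # rescale only the units whose gradient norm exceeds the allowed maximum
        scaled = param.grad * (max_norm / g_norm.clamp(min=1e-6))
        param.grad.copy_(torch.where(g_norm > max_norm, scaled, param.grad))

Calling such a helper on every parameter right before ``optimizer.step()`` reproduces the behaviour described above.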
@@ -99,8 +97,7 @@ centralizing the gradient to have zero mean.
 Softplus Transformation
 ~~~~~~~~~~~~~~~~~~~~~~~
 
-By running the final variance denom through the softplus function, it
-lifts extremely tiny values to keep them viable.
+By running the final variance denom through the softplus function, it lifts extremely tiny values to keep them viable.
 
 - paper : `arXiv <https://arxiv.org/abs/1908.00700 >`__
 
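As a hedged illustration of where that transformation sits in an Adam-style update, the sketch below runs the second-moment denominator through ``torch.nn.functional.softplus`` instead of adding ``eps``. The helper name ``softplus_denom`` and the ``beta_softplus=50.0`` default are assumptions, not the library's documented interface.

.. code-block:: python

    import torch
    import torch.nn.functional as F


    def softplus_denom(exp_avg_sq: torch.Tensor, beta_softplus: float = 50.0) -> torch.Tensor:
        # instead of ``sqrt(v_t) + eps``, softplus lifts extremely tiny values
        # to a viable magnitude while leaving larger ones almost unchanged
        return F.softplus(exp_avg_sq.sqrt(), beta=beta_softplus)


    # inside an Adam-style step the parameter update would then read roughly:
    #   denom = softplus_denom(exp_avg_sq / bias_correction2)
    #   param -= lr * (exp_avg / bias_correction1) / denom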
@@ -123,8 +120,7 @@ Positive-Negative Momentum
 | .. image:: https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/positive_negative_momentum.png |
 +--------------------------------------------------------------------------------------------------------------------+
 
-- code :
-  `github <https://github.com/zeke-xie/Positive-Negative-Momentum >`__
+- code : `github <https://github.com/zeke-xie/Positive-Negative-Momentum >`__
 - paper : `arXiv <https://arxiv.org/abs/2103.17182 >`__
 
 Linear learning-rate warm-up
@@ -143,8 +139,7 @@ Stable weight decay
 | .. image:: https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/stable_weight_decay.png |
 +-------------------------------------------------------------------------------------------------------------+
 
-- code :
-  `github <https://github.com/zeke-xie/stable-weight-decay-regularization >`__
+- code : `github <https://github.com/zeke-xie/stable-weight-decay-regularization >`__
 - paper : `arXiv <https://arxiv.org/abs/2011.11152 >`__
 
 Explore-exploit learning-rate schedule
@@ -154,18 +149,14 @@ Explore-exploit learning-rate schedule
 | .. image:: https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/explore_exploit_lr_schedule.png |
 +---------------------------------------------------------------------------------------------------------------------+
 
-
-- code :
-  `github <https://github.com/nikhil-iyer-97/wide-minima-density-hypothesis >`__
+- code : `github <https://github.com/nikhil-iyer-97/wide-minima-density-hypothesis >`__
 - paper : `arXiv <https://arxiv.org/abs/2003.03977 >`__
 
 Lookahead
 ~~~~~~~~~
 
-| ``k`` steps forward, 1 step back. ``Lookahead`` consisting of keeping
-  an exponential moving average of the weights that is
-| updated and substituted to the current weights every ``k_{lookahead}``
-  steps (5 by default).
+| ``k`` steps forward, 1 step back. ``Lookahead`` consists of keeping an exponential moving average of the weights that is
+| updated and substituted for the current weights every ``k_{lookahead}`` steps (5 by default).
 
 - code : `github <https://github.com/alphadl/lookahead.pytorch >`__
 - paper : `arXiv <https://arxiv.org/abs/1907.08610v2 >`__
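A minimal wrapper sketch of that k-steps-forward / one-step-back loop, assuming the common interpolation factor ``alpha=0.5``; the class name ``LookaheadSketch`` is illustrative and this is not the library's ``Lookahead`` implementation.

.. code-block:: python

    import torch


    class LookaheadSketch:
        """Run the inner optimizer normally, then every ``k`` steps pull the slow
        weights toward the fast weights and substitute them back."""

        def __init__(self, optimizer: torch.optim.Optimizer, k: int = 5, alpha: float = 0.5):
            self.optimizer, self.k, self.alpha = optimizer, k, alpha
            self.step_count = 0
            # slow weights start as a copy of the current (fast) weights
            self.slow_weights = [[p.detach().clone() for p in g["params"]]
                                 for g in optimizer.param_groups]

        def step(self, closure=None):
            loss = self.optimizer.step(closure)
            self.step_count += 1
            if self.step_count % self.k == 0:
                with torch.no_grad():
                    for group, slow_group in zip(self.optimizer.param_groups, self.slow_weights):
                        for fast, slow in zip(group["params"], slow_group):
                            # exponential-moving-average style update of the slow weights,
                            # then substitute them for the current (fast) weights
                            slow.add_(fast.detach() - slow, alpha=self.alpha)
                            fast.copy_(slow)
            return loss

Usage would look like ``opt = LookaheadSketch(torch.optim.AdamW(model.parameters(), lr=1e-3), k=5)`` followed by the usual ``opt.step()`` calls.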
@@ -180,10 +171,8 @@ Acceleration via Fractal Learning Rate Schedules
 (Adaptive) Sharpness-Aware Minimization (A/SAM)
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-| Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value
-  and loss sharpness.
-| In particular, it seeks parameters that lie in neighborhoods having
-  uniformly low loss.
+| Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.
+| In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.
 
 - SAM paper : `paper <https://arxiv.org/abs/2010.01412 >`__
 - ASAM paper : `paper <https://arxiv.org/abs/2102.11600 >`__
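To make the two-pass idea concrete, here is a hedged sketch of a single SAM iteration: ascend to the approximate worst-case point inside an L2 ball of radius ``rho``, take the gradient there, restore the weights, and let the base optimizer step. The function name ``sam_step``, the ``rho=0.05`` default, and the ``loss_fn``/``inputs``/``targets`` arguments are assumptions for illustration, not the library's ``SAM`` class API.

.. code-block:: python

    import torch


    def sam_step(model: torch.nn.Module, base_optimizer: torch.optim.Optimizer,
                 loss_fn, inputs, targets, rho: float = 0.05):
        params = [p for p in model.parameters() if p.requires_grad]

        # first pass: gradient at the current weights
        base_optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()

        grads = [p.grad for p in params if p.grad is not None]
        grad_norm = torch.norm(torch.stack([g.norm(2) for g in grads]), 2)
        scale = rho / (grad_norm + 1e-12)

        with torch.no_grad():
            # ascend: w <- w + rho * g / ||g|| (the approximate worst-case neighbor)
            perturbations = []
            for p in params:
                e = p.grad * scale if p.grad is not None else torch.zeros_like(p)
                p.add_(e)
                perturbations.append(e)

        # second pass: the gradient at the perturbed weights defines the update
        base_optimizer.zero_grad()
        loss_fn(model(inputs), targets).backward()

        with torch.no_grad():
            # restore the original weights before the actual optimizer step
            for p, e in zip(params, perturbations):
                p.sub_(e)
        base_optimizer.step()
        return loss

The first forward/backward pass finds the ascent direction; the second provides the sharpness-aware gradient that ``base_optimizer`` finally applies.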