| Optimizer | Description | Official Code | Paper |
| :---: | :---: | :---: | :---: |
| AdamP |*Slowing Down the Slowdown for Momentum Optimizers on Scale-invariant Weights*|[github](https://github.com/clovaai/AdamP)|[https://arxiv.org/abs/2006.08217](https://arxiv.org/abs/2006.08217)|
| Chebyshev LR Schedules |*Acceleration via Fractal Learning Rate Schedules*|[~~github~~]()|[https://arxiv.org/abs/2103.01338v1](https://arxiv.org/abs/2103.01338v1)|
| Gradient Centralization (GC) |*A New Optimization Technique for Deep Neural Networks*|[github](https://github.com/Yonghongwei/Gradient-Centralization)|[https://arxiv.org/abs/2004.01461](https://arxiv.org/abs/2004.01461)|
| RAdam |*On the Variance of the Adaptive Learning Rate and Beyond*|[github](https://github.com/LiyuanLucasLiu/RAdam)|[https://arxiv.org/abs/1908.03265](https://arxiv.org/abs/1908.03265)|
| Ranger |*a synergistic optimizer combining RAdam, LookAhead, and now GC into one optimizer*|[github](https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer)||
| Ranger21 |*a synergistic deep learning optimizer*|[github](https://github.com/lessw2020/Ranger21)|[https://arxiv.org/abs/2106.13731](https://arxiv.org/abs/2106.13731)|
## Useful Resources
Several optimization ideas to regularize and stabilize training. Most of these ideas are implemented in the `Ranger21` optimizer.

Also, most of the figures are taken from the `Ranger21` paper.
### Adaptive Gradient Clipping (AGC)
This idea was originally proposed in the `NFNet (Normalizer-Free Networks)` paper.
AGC (Adaptive Gradient Clipping) clips gradients based on the `unit-wise ratio of gradient norms to parameter norms`.
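
Below is a minimal PyTorch sketch of this unit-wise clipping rule. It is an illustration, not the implementation used by any particular optimizer listed above; the function names and the default `clip_factor` / `eps` values are assumptions.

```python
import torch


def unitwise_norm(x: torch.Tensor) -> torch.Tensor:
    # Norm over every dimension except the first ("unit") one;
    # scalars and biases fall back to their absolute value.
    if x.ndim <= 1:
        return x.abs()
    return x.norm(p=2, dim=tuple(range(1, x.ndim)), keepdim=True)


@torch.no_grad()
def adaptive_gradient_clipping(parameters, clip_factor: float = 1e-2, eps: float = 1e-3) -> None:
    # Rescale each gradient whose unit-wise norm exceeds
    # clip_factor * (unit-wise parameter norm, floored at eps).
    for p in parameters:
        if p.grad is None:
            continue
        param_norm = unitwise_norm(p).clamp_(min=eps)
        grad_norm = unitwise_norm(p.grad)
        max_norm = param_norm * clip_factor
        clipped = p.grad * (max_norm / grad_norm.clamp(min=1e-6))
        p.grad.copy_(torch.where(grad_norm > max_norm, clipped, p.grad))
```

It would typically be called between `loss.backward()` and `optimizer.step()`, e.g. `adaptive_gradient_clipping(model.parameters())`.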