README.md: 8 additions & 2 deletions

@@ -10,7 +10,7 @@
## The reasons why you use `pytorch-optimizer`.

-* Wide range of supported optimizers. Currently, **87 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
+* Wide range of supported optimizers. Currently, **89 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
* Including many variants such as `Cautious`, `AdamD`, `Gradient Centrailiaztion`

| APOLLO |*SGD-like Memory, AdamW-level Performance*|[github](https://github.com/zhuhanqing/APOLLO)|<https://arxiv.org/abs/2412.05270>|[cite](https://github.com/zhuhanqing/APOLLO?tab=readme-ov-file#-citation)|
| MARS |*Unleashing the Power of Variance Reduction for Training Large Models*|[github](https://github.com/AGI-Arena/MARS)|<https://arxiv.org/abs/2411.10438>|[cite](https://github.com/AGI-Arena/MARS/tree/main?tab=readme-ov-file#citation)|
| SGDSaI |*No More Adam: Learning Rate Scaling at Initialization is All You Need*|[github](https://github.com/AnonymousAlethiometer/SGD_SaI)|<https://arxiv.org/abs/2411.10438>|[cite](https://github.com/AnonymousAlethiometer/SGD_SaI?tab=readme-ov-file#citation)|
-| Grams |*Grams: Gradient Descent with Adaptive Momentum Scaling*||<https://arxiv.org/abs/2412.17107>|[cite](https://ui.adsabs.harvard.edu/abs/2024arXiv241217107C/exportcitation)|
+| Grams |*Gradient Descent with Adaptive Momentum Scaling*||<https://arxiv.org/abs/2412.17107>|[cite](https://ui.adsabs.harvard.edu/abs/2024arXiv241217107C/exportcitation)|
+| OrthoGrad |*Grokking at the Edge of Numerical Stability*|[github](https://github.com/LucasPrietoAl/grokking-at-the-edge-of-numerical-stability)|<https://arxiv.org/abs/2501.04697>|[cite](https://github.com/LucasPrietoAl/grokking-at-the-edge-of-numerical-stability?tab=readme-ov-file#citation)|
+| Adam-ATAN2 |*Scaling Exponents Across Parameterizations and Optimizers*||<https://arxiv.org/abs/2407.05872>|[cite](https://ui.adsabs.harvard.edu/abs/2024arXiv240705872E/exportcitation)|

## Supported LR Scheduler
@@ -371,6 +373,10 @@ Correcting the norm of a gradient in each iteration based on the adaptive traini
Updates only occur when the proposed update direction aligns with the current gradient.
+
+### Adam-ATAN2
+
+Adam-atan2 is a new numerically stable, scale-invariant version of Adam that eliminates the epsilon hyperparameter.
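The change is confined to the update direction: instead of Adam's `m_hat / (sqrt(v_hat) + eps)`, the step becomes `atan2(m_hat, sqrt(v_hat))`, which is bounded, invariant to rescaling both arguments, and defined even when both moments are zero, so no epsilon is needed. A minimal PyTorch sketch of that step follows (illustrative only; this function and its signature are not the class shipped in `pytorch-optimizer`):

```python
import torch


def adam_atan2_step(param, grad, exp_avg, exp_avg_sq, step, lr=1e-3, betas=(0.9, 0.999)):
    """One Adam-style update using atan2 instead of division by sqrt(v_hat) + eps.

    atan2(a, b) == atan2(k * a, k * b) for any k > 0 and is defined at (0, 0),
    which is why the epsilon hyperparameter disappears. Illustrative sketch only.
    """
    beta1, beta2 = betas

    exp_avg.mul_(beta1).add_(grad, alpha=1.0 - beta1)                # first moment
    exp_avg_sq.mul_(beta2).addcmul_(grad, grad, value=1.0 - beta2)   # second moment

    m_hat = exp_avg / (1.0 - beta1 ** step)                          # bias correction
    v_hat = exp_avg_sq / (1.0 - beta2 ** step)

    # atan2 replaces m_hat / (sqrt(v_hat) + eps); everything above is plain Adam.
    param.add_(torch.atan2(m_hat, v_hat.sqrt()), alpha=-lr)
```

Compared with a standard Adam step, only the last line changes; the moment estimates and bias correction are untouched.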
docs/index.md: 8 additions & 2 deletions

@@ -10,7 +10,7 @@
## The reasons why you use `pytorch-optimizer`.

-* Wide range of supported optimizers. Currently, **87 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
+* Wide range of supported optimizers. Currently, **89 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
* Including many variants such as `Cautious`, `AdamD`, `Gradient Centrailiaztion`

| APOLLO |*SGD-like Memory, AdamW-level Performance*|[github](https://github.com/zhuhanqing/APOLLO)|<https://arxiv.org/abs/2412.05270>|[cite](https://github.com/zhuhanqing/APOLLO?tab=readme-ov-file#-citation)|
| MARS |*Unleashing the Power of Variance Reduction for Training Large Models*|[github](https://github.com/AGI-Arena/MARS)|<https://arxiv.org/abs/2411.10438>|[cite](https://github.com/AGI-Arena/MARS/tree/main?tab=readme-ov-file#citation)|
| SGDSaI |*No More Adam: Learning Rate Scaling at Initialization is All You Need*|[github](https://github.com/AnonymousAlethiometer/SGD_SaI)|<https://arxiv.org/abs/2411.10438>|[cite](https://github.com/AnonymousAlethiometer/SGD_SaI?tab=readme-ov-file#citation)|
-| Grams |*Grams: Gradient Descent with Adaptive Momentum Scaling*||<https://arxiv.org/abs/2412.17107>|[cite](https://ui.adsabs.harvard.edu/abs/2024arXiv241217107C/exportcitation)|
+| Grams |*Gradient Descent with Adaptive Momentum Scaling*||<https://arxiv.org/abs/2412.17107>|[cite](https://ui.adsabs.harvard.edu/abs/2024arXiv241217107C/exportcitation)|
+| OrthoGrad |*Grokking at the Edge of Numerical Stability*|[github](https://github.com/LucasPrietoAl/grokking-at-the-edge-of-numerical-stability)|<https://arxiv.org/abs/2501.04697>|[cite](https://github.com/LucasPrietoAl/grokking-at-the-edge-of-numerical-stability?tab=readme-ov-file#citation)|
+| Adam-ATAN2 |*Scaling Exponents Across Parameterizations and Optimizers*||<https://arxiv.org/abs/2407.05872>|[cite](https://ui.adsabs.harvard.edu/abs/2024arXiv240705872E/exportcitation)|

## Supported LR Scheduler
@@ -371,6 +373,10 @@ Correcting the norm of a gradient in each iteration based on the adaptive traini
Updates only occur when the proposed update direction aligns with the current gradient.
+
+### Adam-ATAN2
+
+Adam-atan2 is a new numerically stable, scale-invariant version of Adam that eliminates the epsilon hyperparameter.
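Because the epsilon hyperparameter is gone, constructing the optimizer needs only the usual learning-rate settings. A hedged usage sketch, assuming the optimizer is reachable through the library's string registry; the `'adamatan2'` key below is a guess for illustration, so check the package's list of supported optimizers for the actual identifier:

```python
import torch
from pytorch_optimizer import load_optimizer  # string-registry helper provided by pytorch-optimizer

model = torch.nn.Linear(10, 1)

# 'adamatan2' is an assumed registry key used here for illustration only;
# look up the real name in the package's supported-optimizer list.
opt_class = load_optimizer('adamatan2')
optimizer = opt_class(model.parameters(), lr=1e-3)  # note: no eps argument to tune

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```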