README.md (2 additions & 1 deletion)
@@ -10,7 +10,7 @@
**pytorch-optimizer** is optimizer & lr scheduler collections in PyTorch.
I just re-implemented (speed & memory tweaks, plug-ins) the algorithm while based on the original paper. Also, It includes useful and practical optimization ideas.
-Currently, **71 optimizers (+ `bitsandbytes`)**, **16 lr schedulers**, and **13 loss functions** are supported!
+Currently, **72 optimizers (+ `bitsandbytes`)**, **16 lr schedulers**, and **13 loss functions** are supported!
Highly inspired by [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer).
| Grokfast |*Accelerated Grokking by Amplifying Slow Gradients*|[github](https://github.com/ironjr/grokfast)|<https://arxiv.org/abs/2405.20233>|[cite](https://github.com/ironjr/grokfast?tab=readme-ov-file#citation)|
| Kate |*Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad*|[github](https://github.com/nazya/KATE)|<https://arxiv.org/abs/2403.02648>|[cite](https://github.com/nazya/KATE?tab=readme-ov-file#remove-that-square-root-a-new-efficient-scale-invariant-version-of-adagrad)|
| StableAdamW |*Stable and low-precision training for large-scale vision-language models*||<https://arxiv.org/abs/2304.13013>|[cite](https://ui.adsabs.harvard.edu/abs/2023arXiv230413013W/exportcitation)|
+| AdamMini |*Use Fewer Learning Rates To Gain More*|[github](https://github.com/zyushun/Adam-mini)|<https://arxiv.org/abs/2406.16793>|[cite](https://github.com/zyushun/Adam-mini?tab=readme-ov-file#citation)|
docs/index.md (2 additions & 1 deletion)
@@ -10,7 +10,7 @@
**pytorch-optimizer** is optimizer & lr scheduler collections in PyTorch.
I just re-implemented (speed & memory tweaks, plug-ins) the algorithm while based on the original paper. Also, It includes useful and practical optimization ideas.
-Currently, **71 optimizers (+ `bitsandbytes`)**, **16 lr schedulers**, and **13 loss functions** are supported!
+Currently, **72 optimizers (+ `bitsandbytes`)**, **16 lr schedulers**, and **13 loss functions** are supported!
Highly inspired by [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer).
| Grokfast |*Accelerated Grokking by Amplifying Slow Gradients*|[github](https://github.com/ironjr/grokfast)|<https://arxiv.org/abs/2405.20233>|[cite](https://github.com/ironjr/grokfast?tab=readme-ov-file#citation)|
| Kate |*Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad*|[github](https://github.com/nazya/KATE)|<https://arxiv.org/abs/2403.02648>|[cite](https://github.com/nazya/KATE?tab=readme-ov-file#remove-that-square-root-a-new-efficient-scale-invariant-version-of-adagrad)|
| StableAdamW |*Stable and low-precision training for large-scale vision-language models*||<https://arxiv.org/abs/2304.13013>|[cite](https://ui.adsabs.harvard.edu/abs/2023arXiv230413013W/exportcitation)|
+| AdamMini |*Use Fewer Learning Rates To Gain More*|[github](https://github.com/zyushun/Adam-mini)|<https://arxiv.org/abs/2406.16793>|[cite](https://github.com/zyushun/Adam-mini?tab=readme-ov-file#citation)|