Commit 9d5e181

Merge pull request #272 from kozistr/feature/ademamix-optimizer
[Feature] Implement AdEMAMix optimizer
2 parents 5a65b51 + 304c4ab commit 9d5e181

13 files changed: +233 −66 lines changed

README.md

Lines changed: 2 additions & 1 deletion

```diff
@@ -10,7 +10,7 @@
 
 **pytorch-optimizer** is optimizer & lr scheduler collections in PyTorch.
 I just re-implemented (speed & memory tweaks, plug-ins) the algorithm while based on the original paper. Also, It includes useful and practical optimization ideas.
-Currently, **75 optimizers (+ `bitsandbytes`, `qgalore`)**, **16 lr schedulers**, and **13 loss functions** are supported!
+Currently, **76 optimizers (+ `bitsandbytes`, `qgalore`)**, **16 lr schedulers**, and **13 loss functions** are supported!
 
 Highly inspired by [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer).
 
@@ -173,6 +173,7 @@ supported_optimizers = get_supported_optimizers()
 | AdamMini | *Use Fewer Learning Rates To Gain More* | [github](https://github.com/zyushun/Adam-mini) | <https://arxiv.org/abs/2406.16793> | [cite](https://github.com/zyushun/Adam-mini?tab=readme-ov-file#citation) |
 | TRAC | *Adaptive Parameter-free Optimization* | [github](https://github.com/ComputationalRobotics/TRAC) | <https://arxiv.org/abs/2405.16642> | [cite](https://ui.adsabs.harvard.edu/abs/2024arXiv240516642M/exportcitation) |
 | AdamG | *Towards Stability of Parameter-free Optimization* | | <https://arxiv.org/abs/2405.04376> | [cite](https://ui.adsabs.harvard.edu/abs/2024arXiv240504376P/exportcitation) |
+| AdEMAMix | *Better, Faster, Older* | [github](https://github.com/nanowell/AdEMAMix-Optimizer-Pytorch) | <https://arxiv.org/abs/2409.03137> | [cite](https://github.com/nanowell/AdEMAMix-Optimizer-Pytorch?tab=readme-ov-file#reference) |
 
 ## Supported LR Scheduler
 
```
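
The hunk header above references `get_supported_optimizers()`. As a quick sanity check after upgrading, the snippet below is a hedged sketch that assumes the helper returns the exported optimizer classes, as in current releases, and simply confirms the new entry is registered:

```python
from pytorch_optimizer import get_supported_optimizers

# Collect the names of all registered optimizers and look for the new one.
names = {getattr(opt, '__name__', str(opt)) for opt in get_supported_optimizers()}
print('AdEMAMix' in names)  # expected: True with pytorch_optimizer >= 3.1.2
```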

docs/changelogs/v3.1.2.md

Lines changed: 5 additions & 0 deletions

```diff
@@ -1,5 +1,10 @@
 ## Change Log
 
+### Feature
+
+* Implement `AdEMAMix` optimizer. (#272)
+    * [THE ADEMAMIX OPTIMIZER: BETTER, FASTER, OLDER](https://arxiv.org/pdf/2409.03137)
+
 ### Bug
 
 * Add `**kwargs` to the parameters for dummy placeholder. (#270, #271)
```
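
For context on the feature itself: AdEMAMix (arXiv:2409.03137) keeps two EMAs of the gradient, a fast one (`beta1`) and a much slower one (`beta3`, e.g. 0.9999), and mixes them with a coefficient `alpha` before the usual Adam-style normalization. The sketch below is a minimal single-tensor rendition of the paper's update rule, not the code merged in this PR; the `alpha`/`beta3` warm-up schedulers from the paper are omitted, and the hyper-parameter names follow the paper rather than the library.

```python
import torch


def ademamix_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999, 0.9999),
                  alpha=5.0, eps=1e-8, weight_decay=0.0):
    """One AdEMAMix update on a single tensor (paper's warm-up schedules omitted)."""
    beta1, beta2, beta3 = betas
    state['step'] += 1
    t = state['step']

    m1, m2, nu = state['m1'], state['m2'], state['nu']
    m1.mul_(beta1).add_(grad, alpha=1.0 - beta1)             # fast EMA of the gradient
    m2.mul_(beta3).add_(grad, alpha=1.0 - beta3)             # slow, long-memory EMA
    nu.mul_(beta2).addcmul_(grad, grad, value=1.0 - beta2)   # second-moment EMA

    m1_hat = m1 / (1.0 - beta1 ** t)   # bias correction (fast EMA and second moment only)
    nu_hat = nu / (1.0 - beta2 ** t)

    update = (m1_hat + alpha * m2) / (nu_hat.sqrt() + eps)
    if weight_decay > 0.0:
        update = update + weight_decay * param               # decoupled, AdamW-style decay
    param.add_(update, alpha=-lr)


# toy usage on a single parameter tensor
w = torch.zeros(3)
state = {'step': 0, 'm1': torch.zeros_like(w), 'm2': torch.zeros_like(w), 'nu': torch.zeros_like(w)}
ademamix_step(w, torch.randn(3), state)
```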

docs/index.md

Lines changed: 2 additions & 1 deletion

```diff
@@ -10,7 +10,7 @@
 
 **pytorch-optimizer** is optimizer & lr scheduler collections in PyTorch.
 I just re-implemented (speed & memory tweaks, plug-ins) the algorithm while based on the original paper. Also, It includes useful and practical optimization ideas.
-Currently, **75 optimizers (+ `bitsandbytes`, `qgalore`)**, **16 lr schedulers**, and **13 loss functions** are supported!
+Currently, **76 optimizers (+ `bitsandbytes`, `qgalore`)**, **16 lr schedulers**, and **13 loss functions** are supported!
 
 Highly inspired by [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer).
 
@@ -173,6 +173,7 @@ supported_optimizers = get_supported_optimizers()
 | AdamMini | *Use Fewer Learning Rates To Gain More* | [github](https://github.com/zyushun/Adam-mini) | <https://arxiv.org/abs/2406.16793> | [cite](https://github.com/zyushun/Adam-mini?tab=readme-ov-file#citation) |
 | TRAC | *Adaptive Parameter-free Optimization* | [github](https://github.com/ComputationalRobotics/TRAC) | <https://arxiv.org/abs/2405.16642> | [cite](https://ui.adsabs.harvard.edu/abs/2024arXiv240516642M/exportcitation) |
 | AdamG | *Towards Stability of Parameter-free Optimization* | | <https://arxiv.org/abs/2405.04376> | [cite](https://ui.adsabs.harvard.edu/abs/2024arXiv240504376P/exportcitation) |
+| AdEMAMix | *Better, Faster, Older* | [github](https://github.com/nanowell/AdEMAMix-Optimizer-Pytorch) | <https://arxiv.org/abs/2409.03137> | [cite](https://github.com/nanowell/AdEMAMix-Optimizer-Pytorch?tab=readme-ov-file#reference) |
 
 ## Supported LR Scheduler
 
```

docs/optimizer.md

Lines changed: 4 additions & 0 deletions

```diff
@@ -80,6 +80,10 @@
     :docstring:
     :members:
 
+::: pytorch_optimizer.AdEMAMix
+    :docstring:
+    :members:
+
 ::: pytorch_optimizer.agc
     :docstring:
     :members:
```
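
Since the new class is documented as `pytorch_optimizer.AdEMAMix`, it should be usable like any other optimizer in the package. Below is a hedged usage sketch that assumes only the standard `params`/`lr` constructor arguments; other keyword arguments are not shown because their exact names are not confirmed by this diff.

```python
import torch
from pytorch_optimizer import AdEMAMix

model = torch.nn.Linear(16, 4)
optimizer = AdEMAMix(model.parameters(), lr=1e-3)

# a few dummy training steps
x, y = torch.randn(32, 16), torch.randn(32, 4)
for _ in range(5):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
```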

poetry.lock

Lines changed: 45 additions & 45 deletions
Generated lockfile; the diff is not rendered.

pyproject.toml

Lines changed: 10 additions & 10 deletions

```diff
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "pytorch_optimizer"
-version = "3.1.1"
+version = "3.1.2"
 description = "optimizer & lr scheduler & objective function collections in PyTorch"
 license = "Apache-2.0"
 authors = ["kozistr <[email protected]>"]
@@ -11,15 +11,15 @@ repository = "https://github.com/kozistr/pytorch_optimizer"
 documentation = "https://pytorch-optimizers.readthedocs.io/en/latest"
 keywords = [
     "pytorch", "deep-learning", "optimizer", "lr scheduler", "A2Grad", "ASGD", "AccSGD", "AdaBelief", "AdaBound",
-    "AdaDelta", "AdaFactor", "AdaMax", "AdamG", "AdaMod", "AdaNorm", "AdaPNM", "AdaSmooth", "AdaHessian", "Adai",
-    "Adalite", "AdaLomo", "AdamMini", "AdamP", "AdamS", "Adan", "AggMo", "Aida", "AliG", "Amos", "Apollo", "AvaGrad",
-    "bSAM", "CAME", "DAdaptAdaGrad", "DAdaptAdam", "DAdaptAdan", "DAdaptSGD", "DAdaptLion", "DiffGrad", "FAdam",
-    "Fromage", "GaLore", "Gravity", "GrokFast", "GSAM", "Kate", "Lamb", "LARS", "Lion", "LOMO", "Lookahead", "MADGRAD",
-    "MSVAG", "Nero", "NovoGrad", "PAdam", "PCGrad", "PID", "PNM", "Prodigy", "QHAdam", "QHM", "RAdam", "Ranger",
-    "Ranger21", "RotoGrad", "SAM", "ScheduleFreeSGD", "ScheduleFreeAdamW", "SGDP", "Shampoo", "ScalableShampoo", "SGDW",
-    "SignSGD", "SM3", "SopihaH", "SRMM", "StableAdamW", "SWATS", "Tiger", "TRAC", "WSAM", "Yogi", "BCE", "BCEFocal",
-    "Focal", "FocalCosine", "SoftF1", "Dice", "LDAM", "Jaccard", "Bi-Tempered", "Tversky", "FocalTversky",
-    "LovaszHinge", "bitsandbytes", "WSD", "QGaLore",
+    "AdaDelta", "AdaFactor", "AdaMax", "AdamG", "AdaMod", "AdaNorm", "AdaPNM", "AdaSmooth", "AdEMAMix", "AdaHessian",
+    "Adai", "Adalite", "AdaLomo", "AdamMini", "AdamP", "AdamS", "Adan", "AggMo", "Aida", "AliG", "Amos", "Apollo",
+    "AvaGrad", "bSAM", "CAME", "DAdaptAdaGrad", "DAdaptAdam", "DAdaptAdan", "DAdaptSGD", "DAdaptLion", "DiffGrad",
+    "FAdam", "Fromage", "GaLore", "Gravity", "GrokFast", "GSAM", "Kate", "Lamb", "LARS", "Lion", "LOMO", "Lookahead",
+    "MADGRAD", "MSVAG", "Nero", "NovoGrad", "PAdam", "PCGrad", "PID", "PNM", "Prodigy", "QHAdam", "QHM", "RAdam",
+    "Ranger", "Ranger21", "RotoGrad", "SAM", "ScheduleFreeSGD", "ScheduleFreeAdamW", "SGDP", "Shampoo",
+    "ScalableShampoo", "SGDW", "SignSGD", "SM3", "SopihaH", "SRMM", "StableAdamW", "SWATS", "Tiger", "TRAC", "WSAM",
+    "Yogi", "BCE", "BCEFocal", "Focal", "FocalCosine", "SoftF1", "Dice", "LDAM", "Jaccard", "Bi-Tempered", "Tversky",
+    "FocalTversky", "LovaszHinge", "bitsandbytes", "WSD", "QGaLore",
 ]
 classifiers = [
     "License :: OSI Approved :: Apache Software License",
```
