
Commit b0146c2

[Feature] Implement AdaMuon optimizer (#395)
* feature: AdaMuon optimizer
* update: test cases
* build(deps): update dev-deps
* docs: AdaMuon optimizer
* docs: AdaMuon optimizer
* docs: v3.6.2 changelog
* fix: test_get_supported_optimizers
* fix: recipe
* fix: test_get_supported_optimizers
1 parent: 77098e9 · commit: b0146c2

File tree: 11 files changed (+316, -42 lines)

README.md

Lines changed: 3 additions & 2 deletions
@@ -10,7 +10,7 @@
 
 ## The reasons why you use `pytorch-optimizer`.
 
-* Wide range of supported optimizers. Currently, **108 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
+* Wide range of supported optimizers. Currently, **109 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
 * Including many variants such as `ADOPT`, `Cautious`, `AdamD`, `StableAdamW`, and `Gradient Centrailiaztion`
 * Easy to use, clean, and tested codes
 * Active maintenance

@@ -215,7 +215,8 @@ get_supported_optimizers(['adam*', 'ranger*'])
 | RACS & Alice | *Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension* | | <https://arxiv.org/pdf/2502.07752> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250207752G/exportcitation) |
 | VSGD | *Variational Stochastic Gradient Descent for Deep Neural Networks* | [github](https://github.com/generativeai-tue/vsgd) | <https://openreview.net/forum?id=xu4ATNjcdy> | [cite](https://github.com/generativeai-tue/vsgd/tree/main?tab=readme-ov-file#cite) |
 | SNSM | *Subset-Norm and Subspace-Momentum: Faster Memory-Efficient Adaptive Optimization with Convergence Guarantees* | [github](https://github.com/timmytonga/sn-sm) | <https://arxiv.org/abs/2411.07120> | [cite](https://ui.adsabs.harvard.edu/abs/2024arXiv241107120N/exportcitation) |
-| AdamC | Why Gradients Rapidly Increase Near the End of Training* | | <https://arxiv.org/abs/2506.02285> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250602285D/exportcitation) |
+| AdamC | *Why Gradients Rapidly Increase Near the End of Training* | | <https://arxiv.org/abs/2506.02285> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250602285D/exportcitation) |
+| AdaMuon | *Adaptive Muon Optimizer* | | <https://arxiv.org/abs/2507.11005v1> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250711005S/exportcitation) |

 ## Supported LR Scheduler
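As a quick illustration of the README change above, the new entry can be looked up with the same `get_supported_optimizers` helper shown in the hunk header. This is only a sketch: whether the helper returns class objects or plain names can differ between versions, so the membership check below is an assumption rather than the documented contract.

```python
from pytorch_optimizer import get_supported_optimizers

# Wildcard filters work as in the README example, e.g. ['adam*', 'ranger*'].
matches = get_supported_optimizers(['adamuon'])

# Depending on the version, entries may be classes or lowercase names;
# normalize before checking (assumption, not the documented contract).
names = [m.lower() if isinstance(m, str) else m.__name__.lower() for m in matches]
assert 'adamuon' in names
```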

docs/changelogs/v3.6.2.md

Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+## Change Log
+
+### Feature
+
+* Implement `AdaMuon` optimizer. (#394, #395)
+    * [Adaptive Muon Optimizer](https://arxiv.org/abs/2507.11005v1)

docs/optimizer.md

Lines changed: 4 additions & 0 deletions
@@ -284,6 +284,10 @@
     :docstring:
     :members:

+::: pytorch_optimizer.AdaMuon
+    :docstring:
+    :members:
+
 ::: pytorch_optimizer.Nero
     :docstring:
     :members:

poetry.lock

Lines changed: 24 additions & 24 deletions
Generated file; diff not rendered.

pyproject.toml

Lines changed: 8 additions & 8 deletions
@@ -11,14 +11,14 @@ repository = "https://github.com/kozistr/pytorch_optimizer"
 documentation = "https://pytorch-optimizers.readthedocs.io/en/latest"
 keywords = [
     "pytorch", "deep-learning", "optimizer", "lr scheduler", "A2Grad", "Alice", "ASGD", "AccSGD", "AdaBelief",
-    "AdaBound", "AdaDelta", "AdaFactor", "AdaGC", "AdaMax", "AdamG", "AdaMod", "AdaNorm", "AdaPNM", "AdaSmooth",
-    "AdEMAMix", "Simplified-AdEMAMix", "ADOPT", "AdaHessian", "Adai", "Adalite", "AdaLomo", "AdamMini", "AdamP",
-    "AdamS", "Adan", "AggMo", "Aida", "AliG", "Amos", "Apollo", "APOLLO", "AvaGrad", "bSAM", "CAME", "DAdaptAdaGrad",
-    "DAdaptAdam", "DAdaptAdan", "DAdaptSGD", "DAdaptLion", "DeMo", "DiffGrad", "EXAdam", "FAdam", "Fira", "FOCUS",
-    "Fromage", "FTRL", "GaLore", "Grams", "Gravity", "GrokFast", "GSAM", "Kate", "Lamb", "LaProp", "LARS", "Lion",
-    "LOMO", "Lookahead", "MADGRAD", "MARS", "MSVAG", "Muno", "Nero", "NovoGrad", "OrthoGrad", "PAdam", "PCGrad", "PID",
-    "PNM", "Prodigy", "PSGD", "QHAdam", "QHM", "RACS", "RAdam", "Ranger", "Ranger21", "RotoGrad", "SAM", "GCSAM",
-    "LookSAM", "ScheduleFreeSGD", "ScheduleFreeAdamW", "ScheduleFreeRAdam", "SCION", "SGDP", "Shampoo",
+    "AdaBound", "AdaDelta", "AdaFactor", "AdaGC", "AdaMax", "AdaMuon", "AdamG", "AdaMod", "AdaNorm", "AdaPNM",
+    "AdaSmooth", "AdEMAMix", "Simplified-AdEMAMix", "ADOPT", "AdaHessian", "Adai", "Adalite", "AdaLomo", "AdamMini",
+    "AdamP", "AdamS", "Adan", "AggMo", "Aida", "AliG", "Amos", "Apollo", "APOLLO", "AvaGrad", "bSAM", "CAME",
+    "DAdaptAdaGrad", "DAdaptAdam", "DAdaptAdan", "DAdaptSGD", "DAdaptLion", "DeMo", "DiffGrad", "EXAdam", "FAdam",
+    "Fira", "FOCUS", "Fromage", "FTRL", "GaLore", "Grams", "Gravity", "GrokFast", "GSAM", "Kate", "Lamb", "LaProp",
+    "LARS", "Lion", "LOMO", "Lookahead", "MADGRAD", "MARS", "MSVAG", "Muno", "Nero", "NovoGrad", "OrthoGrad", "PAdam",
+    "PCGrad", "PID", "PNM", "Prodigy", "PSGD", "QHAdam", "QHM", "RACS", "RAdam", "Ranger", "Ranger21", "RotoGrad",
+    "SAM", "GCSAM", "LookSAM", "ScheduleFreeSGD", "ScheduleFreeAdamW", "ScheduleFreeRAdam", "SCION", "SGDP", "Shampoo",
     "ScalableShampoo", "SGDW", "SignSGD", "SM3", "SOAP", "SopihaH", "SPAM", "StableSPAM", "SRMM", "StableAdamW",
     "SWATS", "TAM", "Tiger", "TRAC", "VSGD", "WSAM", "Yogi", "BCE", "BCEFocal", "Focal", "FocalCosine", "SoftF1",
     "Dice", "LDAM", "Jaccard", "Bi-Tempered", "Tversky", "FocalTversky", "LovaszHinge", "bitsandbytes", "WSD",

pytorch_optimizer/__init__.py

Lines changed: 1 addition & 0 deletions
@@ -89,6 +89,7 @@
     AdaMod,
     AdamP,
     AdamS,
+    AdaMuon,
     AdamW,
     AdamWSN,
     Adan,
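With the re-export above, `AdaMuon` is importable straight from the package root. Below is a minimal usage sketch; the `lr` value and the plain `model.parameters()` call are assumptions following the library's usual constructor pattern, and the authoritative argument list is the `pytorch_optimizer.AdaMuon` docstring registered in docs/optimizer.md.

```python
import torch

from pytorch_optimizer import AdaMuon  # re-exported at the package root by this commit

model = torch.nn.Sequential(torch.nn.Linear(64, 128), torch.nn.ReLU(), torch.nn.Linear(128, 10))

# Hyperparameters here are illustrative assumptions; see the AdaMuon docstring for the real ones.
optimizer = AdaMuon(model.parameters(), lr=1e-3)

# Standard PyTorch training step.
loss = model(torch.randn(8, 64)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```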

pytorch_optimizer/optimizer/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -66,7 +66,7 @@
 from pytorch_optimizer.optimizer.madgrad import MADGRAD
 from pytorch_optimizer.optimizer.mars import MARS
 from pytorch_optimizer.optimizer.msvag import MSVAG
-from pytorch_optimizer.optimizer.muon import Muon
+from pytorch_optimizer.optimizer.muon import AdaMuon, Muon
 from pytorch_optimizer.optimizer.nero import Nero
 from pytorch_optimizer.optimizer.novograd import NovoGrad
 from pytorch_optimizer.optimizer.orthograd import OrthoGrad

@@ -322,6 +322,7 @@ def load_optimizer(optimizer: str) -> OPTIMIZER:
     RACS,
     Alice,
     VSGD,
+    AdaMuon,
 ]
 OPTIMIZERS: Dict[str, OPTIMIZER] = {str(optimizer.__name__).lower(): optimizer for optimizer in OPTIMIZER_LIST}
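Because `OPTIMIZERS` keys each class by its lowercased `__name__`, the new optimizer is also reachable by string, e.g. `'adamuon'`. A small sketch, assuming `load_optimizer` returns the optimizer class as its `-> OPTIMIZER` annotation suggests; the `lr` value is an illustrative assumption.

```python
import torch

from pytorch_optimizer import load_optimizer

# 'adamuon' == str(AdaMuon.__name__).lower(), matching the OPTIMIZERS dict built above.
optimizer_class = load_optimizer('adamuon')

model = torch.nn.Linear(32, 32)
optimizer = optimizer_class(model.parameters(), lr=1e-3)  # lr is an illustrative value
```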
