
Commit 22abf5a

[Feature] Implement SPlus optimizer (#399)
* docs: SPlus optimizer
* feature: SPlus optimizer
* docs: v3.6.2 changelog
* update: SPlus optimizer
* update: test cases
* build(deps): update dev deps
* chore: keyword
* docs: SPlus optimizer
* docs: README
1 parent 74db1eb commit 22abf5a


14 files changed: +357, -101 lines


README.md

Lines changed: 2 additions & 1 deletion

@@ -10,7 +10,7 @@
 
 ## The reasons why you use `pytorch-optimizer`.
 
-* Wide range of supported optimizers. Currently, **109 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
+* Wide range of supported optimizers. Currently, **110 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
 * Including many variants such as `ADOPT`, `Cautious`, `AdamD`, `StableAdamW`, and `Gradient Centrailiaztion`
 * Easy to use, clean, and tested codes
 * Active maintenance
@@ -217,6 +217,7 @@ get_supported_optimizers(['adam*', 'ranger*'])
 | SNSM | *Subset-Norm and Subspace-Momentum: Faster Memory-Efficient Adaptive Optimization with Convergence Guarantees* | [github](https://github.com/timmytonga/sn-sm) | <https://arxiv.org/abs/2411.07120> | [cite](https://ui.adsabs.harvard.edu/abs/2024arXiv241107120N/exportcitation) |
 | AdamC | *Why Gradients Rapidly Increase Near the End of Training* | | <https://arxiv.org/abs/2506.02285> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250602285D/exportcitation) |
 | AdaMuon | *Adaptive Muon Optimizer* | | <https://arxiv.org/abs/2507.11005v1> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250711005S/exportcitation) |
+| SPlus | *A Stable Whitening Optimizer for Efficient Neural Network Training* | [github](https://github.com/kvfrans/splus) | <https://arxiv.org/abs/2506.07254> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250607254F/exportcitation) |
 
 ## Supported LR Scheduler
 

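The README hunk above references the `get_supported_optimizers` helper and bumps the advertised count to 110. As a rough sketch (not part of this commit), the newly registered optimizer should surface through that same helper; the lowercase registry name `splus` and the string return format are assumptions based on the README's own usage example.

```python
# Sketch only, not from the commit: listing optimizers after the SPlus row lands.
# Assumes the helper returns registry names as strings and that SPlus is
# registered under the lowercase key 'splus', matching the library's convention.
from pytorch_optimizer import get_supported_optimizers

print(len(get_supported_optimizers()))   # the README above now advertises 110 optimizers
print(get_supported_optimizers(['s*']))  # wildcard filter, as in the hunk header; 'splus' should appear
```
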
docs/changelogs/v3.6.2.md

Lines changed: 2 additions & 0 deletions

@@ -4,6 +4,8 @@
 
 * Implement `AdaMuon` optimizer. (#394, #395)
     * [Adaptive Muon Optimizer](https://arxiv.org/abs/2507.11005v1)
+* Implement `SPlus` optimizer. (#396, #399)
+    * [A Stable Whitening Optimizer for Efficient Neural Network Training](https://arxiv.org/abs/2506.07254)
 
 ### Fix
 

docs/index.md

Lines changed: 4 additions & 2 deletions

@@ -10,7 +10,7 @@
 
 ## The reasons why you use `pytorch-optimizer`.
 
-* Wide range of supported optimizers. Currently, **108 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
+* Wide range of supported optimizers. Currently, **110 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
 * Including many variants such as `ADOPT`, `Cautious`, `AdamD`, `StableAdamW`, and `Gradient Centrailiaztion`
 * Easy to use, clean, and tested codes
 * Active maintenance
@@ -215,7 +215,9 @@ get_supported_optimizers(['adam*', 'ranger*'])
 | RACS & Alice | *Towards Efficient Optimizer Design for LLM via Structured Fisher Approximation with a Low-Rank Extension* | | <https://arxiv.org/pdf/2502.07752> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250207752G/exportcitation) |
 | VSGD | *Variational Stochastic Gradient Descent for Deep Neural Networks* | [github](https://github.com/generativeai-tue/vsgd) | <https://openreview.net/forum?id=xu4ATNjcdy> | [cite](https://github.com/generativeai-tue/vsgd/tree/main?tab=readme-ov-file#cite) |
 | SNSM | *Subset-Norm and Subspace-Momentum: Faster Memory-Efficient Adaptive Optimization with Convergence Guarantees* | [github](https://github.com/timmytonga/sn-sm) | <https://arxiv.org/abs/2411.07120> | [cite](https://ui.adsabs.harvard.edu/abs/2024arXiv241107120N/exportcitation) |
-| AdamC | Why Gradients Rapidly Increase Near the End of Training* | | <https://arxiv.org/abs/2506.02285> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250602285D/exportcitation) |
+| AdamC | *Why Gradients Rapidly Increase Near the End of Training* | | <https://arxiv.org/abs/2506.02285> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250602285D/exportcitation) |
+| AdaMuon | *Adaptive Muon Optimizer* | | <https://arxiv.org/abs/2507.11005v1> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250711005S/exportcitation) |
+| SPlus | *A Stable Whitening Optimizer for Efficient Neural Network Training* | [github](https://github.com/kvfrans/splus) | <https://arxiv.org/abs/2506.07254> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250607254F/exportcitation) |
 
 ## Supported LR Scheduler
 

docs/optimizer.md

Lines changed: 4 additions & 0 deletions

@@ -432,6 +432,10 @@
     :docstring:
     :members:
 
+::: pytorch_optimizer.SPlus
+    :docstring:
+    :members:
+
 ::: pytorch_optimizer.SRMM
     :docstring:
     :members:

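The new mkdocstrings entry above exposes `pytorch_optimizer.SPlus` in the API reference. A minimal usage sketch follows, assuming only that the class takes `params` and `lr` like the package's other optimizers; the learning rate, the toy model, and any SPlus-specific defaults are illustrative, not taken from this commit.

```python
# Hedged sketch, not from the commit: driving SPlus through a standard PyTorch
# training step. Hyperparameters beyond `lr` are left at their defaults.
import torch
import torch.nn.functional as F
from pytorch_optimizer import SPlus

model = torch.nn.Linear(64, 10)
optimizer = SPlus(model.parameters(), lr=1e-1)  # lr is illustrative, not a recommendation

x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
loss = F.cross_entropy(model(x), y)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```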