
Commit 4b439ab

Merge pull request #337 from kozistr/feature/psgd-optimizer
[Feature] implement PSGD Kron optimizer
2 parents: c7496b0 + 68a02f8

22 files changed (+759, -92 lines)

README.md

Lines changed: 2 additions & 1 deletion

@@ -10,7 +10,7 @@
 
 ## The reasons why you use `pytorch-optimizer`.
 
-* Wide range of supported optimizers. Currently, **93 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
+* Wide range of supported optimizers. Currently, **94 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
 * Including many variants such as `ADOPT`, `Cautious`, `AdamD`, `StableAdamW`, and `Gradient Centrailiaztion`
 * Easy to use, clean, and tested codes
 * Active maintenance
@@ -201,6 +201,7 @@ get_supported_optimizers(['adam*', 'ranger*'])
 | SPAM | *Spike-Aware Adam with Momentum Reset for Stable LLM Training* | [github](https://github.com/TianjinYellow/SPAM-Optimizer) | <https://arxiv.org/abs/2501.06842> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250106842H/exportcitation) |
 | TAM | *Torque-Aware Momentum* | | <https://arxiv.org/abs/2412.18790> | [cite](https://ui.adsabs.harvard.edu/abs/2024arXiv241218790M/exportcitation) |
 | FOCUS | *First Order Concentrated Updating Scheme* | [github](https://github.com/liuyz0/FOCUS) | <https://arxiv.org/abs/2501.12243> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250112243M/exportcitation) |
+| PSGD | *Preconditioned Stochastic Gradient Descent* | [github](https://github.com/lixilinx/psgd_torch) | <https://arxiv.org/abs/1512.04202> | [cite](https://github.com/lixilinx/psgd_torch?tab=readme-ov-file#resources) |
 
 ## Supported LR Scheduler
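For reference, the new entry can be checked with the same lookup helper that appears in the hunk header above. A minimal sketch, assuming the new optimizer is registered under a lowercase key such as `kron` or `psgd` (the registered key is not visible in this diff):

```python
# Minimal sketch: confirm the new PSGD Kron entry is registered.
# `get_supported_optimizers` is the helper shown in the hunk header above;
# the 'kron*' / 'psgd*' filter keys are assumptions, since the registered
# name is not visible in this diff.
from pytorch_optimizer import get_supported_optimizers

print(len(get_supported_optimizers()))      # expected to report 94 after this commit
print(get_supported_optimizers(['kron*']))  # hypothetical key for the new optimizer
print(get_supported_optimizers(['psgd*']))  # alternative, if it is registered as 'psgd'
```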

docs/changelogs/v3.3.5.md

Lines changed: 0 additions & 20 deletions
This file was deleted.

docs/changelogs/v3.4.0.md

Lines changed: 25 additions & 0 deletions

@@ -0,0 +1,25 @@
+### Change Log
+
+### Feature
+
+* Implement `FOCUS` optimizer. (#330, #331)
+    * [First Order Concentrated Updating Scheme](https://arxiv.org/abs/2501.12243)
+* Implement `PSGD Kron`. (#337)
+    * [preconditioned stochastic gradient descent w/ Kron pre-conditioner](https://arxiv.org/abs/1512.04202)
+
+### Update
+
+* Support `OrthoGrad` variant to `Ranger25`. (#332)
+    * `Ranger25` optimizer is my experimental-crafted optimizer, which mixes lots of optimizer variants such as `ADOPT` + `AdEMAMix` + `Cautious` + `StableAdamW` + `Adam-Atan2` + `OrthoGrad`.
+
+### Fix
+
+* Add the missing `state` property in `OrthoGrad` optimizer. (#326, #327)
+* Add the missing `state_dict`, and `load_state_dict` methods to `TRAC` and `OrthoGrad` optimizers. (#332)
+* Skip when the gradient is sparse in `OrthoGrad` optimizer. (#332)
+* Support alternative precision training in `SOAP` optimizer. (#333)
+* Store SOAP condition matrices as the dtype of their parameters. (#335)
+
+### Contributions
+
+thanks to @Vectorrent, @kylevedder
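The `state_dict` / `load_state_dict` fix listed above can be illustrated with a checkpoint round-trip. A minimal sketch, assuming `OrthoGrad` follows the same wrapper pattern as `TRAC` and takes an already-constructed base optimizer; the constructor signature itself is not part of this diff:

```python
# Checkpoint round-trip through the OrthoGrad wrapper, illustrating the
# restored `state_dict` / `load_state_dict` methods. The wrapper-style call
# `OrthoGrad(base_optimizer)` is an assumption; only the methods themselves
# are taken from the changelog entry above.
import torch
from pytorch_optimizer import OrthoGrad

model = torch.nn.Linear(4, 1)
optimizer = OrthoGrad(torch.optim.AdamW(model.parameters(), lr=1e-3))  # assumed usage

loss = model(torch.randn(8, 4)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()

checkpoint = optimizer.state_dict()    # previously missing on TRAC / OrthoGrad
optimizer.load_state_dict(checkpoint)  # round-trips after this release
```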

docs/index.md

Lines changed: 2 additions & 1 deletion

@@ -10,7 +10,7 @@
 
 ## The reasons why you use `pytorch-optimizer`.
 
-* Wide range of supported optimizers. Currently, **93 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
+* Wide range of supported optimizers. Currently, **94 optimizers (+ `bitsandbytes`, `qgalore`, `torchao`)**, **16 lr schedulers**, and **13 loss functions** are supported!
 * Including many variants such as `ADOPT`, `Cautious`, `AdamD`, `StableAdamW`, and `Gradient Centrailiaztion`
 * Easy to use, clean, and tested codes
 * Active maintenance
@@ -201,6 +201,7 @@ get_supported_optimizers(['adam*', 'ranger*'])
 | SPAM | *Spike-Aware Adam with Momentum Reset for Stable LLM Training* | [github](https://github.com/TianjinYellow/SPAM-Optimizer) | <https://arxiv.org/abs/2501.06842> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250106842H/exportcitation) |
 | TAM | *Torque-Aware Momentum* | | <https://arxiv.org/abs/2412.18790> | [cite](https://ui.adsabs.harvard.edu/abs/2024arXiv241218790M/exportcitation) |
 | FOCUS | *First Order Concentrated Updating Scheme* | [github](https://github.com/liuyz0/FOCUS) | <https://arxiv.org/abs/2501.12243> | [cite](https://ui.adsabs.harvard.edu/abs/2025arXiv250112243M/exportcitation) |
+| PSGD | *Preconditioned Stochastic Gradient Descent* | [github](https://github.com/lixilinx/psgd_torch) | <https://arxiv.org/abs/1512.04202> | [cite](https://github.com/lixilinx/psgd_torch?tab=readme-ov-file#resources) |
 
 ## Supported LR Scheduler

docs/optimizer.md

Lines changed: 4 additions & 0 deletions

@@ -284,6 +284,10 @@
     :docstring:
     :members:
 
+::: pytorch_optimizer.Kron
+    :docstring:
+    :members:
+
 ::: pytorch_optimizer.QHAdam
     :docstring:
     :members:
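Since the docs entry above exposes the class as `pytorch_optimizer.Kron`, a minimal training-step sketch follows. Only the class name comes from this diff; the `lr` keyword and its value are assumptions about the constructor:

```python
# Minimal training-step sketch for the newly documented Kron (PSGD) optimizer.
# Only the name `pytorch_optimizer.Kron` comes from the diff above; the `lr`
# keyword and its value are assumed.
import torch
from pytorch_optimizer import Kron

model = torch.nn.Linear(4, 1)
optimizer = Kron(model.parameters(), lr=1e-3)  # assumed signature

for _ in range(10):
    optimizer.zero_grad()
    loss = model(torch.randn(8, 4)).pow(2).mean()
    loss.backward()
    optimizer.step()
```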
