Releases · kozistr/pytorch_optimizer
pytorch-optimizer v2.4.0
Change Log
Feature
- Implement D-Adaptation optimizers (`DAdaptAdaGrad`, `DAdaptAdam`, `DAdaptSGD`), #101
  - learning-rate-free learning for SGD, AdaGrad, and Adam (see the usage sketch below)
  - original implementation: https://github.com/facebookresearch/dadaptation
- Shampoo optimizer
  - Support `no_preconditioning_for_layers_with_dim_gt` (default 8192)
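Since D-Adaptation estimates the step size on the fly, the `lr` argument is usually left at its default of 1.0 rather than tuned. A minimal usage sketch, assuming `DAdaptAdam` is importable from the package root and takes the usual `params` / `lr` / `weight_decay` arguments:

```python
import torch
from pytorch_optimizer import DAdaptAdam

model = torch.nn.Linear(16, 2)
criterion = torch.nn.CrossEntropyLoss()

# With D-Adaptation, lr is a multiplier on the internally estimated step size,
# so it is typically kept at 1.0 instead of being hand-tuned.
optimizer = DAdaptAdam(model.parameters(), lr=1.0, weight_decay=1e-2)

x, y = torch.randn(8, 16), torch.randint(0, 2, (8,))
loss = criterion(model(x), y)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```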
Improvement
- refactor/improve `matrix_power()`: unroll the loop for performance, #101
- speed up / fix `power_iter()`: avoid deep-copying `mat_v`, #101
Docs
- D-Adaptation optimizers & Shampoo utils
pytorch-optimizer v2.3.1
Change Log
Feature
- more add-ons for the Shampoo optimizer, #99 (see the sketch below)
  - implement `moving_average_for_momentum`
  - implement `decoupled_weight_decay`
  - implement `decoupled_learning_rate`
  - support more grafting types (`RMSProp`, `SQRT_N`)
  - support more pre-conditioner types (`ALL`, `INPUT`)
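A sketch of how these switches might be enabled, assuming the `Shampoo` constructor exposes keyword arguments named after the items above (the exact signature and the grafting / pre-conditioner enums may differ, so check `help(Shampoo)`):

```python
import torch
from pytorch_optimizer import Shampoo

model = torch.nn.Linear(32, 4)

# Keyword names below mirror the changelog entries and are assumptions,
# not the verified signature.
optimizer = Shampoo(
    model.parameters(),
    lr=1e-3,
    moving_average_for_momentum=True,  # EMA-style momentum accumulation
    decoupled_weight_decay=True,       # AdamW-style decoupled decay
    decoupled_learning_rate=True,      # decouple the lr from the decay term
)
```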
Docs
- apply pydocstyle linter, #91
Refactor
- `deberta_v3_large_lr_scheduler`, #91
ETC
- add more Ruff rules (ICN, TID, ERA, RUF, YTT, PL), #91
pytorch-optimizer v2.3.0
Change Log
Feature
- re-implement Shampoo Optimizer (#97, related to #93)
- layer-wise grafting (none, adagrad, sgd)
- block partitioner
- preconditioner
- remove casting to `fp16` or `bf16` inside `step()` to keep consistency with the other optimizers, #96
- change some ops to in-place operations to speed up, #96
Fix
- fix `exp_avg_var` when `amsgrad` is True, #96
Refactor
- change linter from `Pylint` to `Ruff`, #97
pytorch-optimizer v2.2.1
Change Log
Feature
- Support `max_grad_norm` (Adan optimizer)
- Support gradient averaging (Lamb optimizer)
- Support `dampening`, `nesterov` parameters (Lars optimizer; see the sketch below)
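A sketch of the new options, assuming `Adan` and `LARS` are importable from the package root and accept keyword arguments matching the names above (both optimizers are created on the same toy model purely for illustration):

```python
import torch
from pytorch_optimizer import Adan, LARS

model = torch.nn.Linear(32, 4)

# Keyword names mirror the changelog; verify them against the actual signatures.
adan = Adan(model.parameters(), lr=1e-3, max_grad_norm=1.0)  # clip gradients inside the step
lars = LARS(
    model.parameters(),
    lr=1e-1,
    momentum=0.9,
    dampening=0.0,   # SGD-style dampening
    nesterov=True,   # Nesterov momentum
)
```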
Refactor
- move `step` parameter from `state` to `group` (to reduce computation cost & memory)
- load `betas` by `group`, not by parameter
- change to in-place operations
Fix
- fix the case when `momentum` is 0 (Lars optimizer)
pytorch-optimizer v2.2.0
Change Log
- Implement GSAM (Surrogate Gap Guided Sharpness-Aware Minimization) optimizer, ICLR 22
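GSAM wraps a base optimizer, perturbs the weights toward a locally worse point, and then updates using the surrogate-gap-guided gradient with a scheduled rho. A minimal training-step sketch, assuming the wrapper follows the closure-based interface of the original GSAM reference code (`set_closure` / `step` / `update_rho_t`) and that a proportional rho scheduler is available; the class and argument names here are assumptions, so consult the README for the exact API:

```python
import torch
from pytorch_optimizer import AdamP, GSAM
from pytorch_optimizer import ProportionScheduler  # import path is an assumption

model = torch.nn.Linear(16, 2)
criterion = torch.nn.CrossEntropyLoss()

base_optimizer = AdamP(model.parameters(), lr=1e-3)
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(base_optimizer, T_max=100)
rho_scheduler = ProportionScheduler(
    lr_scheduler, max_lr=1e-3, min_lr=0.0, max_value=0.04, min_value=0.02
)

optimizer = GSAM(
    model.parameters(), base_optimizer=base_optimizer, model=model,
    rho_scheduler=rho_scheduler, alpha=0.4,
)

for x, y in [(torch.randn(8, 16), torch.randint(0, 2, (8,)))]:
    optimizer.set_closure(criterion, x, y)  # registers the forward/backward passes
    predictions, loss = optimizer.step()    # ascent to the perturbed point, then descent
    lr_scheduler.step()
    optimizer.update_rho_t()                # advance the rho schedule
```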
pytorch-optimizer v2.1.1
Change Log
Feature
- Support gradient centralization for the `Adai` optimizer
- Support AdamD debias for the `AdaPNM` optimizer
- Register custom exceptions (e.g. `NoSparseGradientError`, `NoClosureError`, ...; see the sketch below)
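One way these exceptions could be used, assuming they are exported from the package root (the import path is an assumption): catching `NoSparseGradientError` when an optimizer that does not handle sparse gradients receives them from a sparse embedding.

```python
import torch
from pytorch_optimizer import AdamP
from pytorch_optimizer import NoSparseGradientError  # import path is an assumption

embedding = torch.nn.Embedding(100, 8, sparse=True)  # produces sparse gradients
optimizer = AdamP(embedding.parameters(), lr=1e-3)

loss = embedding(torch.tensor([1, 2, 3])).sum()
loss.backward()

try:
    optimizer.step()
except NoSparseGradientError:
    # Fall back to a sparse-capable optimizer (e.g. torch.optim.SparseAdam) here.
    print('this optimizer does not support sparse gradients')
```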
Documentation
- Add API documentation
Bug
- Fix `SAM` optimizer
pytorch-optimizer v2.1.0
pytorch-optimizer v2.0.1
pytorch-optimizer v2.0.0
Change Log
- Refactor the package depth
  - 4 depths
    - `pytorch_optimizer.lr_scheduler`: lr schedulers
    - `pytorch_optimizer.optimizer`: optimizers
    - `pytorch_optimizer.base`: base utils
    - `pytorch_optimizer.experimental`: any experimental features
  - `pytorch_optimizer.adamp` -> `pytorch_optimizer.optimizer.adamp`
  - Still `from pytorch_optimizer import AdamP` is possible (see the import sketch below)
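Both import styles should resolve to the same class: the flat import is still re-exported at the package root, while the deep path reflects the new layout (that the deep module exposes the class directly is an assumption):

```python
# Flat import, unchanged from earlier versions.
from pytorch_optimizer import AdamP

# Deep import following the refactored package layout; assumes the module
# exposes the class under the same name.
from pytorch_optimizer.optimizer.adamp import AdamP as AdamPDeep

assert AdamP is AdamPDeep
```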
- Implement lr schedulers
  - `CosineAnealingWarmupRestarts` (usage sketch below)
- Implement (experimental) lr schedulers
  - `DeBERTaV3-large` layer-wise lr scheduler
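A usage sketch for the warmup-restart scheduler. The class name is taken verbatim from the release note, and the constructor arguments (`first_cycle_steps`, `cycle_mult`, `max_lr`, `min_lr`, `warmup_steps`, `gamma`) follow the commonly used cosine-warmup-restart implementation, so both are assumptions to verify against the docs:

```python
import torch
from pytorch_optimizer import AdamP, CosineAnealingWarmupRestarts

model = torch.nn.Linear(16, 2)
optimizer = AdamP(model.parameters(), lr=1e-3)

scheduler = CosineAnealingWarmupRestarts(
    optimizer,
    first_cycle_steps=1000,  # length of the first cosine cycle
    cycle_mult=1.0,          # cycle-length multiplier after each restart
    max_lr=1e-3,
    min_lr=1e-6,
    warmup_steps=100,        # linear warmup at the start of each cycle
    gamma=0.5,               # decay of max_lr after each restart
)

for _ in range(10):
    optimizer.step()
    scheduler.step()
```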
Other changes (bug fixes, small refactors)
- Fix `AGC` (to return the parameter)
- Make room for experimental features (at `pytorch_optimizer.experimental`)
- base types