Releases · kozistr/pytorch_optimizer
pytorch-optimizer v2.9.0
Change Log
Feature
- Implement AdaMax optimizer, #148
  - A variant of Adam based on the infinity norm
- Implement Gravity optimizer, #151
- Implement AdaSmooth optimizer, #153
- Implement SRMM optimizer, #154
- Implement AvaGrad optimizer, #155
- Implement AdaShift optimizer, #157
- Upgrade to D-Adaptation v3, #158, #159
- Implement AdaDelta optimizer, #160
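As a quick orientation, here is a minimal sketch of dropping one of the optimizers added above into a standard training step; it assumes `AdaMax` is exported at the package top level and uses only the usual `params`/`lr` constructor arguments.

```python
import torch
from pytorch_optimizer import AdaMax  # assumed top-level export

model = torch.nn.Linear(10, 1)
optimizer = AdaMax(model.parameters(), lr=1e-3)

x, y = torch.randn(8, 10), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```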
Refactor
- Refactor validation logic, #149, #150
- Rename `amsbound`, `amsgrad` terms into `ams_bound` (see the sketch after this list), #149
- Return the gradient instead of the parameter in AGC, #149
- Refactor duplicates (e.g. rectified step size, AMSBound, AdamD, AdaNorm, weight decay) into re-usable functions, #150
- Move `pytorch_optimizer.experimental` under `pytorch_optimizer.*.experimental`
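A minimal sketch of the renamed flag, assuming an optimizer such as `AdaBelief` accepts the unified `ams_bound` keyword after #149 (the optimizer choice and other arguments are illustrative):

```python
import torch
from pytorch_optimizer import AdaBelief

model = torch.nn.Linear(10, 1)
# `ams_bound` replaces the previous `amsgrad` / `amsbound` keywords
optimizer = AdaBelief(model.parameters(), lr=1e-3, ams_bound=True)
```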
pytorch-optimizer v2.8.0
Change Log
Feature
- Implement A2Grad optimizer, #136
- Implement Accelerated SGD optimizer, #137
- Implement Adaptive SGD optimizer, #139
- Implement SGDW optimizer, #139
- Implement Yogi optimizer, #140
- Implement SWATS optimizer, #141
- Implement Fromage optimizer, #142
- Implement MSVAG optimizer, #143
- Implement AdaMod optimizer, #144
- Implement AggMo optimizer, #145
- Implement QHAdam, QHM optimizers, #146
- Implement PID optimizer, #147
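A minimal sketch of picking up one of the optimizers listed above by name, assuming they are registered with the package's `load_optimizer()` helper under lowercase keys:

```python
import torch
from pytorch_optimizer import load_optimizer

model = torch.nn.Linear(10, 1)
opt_class = load_optimizer('yogi')  # assumed registry key for the Yogi optimizer
optimizer = opt_class(model.parameters(), lr=1e-2)
```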
pytorch-optimizer v2.7.0
Change Log
Features
- Implement `AdaNorm` optimizer, #133
- Implement `RotoGrad` optimizer, #124, #134
- Implement `D-Adapt Adan` optimizer, #134
- Support `AdaNorm` variant (see the sketch after this list), #133, #134
  - AdaBelief
  - AdamP
  - AdamS
  - AdaPNM
  - diffGrad
  - Lamb
  - RAdam
  - Ranger
  - Adan
- Support `AMSGrad` variant, #133, #134
  - diffGrad
  - AdaFactor
- Support `degenerated_to_sgd`, #133
  - Ranger
  - Lamb
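A minimal sketch of toggling the variants above on a supported optimizer; the keyword names `adanorm` and `degenerated_to_sgd` are assumptions about how these switches are exposed, and `Lamb` is chosen because it appears in both lists:

```python
import torch
from pytorch_optimizer import Lamb

model = torch.nn.Linear(10, 1)
# keyword names are assumptions; Lamb supports both variants per the lists above
optimizer = Lamb(model.parameters(), lr=1e-3, adanorm=True, degenerated_to_sgd=True)
```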
Refactor
- Rename `adamd_debias_term` to `adam_debias`, #133
- Merge the rectified version with the original, #133
  - diffRGrad + diffGrad -> diffGrad
  - RaLamb + Lamb -> Lamb
  - now you can simply use them with `rectify=True` (see the sketch below)
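A minimal sketch of the merged rectified variant, assuming the `rectify` keyword shown above; `Lamb(..., rectify=True)` would replace the removed `RaLamb` class:

```python
import torch
from pytorch_optimizer import Lamb

model = torch.nn.Linear(10, 1)
# rectify=True enables the rectified (RaLamb-style) update path
optimizer = Lamb(model.parameters(), lr=1e-3, rectify=True)
```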
Fix
- Fix `previous_grad` deepcopy issue in the Adan optimizer. #134
pytorch-optimizer v2.6.1
pytorch-optimizer v2.6.0
Change Log
Feature
- Implement SM3 optimizer, #130
- Tweak Scalable Shampoo optimizer, #128, #129
  - implement a new preconditioner type, OUTPUT.
  - optimize speed/memory usage of the coupled Newton iteration and power iteration methods.
  - use in-place operations (SQRT-N Grafting).
  - clean up `shampoo_utils` to be more readable.
  - support the `skip_preconditioning_rank_lt` parameter to skip preconditioning in case of a low-rank gradient.
  - set the default value of `preconditioning_compute_steps` to 1000.
  - set the default value of `start_preconditioning_step` to 25 (see the sketch below).
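A minimal sketch wiring the Scalable Shampoo parameters mentioned above into the constructor; the values simply restate the new defaults, and any other keyword arguments are omitted:

```python
import torch
from pytorch_optimizer import ScalableShampoo

model = torch.nn.Linear(10, 1)
optimizer = ScalableShampoo(
    model.parameters(),
    lr=1e-3,
    start_preconditioning_step=25,       # new default in this release
    preconditioning_compute_steps=1000,  # new default in this release
)
```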
pytorch-optimizer v2.5.2
pytorch-optimizer v2.5.1
pytorch-optimizer v2.5.0
pytorch-optimizer v2.4.2
Change Log
Bug
- Fix to deep-copy the inverse preconditioners
Docs
- Update Scalable Shampoo docstring (more parameter guides), #106
pytorch-optimizer v2.4.1
Change Log
Feature
- Rename the new `Shampoo` to `ScalableShampoo`. #103
- Implement the old(?) version of the Shampoo optimizer. #103
- Support the `SVD` method to calculate the inverse p-th root matrix (see the sketch below). #103
  - to boost the `M^{-1/p}` calculation, perform batched SVD when available.
- Implement `AdamS` optimizer. #102
- Support the `stable weight decay` option for the `Adai` optimizer. #102
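An illustrative sketch (not the library's internal implementation) of the batched-SVD route to the inverse p-th root `M^{-1/p}` of a symmetric PSD statistics matrix, which is what the `SVD` option above accelerates:

```python
import torch

def inverse_pth_root_svd(m: torch.Tensor, p: int, eps: float = 1e-16) -> torch.Tensor:
    """Approximate M^{-1/p} for a batch of symmetric PSD matrices via batched SVD."""
    u, s, vh = torch.linalg.svd(m, full_matrices=False)
    return u @ torch.diag_embed(s.clamp(min=eps).pow(-1.0 / p)) @ vh

m = torch.randn(4, 8, 8)
m = m @ m.transpose(-2, -1)          # build a batch of PSD matrices
root = inverse_pth_root_svd(m, p=4)  # approximates M^{-1/4}
```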
Bug
- Fix `compute_power_svd()` to get a singular value. #104