
Commit eb69975

Merge pull request #163 from kozistr/refactor/codes
[Refactor] codes
2 parents 75202dc + d1171da commit eb69975

File tree

16 files changed: +177 additions, -78 deletions


.github/workflows/publish.yml

Lines changed: 1 addition & 0 deletions
@@ -20,6 +20,7 @@ jobs:
         with:
           tag_name: ${{ github.ref }}
           release_name: pytorch-optimizer ${{ github.ref }}
+          body_path: docs/changelog/${{ github.ref }}.md
           draft: false
           prerelease: false
   deploy:

docs/changelogs/v2.7.0.md

Lines changed: 41 additions & 0 deletions
## Change Log

### Feature

* Implement `AdaNorm` optimizer (#133)
  * [AdaNorm: Adaptive Gradient Norm Correction based Optimizer for CNNs](https://arxiv.org/abs/2210.06364)
* Implement `RotoGrad` optimizer (#124, #134)
  * [RotoGrad: Gradient Homogenization in Multitask Learning](https://arxiv.org/abs/2103.02631)
* Implement `D-Adapt Adan` optimizer (#134)
* Support `AdaNorm` variant (#133, #134)
  * AdaBelief
  * AdamP
  * AdamS
  * AdaPNM
  * diffGrad
  * Lamb
  * RAdam
  * Ranger
  * Adan
* Support `AMSGrad` variant (#133, #134)
  * diffGrad
  * AdaFactor
* Support `degenerated_to_sgd` (#133)
  * Ranger
  * Lamb

### Refactor

* Rename `adamd_debias_term` to `adam_debias` (#133)
* Merge the rectified version with the original (#133)
  * diffRGrad + diffGrad -> diffGrad
  * RaLamb + Lamb -> Lamb
  * now you can simply use with `rectify=True`

### Bug

* Fix `previous_grad` deepcopy issue in Adan optimizer (#134)

### Diff

[2.6.1...2.7.0](https://github.com/kozistr/pytorch_optimizer/compare/v2.6.1...v2.7.0)
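As a quick illustration of the `rectify=True` note in the Refactor section above, here is a minimal, hypothetical usage sketch; the toy model and learning rate are placeholders, not taken from this commit:

```python
import torch

from pytorch_optimizer import Lamb

# toy model; any nn.Module works the same way
model = torch.nn.Linear(10, 2)

# the former RaLamb behaviour is now selected on Lamb via rectify=True
optimizer = Lamb(model.parameters(), lr=1e-3, rectify=True)

loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```

The same flag applies to `diffGrad`, which now absorbs the former `diffRGrad`.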

docs/changelogs/v2.8.0.md

Lines changed: 37 additions & 0 deletions
## Change Log

### Feature

* Implement A2Grad optimizer (#136)
  * [Optimal Adaptive and Accelerated Stochastic Gradient Descent](https://arxiv.org/abs/1810.00553)
* Implement Accelerated SGD optimizer (#137)
  * [Accelerating Stochastic Gradient Descent For Least Squares Regression](https://arxiv.org/abs/1704.08227)
* Implement Adaptive SGD optimizer (#139)
  * [Adaptive Gradient Descent without Descent](https://arxiv.org/abs/1910.09529)
* Implement SGDW optimizer (#139)
  * [Decoupled Weight Decay Regularization](https://arxiv.org/abs/1711.05101)
* Implement Yogi optimizer (#140)
  * [Adaptive Methods for Nonconvex Optimization](https://papers.nips.cc/paper_files/paper/2018/hash/90365351ccc7437a1309dc64e4db32a3-Abstract.html)
* Implement SWATS optimizer (#141)
  * [Improving Generalization Performance by Switching from Adam to SGD](https://arxiv.org/abs/1712.07628)
* Implement Fromage optimizer (#142)
  * [On the distance between two neural networks and the stability of learning](https://arxiv.org/abs/2002.03432)
* Implement MSVAG optimizer (#143)
  * [Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients](https://arxiv.org/abs/1705.07774)
* Implement AdaMod optimizer (#144)
  * [An Adaptive and Momental Bound Method for Stochastic Learning](https://arxiv.org/abs/1910.12249)
* Implement AggMo optimizer (#145)
  * [Aggregated Momentum: Stability Through Passive Damping](https://arxiv.org/abs/1804.00325)
* Implement QHAdam, QHM optimizers (#146)
  * [Quasi-hyperbolic momentum and Adam for deep learning](https://arxiv.org/abs/1810.06801)
* Implement PID optimizer (#147)
  * [A PID Controller Approach for Stochastic Optimization of Deep Networks](http://www4.comp.polyu.edu.hk/~cslzhang/paper/CVPR18_PID.pdf)

### Bug

* Fix `update` in Lion optimizer (#135)
* Fix `momentum_buffer` in SGDP optimizer (#139)

### Diff

[2.7.0...2.8.0](https://github.com/kozistr/pytorch_optimizer/compare/v2.7.0...v2.8.0)

docs/changelogs/v2.9.0.md

Lines changed: 36 additions & 0 deletions
## Change Log

### Feature

* Implement AdaMax optimizer (#148)
  * A variant of Adam based on the infinity norm
* Implement Gravity optimizer (#151)
  * [a Kinematic Approach on Optimization in Deep Learning](https://arxiv.org/abs/2101.09192)
* Implement AdaSmooth optimizer (#153)
  * [An Adaptive Learning Rate Method based on Effective Ratio](https://arxiv.org/abs/2204.00825v1)
* Implement SRMM optimizer (#154)
  * [Stochastic regularized majorization-minimization with weakly convex and multi-convex surrogates](https://arxiv.org/abs/2201.01652)
* Implement AvaGrad optimizer (#155)
  * [Domain-independent Dominance of Adaptive Methods](https://arxiv.org/abs/1912.01823)
* Implement AdaShift optimizer (#157)
  * [Decorrelation and Convergence of Adaptive Learning Rate Methods](https://arxiv.org/abs/1810.00143v4)
* Upgrade to D-Adaptation v3 (#158, #159)
* Implement AdaDelta optimizer (#160)
  * [An Adaptive Learning Rate Method](https://arxiv.org/abs/1212.5701v1)

### Docs

* Fix readthedocs build issue (#156)
* Move citations into table (#156)

### Refactor

* Refactor validation logic (#149, #150)
* Rename `amsbound`, `amsgrad` terms into `ams_bound` (#149)
* Return gradient instead of the parameter, AGC. (#149)
* Refactor duplicates (e.g. rectified step size, AMSBound, AdamD, AdaNorm, weight decay) into re-usable functions (#150)
* Move `pytorch_optimizer.experimental` under `pytorch_optimizer.*.experimental`

### Diff

[2.8.0...2.9.0](https://github.com/kozistr/pytorch_optimizer/compare/v2.8.0...v2.9.0)
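The AGC item in the Refactor list above says the helper now returns the gradient instead of the parameter. A rough sketch of that shape is below; the names `agc`, `unit_norm`, `agc_clip_value`, and `agc_eps`, and the default thresholds, are assumptions for illustration, not the library's exact signatures.

```python
import torch


def unit_norm(x: torch.Tensor) -> torch.Tensor:
    # L2 norm per output unit: over all dims except dim 0 for >= 2-D tensors
    if x.dim() <= 1:
        return x.norm(2)
    return x.norm(2, dim=tuple(range(1, x.dim())), keepdim=True)


def agc(p: torch.Tensor, grad: torch.Tensor, agc_clip_value: float = 1e-2, agc_eps: float = 1e-3) -> torch.Tensor:
    # adaptive gradient clipping: cap the gradient norm relative to the parameter norm
    p_norm = unit_norm(p).clamp_min(agc_eps)
    g_norm = unit_norm(grad)
    max_norm = p_norm * agc_clip_value

    # rescale only the units whose gradient norm exceeds the limit,
    # and return the (possibly clipped) gradient rather than the parameter
    clipped = grad * (max_norm / g_norm.clamp_min(1e-6))
    return torch.where(g_norm > max_norm, clipped, grad)
```

Under that assumption, a call site inside an optimizer's `step` would read something like `grad = agc(p, grad)`.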

pytorch_optimizer/optimizer/adai.py

Lines changed: 1 addition & 1 deletion
@@ -103,7 +103,7 @@ def step(self, closure: CLOSURE = None) -> LOSS:
                 state['step'] += 1

                 if self.use_gc:
-                    grad = centralize_gradient(grad, gc_conv_only=False)
+                    centralize_gradient(grad, gc_conv_only=False)

                 bias_correction2: float = 1.0 - beta2 ** state['step']

pytorch_optimizer/optimizer/adamp.py

Lines changed: 1 addition & 1 deletion
@@ -122,7 +122,7 @@ def step(self, closure: CLOSURE = None) -> LOSS:
                     state['exp_grad_norm'] = torch.zeros((1,), dtype=grad.dtype, device=grad.device)

                 if self.use_gc:
-                    grad = centralize_gradient(grad, gc_conv_only=False)
+                    centralize_gradient(grad, gc_conv_only=False)

                 s_grad = self.get_adanorm_gradient(
                     grad=grad,

pytorch_optimizer/optimizer/adan.py

Lines changed: 1 addition & 1 deletion
@@ -130,7 +130,7 @@ def step(self, closure: CLOSURE = None) -> LOSS:
                 grad.mul_(clip_global_grad_norm)

                 if self.use_gc:
-                    grad = centralize_gradient(grad, gc_conv_only=False)
+                    centralize_gradient(grad, gc_conv_only=False)

                 grad_diff = state['previous_grad']
                 grad_diff.add_(grad)

pytorch_optimizer/optimizer/gc.py

Lines changed: 1 addition & 3 deletions
@@ -1,14 +1,12 @@
 import torch


-def centralize_gradient(x: torch.Tensor, gc_conv_only: bool = False) -> torch.Tensor:
+def centralize_gradient(x: torch.Tensor, gc_conv_only: bool = False):
     r"""Gradient Centralization (GC).

     :param x: torch.Tensor. gradient.
     :param gc_conv_only: bool. 'False' for both conv & fc layers.
-    :return: torch.Tensor. centralized gradient.
     """
     size: int = x.dim()
     if (gc_conv_only and size > 3) or (not gc_conv_only and size > 1):
         x.add_(-x.mean(dim=tuple(range(1, size)), keepdim=True))
-    return x
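A quick illustration (not part of the diff) of why the call sites in this commit can drop the `grad = ...` assignment: `centralize_gradient` now works purely in place, mutating the gradient tensor through `add_`, so there is nothing left to return. The import path below simply mirrors the file path shown above.

```python
import torch

from pytorch_optimizer.optimizer.gc import centralize_gradient

grad = torch.randn(8, 16)
centralize_gradient(grad, gc_conv_only=False)

# each row of the (>1-D) gradient now has approximately zero mean
print(grad.mean(dim=1).abs().max())
```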

pytorch_optimizer/optimizer/lion.py

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ def step(self, closure: CLOSURE = None) -> LOSS:
                     state['exp_grad_norm'] = torch.zeros((1,), dtype=grad.dtype, device=grad.device)

                 if self.use_gc:
-                    grad = centralize_gradient(grad, gc_conv_only=False)
+                    centralize_gradient(grad, gc_conv_only=False)

                 self.apply_weight_decay(
                     p=p,

pytorch_optimizer/optimizer/ranger.py

Lines changed: 1 addition & 1 deletion
@@ -140,7 +140,7 @@ def step(self, closure: CLOSURE = None) -> LOSS:
                     state['exp_grad_norm'] = torch.zeros((1,), dtype=grad.dtype, device=grad.device)

                 if self.use_gc and grad.dim() > self.gc_gradient_threshold:
-                    grad = centralize_gradient(grad, gc_conv_only=False)
+                    centralize_gradient(grad, gc_conv_only=False)

                 self.apply_weight_decay(
                     p=p,
