**pytorch-optimizer** is a collection of optimizers and lr schedulers in PyTorch.
I just re-implemented the algorithms (with speed & memory tweaks and plug-ins) based on the original papers. It also includes useful and practical optimization ideas.
Currently, **60 optimizers (+ `bitsandbytes`)**, **10 lr schedulers**, and **13 loss functions** are supported!
Highly inspired by [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer).
For more, see the [documentation](https://pytorch-optimizers.readthedocs.io/en/latest/).
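A minimal getting-started sketch (assuming, as in the documentation, that optimizer classes such as `AdamP` are exported at the package top level; double-check the exact names there):

```python
import torch
from torch import nn

from pytorch_optimizer import AdamP  # assumed top-level export; see the docs for the full list

model = nn.Linear(10, 2)
optimizer = AdamP(model.parameters(), lr=1e-3, weight_decay=1e-2)

x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
loss = nn.functional.cross_entropy(model(x), y)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```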
Most optimizers are under the MIT or Apache 2.0 license, but a few optimizers like `Fromage` and `Nero` are under the `CC BY-NC-SA 4.0` license, which is non-commercial.
So, please double-check the license before using them in your work.
### Installation
```bash
$ pip3 install pytorch-optimizer
```
From `pytorch-optimizer v2.12.0`, you can install and import `bitsandbytes` optimizers.
Please check [the requirements](https://github.com/TimDettmers/bitsandbytes?tab=readme-ov-file#tldr) before installing it.
|[On the Convergence of Adam and Beyond](#on-the-convergence-of-adam-and-beyond)|[Improved bias-correction in Adam](#improved-bias-correction-in-adam)|[Adaptive Gradient Norm Correction](#adaptive-gradient-norm-correction)|
### Adaptive Gradient Clipping
This idea was originally proposed in the `NFNet (Normalizer-Free Network)` paper. `AGC (Adaptive Gradient Clipping)` clips gradients based on the `unit-wise ratio of gradient norms to parameter norms`.
* paper : [arXiv](https://arxiv.org/abs/2003.03977)
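
For intuition, here is a rough sketch of the unit-wise clipping rule in plain PyTorch. This is not the library's implementation; the `clip_factor` and `eps` defaults are assumptions.

```python
import torch


def unitwise_norm(x: torch.Tensor) -> torch.Tensor:
    """L2 norm per output unit: per row for 2D+ tensors, whole tensor for 0-D/1-D."""
    if x.ndim <= 1:
        return x.norm(2)
    return x.norm(2, dim=tuple(range(1, x.ndim)), keepdim=True)


def agc_(param: torch.Tensor, clip_factor: float = 0.01, eps: float = 1e-3) -> None:
    """Clip `param.grad` in-place where the unit-wise gradient norm exceeds
    `clip_factor` times the corresponding unit-wise parameter norm."""
    if param.grad is None:
        return

    p_norm = unitwise_norm(param.detach()).clamp_(min=eps)
    g_norm = unitwise_norm(param.grad.detach())

    max_norm = p_norm * clip_factor
    clipped = param.grad * (max_norm / g_norm.clamp(min=1e-6))
    param.grad.copy_(torch.where(g_norm > max_norm, clipped, param.grad))
```

In this sketch, the function would be called on each parameter right before `optimizer.step()`.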
### Lookahead
`k` steps forward, 1 step back. `Lookahead` consists of keeping an exponential moving average of the weights, which is updated and substituted for the current weights every `k` lookahead steps (5 by default).
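
A simplified sketch of that update rule as an optimizer wrapper (this is not the library's `Lookahead` class; the `k=5`, `alpha=0.5` defaults are assumptions):

```python
import torch


class LookaheadSketch:
    """Minimal Lookahead wrapper: sync slow weights every `k` fast steps."""

    def __init__(self, optimizer: torch.optim.Optimizer, k: int = 5, alpha: float = 0.5):
        self.optimizer = optimizer
        self.k, self.alpha, self.step_count = k, alpha, 0
        # slow weights start as a copy of the current (fast) weights
        self.slow_weights = [
            [p.clone().detach() for p in group['params']] for group in optimizer.param_groups
        ]

    def step(self) -> None:
        self.optimizer.step()  # fast weights move one of the `k` steps forward
        self.step_count += 1
        if self.step_count % self.k != 0:
            return

        # 1 step back: slow <- slow + alpha * (fast - slow), then fast <- slow
        for group, slows in zip(self.optimizer.param_groups, self.slow_weights):
            for p, slow in zip(group['params'], slows):
                slow.add_(p.detach() - slow, alpha=self.alpha)
                p.data.copy_(slow)

    def zero_grad(self) -> None:
        self.optimizer.zero_grad()
```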
### Chebyshev learning rate schedule
Acceleration via Fractal Learning Rate Schedules.
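
For intuition only, a sketch of where the Chebyshev step sizes come from for gradient descent on a quadratic whose Hessian eigenvalues lie in `[mu, L]` (`mu`, `L`, and the horizon `T` are assumed inputs; the paper additionally permutes these steps in a fractal order for numerical stability, which this sketch omits):

```python
import math


def chebyshev_steps(mu: float, L: float, T: int) -> list:
    """Step sizes 1 / lambda_i, where lambda_i are the Chebyshev nodes mapped onto [mu, L]."""
    steps = []
    for i in range(T):
        node = math.cos(math.pi * (2 * i + 1) / (2 * T))  # Chebyshev node on [-1, 1]
        lam = (L + mu) / 2 + (L - mu) / 2 * node          # mapped onto the eigenvalue range
        steps.append(1.0 / lam)
    return steps


# e.g. use these as the per-iteration learning rates of plain gradient descent
print(chebyshev_steps(mu=0.1, L=1.0, T=8))
```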
### (Adaptive) Sharpness-Aware Minimization
Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.
In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.
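
A usage sketch, assuming the `SAM` class follows the common two-step (`first_step` / `second_step`) interface around a base optimizer; see the documentation for the exact signature:

```python
import torch
from torch import nn

from pytorch_optimizer import SAM  # assumed top-level export; check the docs

model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
# `rho` controls the size of the neighborhood searched for high loss (assumed default)
optimizer = SAM(model.parameters(), base_optimizer=torch.optim.SGD, lr=0.1, rho=0.05)

x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))

# first pass: ascend to the (approximate) worst-case point inside the rho-ball
criterion(model(x), y).backward()
optimizer.first_step(zero_grad=True)

# second pass: descend using the gradient taken at the perturbed weights
criterion(model(x), y).backward()
optimizer.second_step(zero_grad=True)
```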
### On the Convergence of Adam and Beyond
Convergence issues can be fixed by endowing such algorithms with 'long-term memory' of past gradients.
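
This is the AMSGrad fix: keep the element-wise maximum of all past second-moment estimates and divide by that, so the effective per-coordinate step size never increases. Stock PyTorch exposes it as a flag, and the core change is tiny (the helper below is only an illustration):

```python
import torch
from torch import nn

model = nn.Linear(10, 2)
# stock PyTorch exposes the 'long-term memory' variant as a flag on Adam/AdamW
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, amsgrad=True)


def amsgrad_denom(exp_avg_sq: torch.Tensor, max_exp_avg_sq: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Core change vs. plain Adam: divide by the running element-wise max of the
    second-moment EMA instead of the EMA itself (`max_exp_avg_sq` is updated in-place)."""
    torch.maximum(max_exp_avg_sq, exp_avg_sq, out=max_exp_avg_sq)
    return max_exp_avg_sq.sqrt().add(eps)
```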
### Improved bias-correction in Adam
With the default bias-correction, Adam may actually make larger than requested gradient updates early in training.
### Adaptive Gradient Norm Correction
Correcting the norm of a gradient in each iteration based on the adaptive training history of gradient norm.
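
As a loose illustration only (the exact rule in the paper may differ), one way to sketch such a correction is to track an EMA of past gradient norms and rescale the current gradient toward it:

```python
import torch


def correct_gradient_norm(grad: torch.Tensor, norm_ema: torch.Tensor, beta: float = 0.95) -> torch.Tensor:
    """Rescale `grad` using an EMA of past gradient norms (`norm_ema` is a 0-dim
    tensor holding the history; it is updated in-place)."""
    g_norm = grad.norm(2)
    norm_ema.mul_(beta).add_(g_norm, alpha=1.0 - beta)  # adaptive history of gradient norms

    # if the current gradient norm falls below its history, scale the gradient up to match
    if g_norm < norm_ema:
        return grad * (norm_ema / g_norm.clamp(min=1e-12))
    return grad
```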
## Frequently asked questions
[here](./qa.md)
## Citation
Please cite the original authors of the optimization algorithms. You can easily find them in the table above!
If you use this software, please cite it as below, or get the citation from the "cite this repository" button.
    @software{Kim_pytorch_optimizer_optimizer_2021,
        author = {Kim, Hyeongchan},
        month = jan,
        title = {{pytorch_optimizer: optimizer & lr scheduler & loss function collections in PyTorch}},