
Commit 14b6b58

Merge pull request #211 from kozistr/feature/bitsandbytes
[Feature] Support `bitsandbytes` optimizers
2 parents 1c82216 + c6fbd24 commit 14b6b58

File tree

13 files changed: +612 -641 lines changed


CITATION.cff

Lines changed: 2 additions & 2 deletions
@@ -5,6 +5,6 @@ authors:
 given-names: Hyeongchan
 orcid: https://orcid.org/0000-0002-1729-0580
 title: "pytorch_optimizer: optimizer & lr scheduler & loss function collections in PyTorch"
-version: 2.11.0
-date-released: 2022-01-29
+version: 2.12.0
+date-released: 2021-09-21
 url: "https://github.com/kozistr/pytorch_optimizer"

README.md

Lines changed: 324 additions & 0 deletions
Large diffs are not rendered by default.

README.rst

Lines changed: 0 additions & 448 deletions
This file was deleted.

docs/changelogs/v2.12.0.md

Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+## Change Log
+
+### Feature
+
+* Support `bitsandbytes` optimizer. (#211)
+    * now, you can install with `pip3 install pytorch-optimizer[bitsandbytes]`
+    * supports 8 bnb optimizers.
+        * `bnb_adagrad8bit`, `bnb_adam8bit`, `bnb_adamw8bit`, `bnb_lion8bit`, `bnb_lamb8bit`, `bnb_lars8bit`, `bnb_rmsprop8bit`, `bnb_sgd8bit`.
+
+### Docs
+
+* Introduce `mkdocs` with `material` theme. (#204, #206)
+    * documentation : https://pytorch-optimizers.readthedocs.io/en/latest/
+
+### Diff
+
+[2.11.2...2.12.0](https://github.com/kozistr/pytorch_optimizer/compare/v2.11.2...v2.12.0)
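
As a quick illustration of the new feature (a sketch, not part of the commit), the `bnb_*` names above resolve through `load_optimizer`, just like the built-in optimizers. This assumes `bitsandbytes` is installed via the new extra and that a CUDA device is available, which its 8-bit optimizers generally require; the `torch.nn.Linear` model is a placeholder.

```python
import torch

from pytorch_optimizer import load_optimizer

model = torch.nn.Linear(128, 10).cuda()  # toy model, for illustration only

# resolve the 8-bit AdamW wrapper by name and instantiate it
optimizer = load_optimizer(optimizer='bnb_adamw8bit')(model.parameters(), lr=1e-3)

loss = model(torch.randn(4, 128, device='cuda')).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```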

docs/index.md

Lines changed: 77 additions & 99 deletions
@@ -1,4 +1,4 @@
-# Welcome to pytorch-optimizer
+# pytorch-optimizer
 
 | | |
 |---------|---------|
@@ -8,44 +8,35 @@
 | Status | [![PyPi download](https://static.pepy.tech/badge/pytorch-optimizer)](https://pepy.tech/project/pytorch-optimizer) [![PyPi month download](https://static.pepy.tech/badge/pytorch-optimizer/month)](https://pepy.tech/project/pytorch-optimizer) |
 | License | [![apache](https://img.shields.io/badge/License-Apache_2.0-blue.svg)](https://opensource.org/licenses/Apache-2.0) |
 
-**pytorch-optimizer** is optimizer & lr scheduler collections in
-PyTorch.
-I just re-implemented (speed & memory tweaks, plug-ins) the algorithm
-while based on the original paper. Also, It includes useful and
-practical optimization ideas.
-Currently, **60 optimizers**, **10 lr schedulers**, and **13 loss
-functions** are supported!
-
-Highly inspired by
-[pytorch-optimizer](https://github.com/jettify/pytorch-optimizer).
+**pytorch-optimizer** is optimizer & lr scheduler collections in PyTorch.
+I just re-implemented (speed & memory tweaks, plug-ins) the algorithm while based on the original paper. Also, It includes useful and practical optimization ideas.
+Currently, **60 optimizers (+ `bitsandbytes`)**, **10 lr schedulers**, and **13 loss functions** are supported!
+
+Highly inspired by [pytorch-optimizer](https://github.com/jettify/pytorch-optimizer).
 
 ## Getting Started
 
-For more, see the
-[documentation](https://pytorch-optimizers.readthedocs.io/en/latest/).
+For more, see the [documentation](https://pytorch-optimizers.readthedocs.io/en/latest/).
 
-Most optimizers are under MIT or Apache 2.0 license, but a few
-optimizers like <span class="title-ref">Fromage</span>, <span
-class="title-ref">Nero</span> have BY-NC-SA 4.0 license, which is
-non-commercial. So, please double-check the license before using it at
-your work.
+Most optimizers are under MIT or Apache 2.0 license, but a few optimizers like `Fromage`, `Nero` have `CC BY-NC-SA 4.0 license`, which is non-commercial.
+So, please double-check the license before using it at your work.
 
 ### Installation
 
-``` bash
-$ pip3 install -U pytorch-optimizer
+```bash
+$ pip3 install pytorch-optimizer
 ```
 
-If there's a version issue when installing the package, try with <span
-class="title-ref">--no-deps</span> option.
+From `pytorch-optimizer v2.12.0`, you can install and import `bitsandbytes` optimizers.
+please check [the requirements](https://github.com/TimDettmers/bitsandbytes?tab=readme-ov-file#tldr) before installing it.
 
-``` bash
-$ pip3 install -U --no-deps pytorch-optimizer
+```bash
+$ pip install "pytorch-optimizer[bitsandbytes]"
 ```
 
 ### Simple Usage
 
-``` python
+```python
 from pytorch_optimizer import AdamP
 
 model = YourModel()
@@ -55,26 +46,29 @@ optimizer = AdamP(model.parameters())
 
 from pytorch_optimizer import load_optimizer
 
-model = YourModel()
-opt = load_optimizer(optimizer='adamp')
+optimizer = load_optimizer(optimizer='adamp')(model.parameters())
+
+# if you install `bitsandbytes` optimizer, you can use `8-bit` optimizers from `pytorch-optimizer`.
+
+from pytorch_optimizer import load_optimizer
+
+opt = load_optimizer(optimizer='bnb_adamw8bit')
 optimizer = opt(model.parameters())
 ```
 
-Also, you can load the optimizer via <span
-class="title-ref">torch.hub</span>
+Also, you can load the optimizer via `torch.hub`.
 
-``` python
+```python
 import torch
 
 model = YourModel()
 opt = torch.hub.load('kozistr/pytorch_optimizer', 'adamp')
 optimizer = opt(model.parameters())
 ```
 
-If you want to build the optimizer with parameters & configs, there's
-<span class="title-ref">create_optimizer()</span> API.
+If you want to build the optimizer with parameters & configs, there's `create_optimizer()` API.
 
-``` python
+```python
 from pytorch_optimizer import create_optimizer
 
 optimizer = create_optimizer(
@@ -91,7 +85,7 @@ optimizer = create_optimizer(
 
 You can check the supported optimizers with below code.
 
-``` python
+```python
 from pytorch_optimizer import get_supported_optimizers
 
 supported_optimizers = get_supported_optimizers()
@@ -167,7 +161,7 @@ supported_optimizers = get_supported_optimizers()
 
 You can check the supported learning rate schedulers with below code.
 
-``` python
+```python
 from pytorch_optimizer import get_supported_lr_schedulers
 
 supported_lr_schedulers = get_supported_lr_schedulers()
@@ -182,7 +176,7 @@ supported_lr_schedulers = get_supported_lr_schedulers()
 
 You can check the supported loss functions with below code.
 
-``` python
+```python
 from pytorch_optimizer import get_supported_loss_functions
 
 supported_loss_functions = get_supported_loss_functions()
@@ -201,8 +195,7 @@ supported_loss_functions = get_supported_loss_functions()
 
 ## Useful Resources
 
-Several optimization ideas to regularize & stabilize the training. Most
-of the ideas are applied in `Ranger21` optimizer.
+Several optimization ideas to regularize & stabilize the training. Most of the ideas are applied in `Ranger21` optimizer.
 
 Also, most of the captures are taken from `Ranger21` paper.
 
@@ -214,131 +207,116 @@ Also, most of the captures are taken from `Ranger21` paper.
 | [Lookahead](#lookahead) | [Chebyshev learning rate schedule](#chebyshev-learning-rate-schedule) | [(Adaptive) Sharpness-Aware Minimization](#adaptive-sharpness-aware-minimization) |
 | [On the Convergence of Adam and Beyond](#on-the-convergence-of-adam-and-beyond) | [Improved bias-correction in Adam](#improved-bias-correction-in-adam) | [Adaptive Gradient Norm Correction](#adaptive-gradient-norm-correction) |
 
-## Adaptive Gradient Clipping
+### Adaptive Gradient Clipping
 
-This idea originally proposed in `NFNet (Normalized-Free Network)`
-paper.
-`AGC (Adaptive Gradient Clipping)` clips gradients based on the
-`unit-wise ratio of gradient norms to parameter norms`.
+This idea originally proposed in `NFNet (Normalized-Free Network)` paper. `AGC (Adaptive Gradient Clipping)` clips gradients based on the `unit-wise ratio of gradient norms to parameter norms`.
 
-- code :
-  [github](https://github.com/deepmind/deepmind-research/tree/master/nfnets)
-- paper : [arXiv](https://arxiv.org/abs/2102.06171)
+* code : [github](https://github.com/deepmind/deepmind-research/tree/master/nfnets)
+* paper : [arXiv](https://arxiv.org/abs/2102.06171)
 
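A simplified, per-tensor sketch of that clipping rule (illustrative only; actual AGC, including the library's implementation, applies the ratio unit-wise, e.g. per output row of a weight matrix; `clip_factor` and `eps` are assumed names):

```python
import torch

def agc_(parameters, clip_factor: float = 0.01, eps: float = 1e-3) -> None:
    """Scale each gradient so that ||grad|| / max(||param||, eps) stays below clip_factor."""
    for p in parameters:
        if p.grad is None:
            continue
        p_norm = p.detach().norm().clamp(min=eps)  # guard against near-zero parameters
        g_norm = p.grad.detach().norm()
        max_norm = p_norm * clip_factor
        if g_norm > max_norm:
            p.grad.detach().mul_(max_norm / g_norm)
```

Called between `loss.backward()` and `optimizer.step()`, it rescales only gradients that are large relative to their parameters and leaves the rest untouched.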
-## Gradient Centralization
+### Gradient Centralization
 
 | |
 |---------|
 | ![image](https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/gradient_centralization.png) |
 
-`Gradient Centralization (GC)` operates directly on gradients by
-centralizing the gradient to have zero mean.
+`Gradient Centralization (GC)` operates directly on gradients by centralizing the gradient to have zero mean.
 
-- code :
-  [github](https://github.com/Yonghongwei/Gradient-Centralization)
-- paper : [arXiv](https://arxiv.org/abs/2004.01461)
+* code : [github](https://github.com/Yonghongwei/Gradient-Centralization)
+* paper : [arXiv](https://arxiv.org/abs/2004.01461)
 
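A minimal sketch of that centralization step (illustrative only, not the library's code): for any gradient with more than one dimension, subtract its mean over every axis except the first.

```python
import torch

def centralize_gradient_(grad: torch.Tensor) -> torch.Tensor:
    """In-place Gradient Centralization: make the gradient zero-mean per output unit."""
    if grad.dim() > 1:
        grad.sub_(grad.mean(dim=tuple(range(1, grad.dim())), keepdim=True))
    return grad
```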
-## Softplus Transformation
+### Softplus Transformation
 
-By running the final variance denom through the softplus function, it
-lifts extremely tiny values to keep them viable.
+By running the final variance denom through the softplus function, it lifts extremely tiny values to keep them viable.
 
-- paper : [arXiv](https://arxiv.org/abs/1908.00700)
+* paper : [arXiv](https://arxiv.org/abs/1908.00700)
 
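Concretely, the transform swaps the usual `sqrt(v) + eps` denominator of Adam-style updates for `softplus(sqrt(v))`; a toy comparison (illustrative only; `beta` is an assumed smoothing hyper-parameter):

```python
import torch
import torch.nn.functional as F

exp_avg_sq = torch.tensor([1e-12, 1e-6, 1e-2])  # toy second-moment estimates

denom_eps = exp_avg_sq.sqrt() + 1e-8                        # standard Adam denominator
denom_softplus = F.softplus(exp_avg_sq.sqrt(), beta=50.0)   # tiny values get lifted, large ones pass through

print(denom_eps)
print(denom_softplus)
```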
-## Gradient Normalization
+### Gradient Normalization
 
-## Norm Loss
+### Norm Loss
 
 | |
 |---------|
 | ![image](https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/norm_loss.png) |
 
-- paper : [arXiv](https://arxiv.org/abs/2103.06583)
+* paper : [arXiv](https://arxiv.org/abs/2103.06583)
 
-## Positive-Negative Momentum
+### Positive-Negative Momentum
 
 | |
 |---------|
 | ![image](https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/positive_negative_momentum.png) |
 
-- code :
-  [github](https://github.com/zeke-xie/Positive-Negative-Momentum)
-- paper : [arXiv](https://arxiv.org/abs/2103.17182)
+* code : [github](https://github.com/zeke-xie/Positive-Negative-Momentum)
+* paper : [arXiv](https://arxiv.org/abs/2103.17182)
 
-## Linear learning rate warmup
+### Linear learning rate warmup
 
 | |
 |---------|
 | ![image](https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/linear_lr_warmup.png) |
 
-- paper : [arXiv](https://arxiv.org/abs/1910.04209)
+* paper : [arXiv](https://arxiv.org/abs/1910.04209)
 
-## Stable weight decay
+### Stable weight decay
 
 | |
 |---------|
 | ![image](https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/stable_weight_decay.png) |
 
-- code :
-  [github](https://github.com/zeke-xie/stable-weight-decay-regularization)
-- paper : [arXiv](https://arxiv.org/abs/2011.11152)
+* code : [github](https://github.com/zeke-xie/stable-weight-decay-regularization)
+* paper : [arXiv](https://arxiv.org/abs/2011.11152)
 
-## Explore-exploit learning rate schedule
+### Explore-exploit learning rate schedule
 
 | |
 |---------|
 | ![image](https://raw.githubusercontent.com/kozistr/pytorch_optimizer/main/assets/explore_exploit_lr_schedule.png) |
 
-- code :
-  [github](https://github.com/nikhil-iyer-97/wide-minima-density-hypothesis)
-- paper : [arXiv](https://arxiv.org/abs/2003.03977)
+* code : [github](https://github.com/nikhil-iyer-97/wide-minima-density-hypothesis)
+* paper : [arXiv](https://arxiv.org/abs/2003.03977)
 
-## Lookahead
+### Lookahead
 
-`k` steps forward, 1 step back. `Lookahead` consisting of keeping an
-exponential moving average of the weights that is
-updated and substituted to the current weights every `k_{lookahead}`
-steps (5 by default).
+`k` steps forward, 1 step back. `Lookahead` consisting of keeping an exponential moving average of the weights that is updated and substituted to the current weights every `k` lookahead steps (5 by default).
 
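The rule is simple enough to sketch directly (illustrative only, not the library's `Lookahead` wrapper): keep a slow copy of the weights, let the inner optimizer take `k` fast steps, then pull the fast weights back toward the slow copy; `alpha` is the assumed interpolation factor.

```python
import torch

model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

slow_weights = [p.detach().clone() for p in model.parameters()]
k, alpha = 5, 0.5

for step in range(20):
    loss = model(torch.randn(4, 8)).pow(2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                       # fast step forward
    if (step + 1) % k == 0:                # every k steps: 1 step back
        with torch.no_grad():
            for slow, fast in zip(slow_weights, model.parameters()):
                slow.add_(fast.detach() - slow, alpha=alpha)  # slow <- slow + alpha * (fast - slow)
                fast.copy_(slow)
```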
-## Chebyshev learning rate schedule
+### Chebyshev learning rate schedule
 
 Acceleration via Fractal Learning Rate Schedules.
 
-## (Adaptive) Sharpness-Aware Minimization
+### (Adaptive) Sharpness-Aware Minimization
+
+Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.
+In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.
+
+### On the Convergence of Adam and Beyond
 
-Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value
-and loss sharpness.
-In particular, it seeks parameters that lie in neighborhoods having
-uniformly low loss.
+Convergence issues can be fixed by endowing such algorithms with 'long-term memory' of past gradients.
 
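The SAM behaviour described in the lines added above boils down to a two-step update: perturb the weights toward the worst-case neighbor inside a radius-`rho` ball, compute the gradient there, then step from the original weights. An illustrative sketch only (not the library's SAM class; `rho` is an assumed hyper-parameter name):

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(16, 1)
base_optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
rho = 0.05

def sam_step(inputs, targets):
    # 1) first backward pass, then ascend to the worst-case neighbor inside an L2 ball of radius rho
    F.mse_loss(model(inputs), targets).backward()
    grad_norm = torch.norm(torch.stack([p.grad.norm() for p in model.parameters() if p.grad is not None])) + 1e-12
    perturbations = []
    with torch.no_grad():
        for p in model.parameters():
            if p.grad is None:
                continue
            e = rho * p.grad / grad_norm
            p.add_(e)
            perturbations.append((p, e))
    base_optimizer.zero_grad()

    # 2) gradient at the perturbed point, undo the perturbation, then take the actual step
    F.mse_loss(model(inputs), targets).backward()
    with torch.no_grad():
        for p, e in perturbations:
            p.sub_(e)
    base_optimizer.step()
    base_optimizer.zero_grad()

sam_step(torch.randn(8, 16), torch.randn(8, 1))
```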
-## On the Convergence of Adam and Beyond
+### Improved bias-correction in Adam
 
-Convergence issues can be fixed by endowing such algorithms with
-'long-term memory' of past gradients.
+With the default bias-correction, Adam may actually make larger than requested gradient updates early in training.
 
-## Improved bias-correction in Adam
+### Adaptive Gradient Norm Correction
 
-With the default bias-correction, Adam may actually make larger than
-requested gradient updates early in training.
+Correcting the norm of a gradient in each iteration based on the adaptive training history of gradient norm.
 
-## Adaptive Gradient Norm Correction
+## Frequently asked questions
 
-Correcting the norm of a gradient in each iteration based on the
-adaptive training history of gradient norm.
+[here](./qa.md)
 
 ## Citation
 
-Please cite the original authors of optimization algorithms. You can
-easily find it in the above table! If you use this software, please cite
-it below. Or you can get it from "cite this repository" button.
+Please cite the original authors of optimization algorithms. You can easily find it in the above table!
+If you use this software, please cite it below. Or you can get it from "cite this repository" button.
 
-    @software{Kim_pytorch_optimizer_optimizer_2022,
+    @software{Kim_pytorch_optimizer_optimizer_2021,
         author = {Kim, Hyeongchan},
         month = jan,
         title = {{pytorch_optimizer: optimizer & lr scheduler & loss function collections in PyTorch}},
         url = {https://github.com/kozistr/pytorch_optimizer},
-        version = {2.11.0},
-        year = {2022}
+        version = {2.12.0},
+        year = {2021}
     }
 
 ## Maintainer

0 commit comments
