@@ -16,7 +16,7 @@ pytorch-optimizer
1616
1717| **pytorch-optimizer** is a collection of optimizers & lr schedulers in PyTorch.
1818| I re-implemented the algorithms based on the original papers, with speed & memory tweaks and plug-ins. It also includes useful and practical optimization ideas.
19- | Currently, 47 optimizers, 6 lr schedulers are supported!
19+ | Currently, 48 optimizers, 6 lr schedulers are supported!
2020|
2121| Highly inspired by `pytorch-optimizer <https://github.com/jettify/pytorch-optimizer>`__.
2222
@@ -179,6 +179,10 @@ You can check the supported optimizers & lr schedulers.
179179+--------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
180180| SRMM | *Stochastic regularized majorization-minimization with weakly convex and multi-convex surrogates* | `github <https://github.com/HanbaekLyu/SRMM >`__ | `https://arxiv.org/abs/2201.01652 <https://arxiv.org/abs/2201.01652 >`__ |
181181+--------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
182+ | AvaGrad | *Domain-independent Dominance of Adaptive Methods* | `github <https://github.com/lolemacs/avagrad >`__ | `https://arxiv.org/abs/1912.01823 <https://arxiv.org/abs/1912.01823 >`__ |
183+ +--------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
184+ | PCGrad | *Gradient Surgery for Multi-Task Learning* | `github <https://github.com/tianheyu927/PCGrad >`__ | `https://arxiv.org/abs/2001.06782 <https://arxiv.org/abs/2001.06782 >`__ |
185+ +--------------+---------------------------------------------------------------------------------------------------+-----------------------------------------------------------------------------------+-----------------------------------------------------------------------------------------------+
182186
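The PCGrad row above refers to the gradient-surgery projection rule from arXiv:2001.06782. A minimal, dependency-free sketch of that rule (the paper projects against the other task gradients in random order; this simplified version uses a fixed order, and it is not the repository's implementation):

```python
# Hedged sketch of the PCGrad projection rule ("Gradient Surgery for
# Multi-Task Learning", arXiv:2001.06782), not pytorch-optimizer's code.
# When two task gradients conflict (negative dot product), the conflicting
# component is projected out before the per-task gradients are combined.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def pcgrad(grads):
    """Return projected per-task gradients (plain lists, one per task)."""
    projected = []
    for i, gi in enumerate(grads):
        gi = list(gi)
        for j, gj in enumerate(grads):
            if i == j:
                continue
            d = dot(gi, gj)
            if d < 0:  # conflict: remove the component of gi along gj
                scale = d / dot(gj, gj)
                gi = [x - scale * y for x, y in zip(gi, gj)]
        projected.append(gi)
    return projected

# Two conflicting task gradients: after surgery they no longer conflict.
g1, g2 = [1.0, 0.0], [-1.0, 1.0]
p1, p2 = pcgrad([g1, g2])
```

After the projection, each task's gradient is orthogonal to the other original gradient, so summing them no longer lets one task's update undo the other's progress.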
183187Useful Resources
184188----------------
@@ -197,7 +201,7 @@ Also, most of the captures are taken from ``Ranger21`` paper.
197201+------------------------------------------+---------------------------------------------+--------------------------------------------+
198202| `Lookahead `_ | `Chebyshev learning rate schedule `_ | `(Adaptive) Sharpness-Aware Minimization `_ |
199203+------------------------------------------+---------------------------------------------+--------------------------------------------+
200- | `On the Convergence of Adam and Beyond `_ | `Gradient Surgery for Multi-Task Learning `_ | |
204+ | `On the Convergence of Adam and Beyond `_ | `Improved bias-correction in Adam `_ | `Adaptive Gradient Norm Correction `_ |
201205+------------------------------------------+---------------------------------------------+--------------------------------------------+
202206
203207Adaptive Gradient Clipping
@@ -291,7 +295,7 @@ Lookahead
291295Chebyshev learning rate schedule
292296--------------------------------
293297
294- Acceleration via Fractal Learning Rate Schedules
298+ Acceleration via Fractal Learning Rate Schedules.
295299
296300- paper : `arXiv <https://arxiv.org/abs/2103.01338v1 >`__
297301
@@ -310,10 +314,16 @@ On the Convergence of Adam and Beyond
310314
311315- paper : `paper <https://openreview.net/forum?id=ryQu7f-RZ >`__
312316
313- Gradient Surgery for Multi-Task Learning
314- ----------------------------------------
317+ Improved bias-correction in Adam
318+ --------------------------------
319+
320+ | With the default bias-correction, Adam may actually make larger-than-requested gradient updates early in training.
321+
322+ - paper : `arXiv <https://arxiv.org/abs/2110.10828 >`_
323+
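The bias-correction note above can be illustrated numerically. A minimal sketch (assumed parameter names, not the library's implementation) comparing the first few Adam update magnitudes under a constant unit gradient, with full bias-correction versus an AdamD-style variant (arXiv:2110.10828) that corrects only the second moment:

```python
# Hedged sketch, not pytorch-optimizer's code: magnitudes of the first Adam
# updates for a constant unit gradient, with full bias-correction vs. an
# AdamD-style update (arXiv:2110.10828) that applies bias-correction only
# to the second-moment estimate.
import math

def update_magnitudes(steps=10, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, g=1.0):
    m = v = 0.0
    adam, adamd = [], []
    for t in range(1, steps + 1):
        m = b1 * m + (1 - b1) * g          # first-moment EMA
        v = b2 * v + (1 - b2) * g * g      # second-moment EMA
        m_hat = m / (1 - b1 ** t)          # first-moment bias correction
        v_hat = v / (1 - b2 ** t)          # second-moment bias correction
        adam.append(lr * m_hat / (math.sqrt(v_hat) + eps))
        adamd.append(lr * m / (math.sqrt(v_hat) + eps))  # AdamD keeps raw m
    return adam, adamd

adam, adamd = update_magnitudes()
```

Standard Adam takes near full-size (about ``lr``-sized) steps from the very first iteration, while the AdamD-style update ramps up gradually, acting as an implicit warmup.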
324+ Adaptive Gradient Norm Correction
325+ ---------------------------------
315326
316- - paper : `paper <https://arxiv.org/abs/2001.06782 >`__
317327
318328Citations
319329---------
@@ -358,7 +368,7 @@ Citations
358368
359369`On the Convergence of Adam and Beyond <https://ui.adsabs.harvard.edu/abs/2019arXiv190409237R/exportcitation >`__
360370
361- `Gradient surgery for multi-task learning <https://ui.adsabs.harvard.edu/abs/2020arXiv200106782Y/exportcitation >`__
371+ `Gradient surgery for multi-task learning <https://github.com/tianheyu927/PCGrad#reference >`__
362372
363373`AdamD <https://ui.adsabs.harvard.edu/abs/2021arXiv211010828S/exportcitation >`__
364374
@@ -420,6 +430,8 @@ Citations
420430
421431`SRMM <https://ui.adsabs.harvard.edu/abs/2022arXiv220101652L/exportcitation >`__
422432
433+ `AvaGrad <https://ui.adsabs.harvard.edu/abs/2019arXiv191201823S/exportcitation >`__
434+
423435Citation
424436--------
425437