@@ -73,17 +73,17 @@ of the ideas are applied in ``Ranger21`` optimizer.
 
 Also, most of the figures are taken from the ``Ranger21`` paper.
 
-+-------------------------------------------+--------------------------------------+---------------------------------------------+
-| `Adaptive Gradient Clipping `_            | `Gradient Centralization `_          | `Softplus Transformation `_                 |
-+-------------------------------------------+--------------------------------------+---------------------------------------------+
-| `Gradient Normalization `_                | `Norm Loss `_                        | `Positive-Negative Momentum `_              |
-+-------------------------------------------+--------------------------------------+---------------------------------------------+
-| `Linear learning rate warmup `_           | `Stable weight decay `_              | `Explore-exploit learning rate schedule `_  |
-+-------------------------------------------+--------------------------------------+---------------------------------------------+
-| `Lookahead `_                             | `Chebyshev learning rate schedule `_ | `(Adaptive) Sharpness-Aware Minimization `_ |
-+-------------------------------------------+--------------------------------------+---------------------------------------------+
-| `On the Convergence of Adam and Beyond `_ |                                      |                                             |
-+-------------------------------------------+--------------------------------------+---------------------------------------------+
++-------------------------------------------+----------------------------------------------+---------------------------------------------+
+| `Adaptive Gradient Clipping `_            | `Gradient Centralization `_                  | `Softplus Transformation `_                 |
++-------------------------------------------+----------------------------------------------+---------------------------------------------+
+| `Gradient Normalization `_                | `Norm Loss `_                                | `Positive-Negative Momentum `_              |
++-------------------------------------------+----------------------------------------------+---------------------------------------------+
+| `Linear learning rate warmup `_           | `Stable weight decay `_                      | `Explore-exploit learning rate schedule `_  |
++-------------------------------------------+----------------------------------------------+---------------------------------------------+
+| `Lookahead `_                             | `Chebyshev learning rate schedule `_         | `(Adaptive) Sharpness-Aware Minimization `_ |
++-------------------------------------------+----------------------------------------------+---------------------------------------------+
+| `On the Convergence of Adam and Beyond `_ | `Gradient Surgery for Multi-Task Learning `_ |                                             |
++-------------------------------------------+----------------------------------------------+---------------------------------------------+
 
 Adaptive Gradient Clipping
 --------------------------
@@ -195,6 +195,11 @@ On the Convergence of Adam and Beyond
 
 - paper : `paper <https://openreview.net/forum?id=ryQu7f-RZ>`__
 
+Gradient Surgery for Multi-Task Learning
+----------------------------------------
+
+- paper : `paper <https://arxiv.org/abs/2001.06782>`__
+
 Citations
 ---------
 
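The hunk above adds only a link to the paper. As a rough illustration of the gradient-surgery (PCGrad) idea behind it, here is a minimal NumPy sketch of a simplified, deterministic variant; the function name and example gradients are invented for illustration and are not the optimizer's actual API. When two task gradients conflict (negative dot product), each is projected onto the normal plane of the other before the results are summed.

```python
# Simplified sketch of PCGrad ("Gradient Surgery for Multi-Task
# Learning", Yu et al., 2020). The paper visits the other tasks in
# random order; this sketch uses a fixed order for determinism.
import numpy as np

def pcgrad_combine(grads):
    """Combine per-task gradients, removing pairwise conflicts."""
    projected = []
    for i, g in enumerate(grads):
        g = g.astype(float).copy()
        for j, other in enumerate(grads):
            if i == j:
                continue
            dot = g @ other
            if dot < 0:  # conflicting directions: project g onto
                g -= dot / (other @ other) * other  # other's normal plane
        projected.append(g)
    return np.sum(projected, axis=0)

# Two conflicting task gradients (their dot product is -1).
g1 = np.array([1.0, 0.0])
g2 = np.array([-1.0, 1.0])
combined = pcgrad_combine([g1, g2])  # -> array([0.5, 1.5])
```

In this two-task example the combined direction has a positive dot product with both task gradients, which is exactly the conflict-free property the paper aims for.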
@@ -430,6 +435,17 @@ On the Convergence of Adam and Beyond
         year={2019}
     }
 
+Gradient Surgery for Multi-Task Learning
+
+::
+
+    @article{yu2020gradient,
+        title={Gradient surgery for multi-task learning},
+        author={Yu, Tianhe and Kumar, Saurabh and Gupta, Abhishek and Levine, Sergey and Hausman, Karol and Finn, Chelsea},
+        journal={arXiv preprint arXiv:2001.06782},
+        year={2020}
+    }
+
 Author
 ------
 