@@ -73,17 +73,17 @@ of the ideas are applied in ``Ranger21`` optimizer.
 
 Also, most of the figures are taken from the ``Ranger21`` paper.
 
-+-------------------------------------------+--------------------------------------+---------------------------------------------+
-| `Adaptive Gradient Clipping `_            | `Gradient Centralization `_          | `Softplus Transformation `_                 |
-+-------------------------------------------+--------------------------------------+---------------------------------------------+
-| `Gradient Normalization `_                | `Norm Loss `_                        | `Positive-Negative Momentum `_              |
-+-------------------------------------------+--------------------------------------+---------------------------------------------+
-| `Linear learning rate warmup `_           | `Stable weight decay `_              | `Explore-exploit learning rate schedule `_  |
-+-------------------------------------------+--------------------------------------+---------------------------------------------+
-| `Lookahead `_                             | `Chebyshev learning rate schedule `_ | `(Adaptive) Sharpness-Aware Minimization `_ |
-+-------------------------------------------+--------------------------------------+---------------------------------------------+
-| `On the Convergence of Adam and Beyond `_ |                                      |                                             |
-+-------------------------------------------+--------------------------------------+---------------------------------------------+
++-------------------------------------------+----------------------------------------------+---------------------------------------------+
+| `Adaptive Gradient Clipping `_            | `Gradient Centralization `_                  | `Softplus Transformation `_                 |
++-------------------------------------------+----------------------------------------------+---------------------------------------------+
+| `Gradient Normalization `_                | `Norm Loss `_                                | `Positive-Negative Momentum `_              |
++-------------------------------------------+----------------------------------------------+---------------------------------------------+
+| `Linear learning rate warmup `_           | `Stable weight decay `_                      | `Explore-exploit learning rate schedule `_  |
++-------------------------------------------+----------------------------------------------+---------------------------------------------+
+| `Lookahead `_                             | `Chebyshev learning rate schedule `_         | `(Adaptive) Sharpness-Aware Minimization `_ |
++-------------------------------------------+----------------------------------------------+---------------------------------------------+
+| `On the Convergence of Adam and Beyond `_ | `Gradient Surgery for Multi-Task Learning `_ |                                             |
++-------------------------------------------+----------------------------------------------+---------------------------------------------+
 
 Adaptive Gradient Clipping
 --------------------------
@@ -195,6 +195,11 @@ On the Convergence of Adam and Beyond
 
 - paper : `paper <https://openreview.net/forum?id=ryQu7f-RZ>`__
 
+Gradient Surgery for Multi-Task Learning
+----------------------------------------
+
+- paper : `paper <https://arxiv.org/abs/2001.06782>`__
+
 Citations
 ---------
 
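The hunk above adds only a link to the paper. As a rough illustration of the gradient-surgery (PCGrad) idea behind it, here is a minimal NumPy sketch of a simplified, deterministic variant; the function name and example gradients are invented for illustration and are not the optimizer's actual API. When two task gradients conflict (negative dot product), each is projected onto the normal plane of the other before the results are summed.

```python
# Simplified sketch of PCGrad ("Gradient Surgery for Multi-Task
# Learning", Yu et al., 2020). The paper visits the other tasks in
# random order; this sketch uses a fixed order for determinism.
import numpy as np

def pcgrad_combine(grads):
    """Combine per-task gradients, removing pairwise conflicts."""
    projected = []
    for i, g in enumerate(grads):
        g = g.astype(float).copy()
        for j, other in enumerate(grads):
            if i == j:
                continue
            dot = g @ other
            if dot < 0:  # conflicting directions: project g onto
                g -= dot / (other @ other) * other  # other's normal plane
        projected.append(g)
    return np.sum(projected, axis=0)

# Two conflicting task gradients (their dot product is -1).
g1 = np.array([1.0, 0.0])
g2 = np.array([-1.0, 1.0])
combined = pcgrad_combine([g1, g2])  # -> array([0.5, 1.5])
```

In this two-task example the combined direction has a positive dot product with both task gradients, which is exactly the conflict-free property the paper aims for.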
@@ -430,6 +435,17 @@ On the Convergence of Adam and Beyond
         year={2019}
     }
 
+Gradient Surgery for Multi-Task Learning
+
+::
+
+    @article{yu2020gradient,
+        title={Gradient surgery for multi-task learning},
+        author={Yu, Tianhe and Kumar, Saurabh and Gupta, Abhishek and Levine, Sergey and Hausman, Karol and Finn, Chelsea},
+        journal={arXiv preprint arXiv:2001.06782},
+        year={2020}
+    }
+
 Author
 ------
 