@@ -53,7 +53,7 @@ Also, most of the captures are taken from `Ranger21` paper.
This idea was originally proposed in the `NFNet (Normalizer-Free Networks)` paper.
AGC (Adaptive Gradient Clipping) clips gradients based on the `unit-wise ratio of gradient norms to parameter norms`.

- * github : [code](https://github.com/deepmind/deepmind-research/tree/master/nfnets)
+ * code : [github](https://github.com/deepmind/deepmind-research/tree/master/nfnets)
* paper : [arXiv](https://arxiv.org/abs/2102.06171)

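To make the unit-wise ratio concrete, here is a hedged PyTorch-style sketch of the clipping rule. It is an illustration only, not the NFNets reference implementation linked above; the `clipping` threshold and `eps` floor are illustrative defaults.

```python
# Hedged sketch of Adaptive Gradient Clipping: rescale each parameter's gradient
# whenever its norm exceeds `clipping` times the parameter norm, computed unit-wise
# (per output row for matrices, per whole tensor for vectors/biases).
import torch


def adaptive_gradient_clipping(parameters, clipping: float = 1e-2, eps: float = 1e-3) -> None:
    for p in parameters:
        if p.grad is None:
            continue
        if p.ndim > 1:
            dims = tuple(range(1, p.ndim))
            param_norm = p.detach().norm(2, dim=dims, keepdim=True)
            grad_norm = p.grad.norm(2, dim=dims, keepdim=True)
        else:
            param_norm = p.detach().norm(2)
            grad_norm = p.grad.norm(2)
        max_norm = param_norm.clamp(min=eps) * clipping
        # only shrink gradients whose unit-wise norm exceeds the allowed ratio
        scale = (max_norm / grad_norm.clamp(min=1e-6)).clamp(max=1.0)
        p.grad.mul_(scale)
```

In a training loop this would sit between `loss.backward()` and `optimizer.step()`.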
### Gradient Centralization (GC)
@@ -62,7 +62,7 @@ AGC (Adaptive Gradient Clipping) clips gradients based on the `unit-wise ratio o

Gradient Centralization (GC) operates directly on gradients by centralizing them to have zero mean.

- * github : [code](https://github.com/Yonghongwei/Gradient-Centralization)
+ * code : [github](https://github.com/Yonghongwei/Gradient-Centralization)
* paper : [arXiv](https://arxiv.org/abs/2004.01461)

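A minimal sketch of that centering step, assuming PyTorch gradients (this is not taken from the linked reference code):

```python
# Hedged sketch of Gradient Centralization: remove the mean over all
# dimensions except the output dimension from each multi-dimensional gradient.
import torch


def centralize_gradients(parameters) -> None:
    for p in parameters:
        if p.grad is None or p.grad.ndim <= 1:
            continue  # applied to conv/linear weight gradients, not to biases
        dims = tuple(range(1, p.grad.ndim))
        p.grad.sub_(p.grad.mean(dim=dims, keepdim=True))
```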
### Softplus Transformation
@@ -83,7 +83,7 @@ By running the final variance denom through the softplus function, it lifts extr

![positive_negative_momentum](assets/positive_negative_momentum.png)

- * github : [code](https://github.com/zeke-xie/Positive-Negative-Momentum)
+ * code : [github](https://github.com/zeke-xie/Positive-Negative-Momentum)
* paper : [arXiv](https://arxiv.org/abs/2103.17182)

### Linear learning-rate warm-up
@@ -96,22 +96,22 @@ By running the final variance denom through the softplus function, it lifts extr

![stable_weight_decay](assets/stable_weight_decay.png)

- * github : [code](https://github.com/zeke-xie/stable-weight-decay-regularization)
+ * code : [github](https://github.com/zeke-xie/stable-weight-decay-regularization)
* paper : [arXiv](https://arxiv.org/abs/2011.11152)

### Explore-exploit learning-rate schedule

![explore_exploit_lr_schedule](assets/explore_exploit_lr_schedule.png)

- * github : [code](https://github.com/nikhil-iyer-97/wide-minima-density-hypothesis)
+ * code : [github](https://github.com/nikhil-iyer-97/wide-minima-density-hypothesis)
* paper : [arXiv](https://arxiv.org/abs/2003.03977)

### Lookahead

`k` steps forward, 1 step back. `Lookahead` consists of keeping an exponential moving average of the weights that is
updated and substituted for the current weights every `k_{lookahead}` steps (5 by default).

- * github : [code](https://github.com/alphadl/lookahead.pytorch)
+ * code : [github](https://github.com/alphadl/lookahead.pytorch)
* paper : [arXiv](https://arxiv.org/abs/1907.08610v2)

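The "k steps forward, 1 step back" loop can be sketched as a thin wrapper around any inner optimizer. This is an illustrative simplification, not the linked `lookahead.pytorch` code; `k=5` and `alpha=0.5` mirror the paper's defaults.

```python
# Hedged Lookahead sketch: fast weights take k inner-optimizer steps, then the
# slow weights move a fraction alpha toward them and overwrite the fast weights.
import torch


class Lookahead:
    def __init__(self, optimizer, k: int = 5, alpha: float = 0.5):
        self.optimizer, self.k, self.alpha = optimizer, k, alpha
        self.counter = 0
        self.params = [p for group in optimizer.param_groups for p in group['params']]
        self.slow_weights = [p.detach().clone() for p in self.params]

    @torch.no_grad()
    def sync(self) -> None:
        for slow, fast in zip(self.slow_weights, self.params):
            slow.add_(self.alpha * (fast.detach() - slow))  # 1 step back
            fast.copy_(slow)                                # restart fast weights from the slow ones

    def step(self) -> None:
        self.optimizer.step()  # one of the k forward steps with the inner optimizer
        self.counter += 1
        if self.counter % self.k == 0:
            self.sync()
```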
### Chebyshev learning rate schedule
@@ -120,6 +120,15 @@ Acceleration via Fractal Learning Rate Schedules

* paper : [arXiv](https://arxiv.org/abs/2103.01338v1)

+ ### (Adaptive) Sharpness-Aware Minimization (A/SAM)
+
+ Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.
+ In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.
+
+ * SAM paper : [paper](https://arxiv.org/abs/2010.01412)
+ * ASAM paper : [paper](https://arxiv.org/abs/2102.11600)
+ * A/SAM code : [github](https://github.com/davda54/sam)
+
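Schematically, one SAM update looks like the hedged sketch below (not the linked `davda54/sam` implementation; `rho=0.05` is the SAM paper's default): climb along the normalized gradient, recompute the gradient at the perturbed weights, undo the perturbation, and let a base optimizer apply that gradient. ASAM follows the same pattern but scales the perturbation by the parameter magnitudes to make the neighborhood scale-invariant.

```python
# Hedged SAM sketch: perturb weights toward the local worst case, compute the
# gradient there, restore the weights, then update with the base optimizer.
import torch


def sam_step(model, closure, base_optimizer, rho: float = 0.05) -> None:
    # closure() must re-run the forward pass on the current mini-batch and return the loss
    base_optimizer.zero_grad()
    closure().backward()                                   # gradient at the current weights
    params = [p for p in model.parameters() if p.grad is not None]
    grad_norm = torch.norm(torch.stack([p.grad.norm(2) for p in params]))
    perturbations = []
    with torch.no_grad():
        for p in params:
            e = rho * p.grad / (grad_norm + 1e-12)         # ascent step toward higher loss
            p.add_(e)
            perturbations.append(e)
    base_optimizer.zero_grad()
    closure().backward()                                   # gradient at the perturbed weights
    with torch.no_grad():
        for p, e in zip(params, perturbations):
            p.sub_(e)                                      # restore the original weights
    base_optimizer.step()                                  # sharpness-aware update
    base_optimizer.zero_grad()
```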
## Citations

<details>
@@ -370,6 +379,36 @@ Acceleration via Fractal Learning Rate Schedules

</details>

+ <details>
+
+ <summary>Sharpness-Aware Minimization</summary>
+
+ ```
+ @article{foret2020sharpness,
+ title={Sharpness-Aware Minimization for Efficiently Improving Generalization},
+ author={Foret, Pierre and Kleiner, Ariel and Mobahi, Hossein and Neyshabur, Behnam},
+ journal={arXiv preprint arXiv:2010.01412},
+ year={2020}
+ }
+ ```
+
+ </details>
+
+ <details>
+
+ <summary>Adaptive Sharpness-Aware Minimization</summary>
+
+ ```
+ @article{kwon2021asam,
+ title={ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks},
+ author={Kwon, Jungmin and Kim, Jeongseop and Park, Hyunseo and Choi, In Kwon},
+ journal={arXiv preprint arXiv:2102.11600},
+ year={2021}
+ }
+ ```
+
+ </details>
+

## Author

Hyeongchan Kim / [@kozistr](http://kozistr.tech/about)