Lookahead
---------
| ``k`` steps forward, 1 step back. ``Lookahead`` consists of keeping an exponential moving average of the weights that is
| updated and substituted for the current weights every ``k_{lookahead}`` steps (5 by default).

- code : `github <https://github.com/alphadl/lookahead.pytorch>`__
- paper : `arXiv <https://arxiv.org/abs/1907.08610v2>`__

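The update rule above can be sketched in plain Python. This is a hypothetical minimal version, not the library's actual ``Lookahead`` wrapper; the function name and the ``alpha`` interpolation factor are illustrative assumptions:

```python
def lookahead_step(fast, slow, step, k=5, alpha=0.5):
    """Synchronize Lookahead weights after an inner-optimizer step.

    fast, slow : lists of parameter values (floats)
    step       : 1-based count of inner-optimizer steps taken so far
    k          : synchronize every k steps (5 by default)
    alpha      : slow-weight interpolation factor
    """
    if step % k == 0:
        for i in range(len(slow)):
            # slow <- slow + alpha * (fast - slow), then reset fast to slow
            slow[i] += alpha * (fast[i] - slow[i])
            fast[i] = slow[i]
    return fast, slow
```

In practice an inner optimizer (SGD, Adam, ...) performs the ``k`` fast steps, and this synchronization runs after each of them.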
Chebyshev learning rate schedule
--------------------------------

| Acceleration via Fractal Learning Rate Schedules.

- paper : `arXiv <https://arxiv.org/abs/2103.01338v1>`__

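As a rough sketch of the idea for a quadratic objective whose curvature lies in ``[mu, L]``: the step sizes are reciprocals of Chebyshev nodes, and a fractal (recursively interleaved) ordering of those steps keeps intermediate iterates stable. The interleaving construction below is an illustrative assumption, not necessarily the exact permutation from the paper:

```python
import math

def chebyshev_steps(mu, L, T):
    # Chebyshev nodes on [mu, L]; the step sizes are their reciprocals.
    nodes = [(L + mu) / 2 + (L - mu) / 2 * math.cos(math.pi * (2 * i - 1) / (2 * T))
             for i in range(1, T + 1)]
    return [1.0 / lam for lam in nodes]

def fractal_order(T):
    # Recursively interleave an ordering of {1..n} with its mirror image
    # to get an ordering of {1..2n}; T is assumed to be a power of two.
    order = [1]
    while len(order) < T:
        n = len(order)
        order = [x for p in order for x in (p, 2 * n + 1 - p)]
    return order
```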
(Adaptive) Sharpness-Aware Minimization
---------------------------------------

| Sharpness-Aware Minimization (SAM) simultaneously minimizes loss value and loss sharpness.
| In particular, it seeks parameters that lie in neighborhoods having uniformly low loss.

- SAM paper : `paper <https://arxiv.org/abs/2010.01412>`__
- ASAM paper : `paper <https://arxiv.org/abs/2102.11600>`__
- A/SAM code : `github <https://github.com/davda54/sam>`__

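A minimal sketch of one SAM step on a parameter vector (hypothetical helper names; ``rho`` is the neighborhood radius). The ASAM variant would additionally scale the perturbation by parameter magnitude:

```python
import math

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update: ascend to the worst-case point within a rho-ball,
    then descend using the gradient measured there.

    w       : list of floats (parameters)
    grad_fn : function returning the loss gradient at a given w
    """
    g = grad_fn(w)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    # first step: perturb toward the locally worst direction
    w_adv = [wi + rho * gi / norm for wi, gi in zip(w, g)]
    # second step: descend with the gradient at the perturbed point
    g_adv = grad_fn(w_adv)
    return [wi - lr * gi for wi, gi in zip(w, g_adv)]
```

Note that each SAM step costs two gradient evaluations, which is the main practical overhead of the method.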
On the Convergence of Adam and Beyond
-------------------------------------

| Convergence issues can be fixed by endowing such algorithms with 'long-term memory' of past gradients.

- paper : `paper <https://openreview.net/forum?id=ryQu7f-RZ>`__

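The 'long-term memory' fix (AMSGrad in the paper) keeps a running maximum of the second-moment estimate, so the effective per-coordinate step size can never grow back after shrinking. A scalar sketch, with bias correction omitted as in the paper's simplest form:

```python
import math

def amsgrad_step(w, g, m, v, v_max, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One AMSGrad-style update on scalar state (illustrative sketch)."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    v_max = max(v_max, v)  # long-term memory of past squared gradients
    w = w - lr * m / (math.sqrt(v_max) + eps)
    return w, m, v, v_max
```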
Improved bias-correction in Adam
--------------------------------

| With the default bias-correction, Adam may actually make larger-than-requested gradient updates early in training.

- paper : `arXiv <https://arxiv.org/abs/2110.10828>`__

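A toy illustration with a constant gradient: with full bias correction the very first step is exactly ``lr``, i.e. ``1 / (1 - beta1**t)`` times larger than a variant that drops the numerator correction and instead damps early steps (the AdamD-style change this sketch assumes the paper proposes):

```python
import math

def adam_step_size(g, t, lr=1e-3, b1=0.9, b2=0.999, correct_numerator=True):
    """Step magnitude after t identical gradients g (toy closed form).

    With a constant gradient, m_t = (1 - b1**t) * g and
    v_t = (1 - b2**t) * g * g, so the fully bias-corrected step is
    exactly lr, while skipping the numerator correction shrinks early
    steps by a factor (1 - b1**t).
    """
    m = (1 - b1 ** t) * g
    v = (1 - b2 ** t) * g * g
    m_hat = m / (1 - b1 ** t) if correct_numerator else m
    v_hat = v / (1 - b2 ** t)
    return lr * m_hat / (math.sqrt(v_hat) + 1e-8)
```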
Adaptive Gradient Norm Correction
---------------------------------

| Correcting the norm of the gradient in each iteration based on the adaptive training history of gradient norms.

- paper : `arXiv <https://arxiv.org/abs/2210.06364>`__

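One plausible reading of this idea as a sketch (the EMA coefficient and the exact rescaling rule are assumptions, not the paper's algorithm): track an exponential moving average of past gradient norms and rescale unusually large gradients down to that historical level.

```python
import math

def norm_corrected(g, ema_norm, beta=0.95):
    """Rescale gradient g (list of floats) using a running norm history."""
    norm = math.sqrt(sum(x * x for x in g))
    ema_norm = beta * ema_norm + (1 - beta) * norm  # adaptive history
    if norm > ema_norm and norm > 0:
        # shrink an outlier gradient back to the historical norm level
        g = [x * ema_norm / norm for x in g]
    return g, ema_norm
```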
Citation
--------
