@@ -50,23 +50,44 @@ Read more about :ref:`Configuring Gradient Clipping <configure_gradient_clipping
 
 ----------
 
-***************************
-Stochastic Weight Averaging
-***************************
+****************
+Weight Averaging
+****************
 
-Stochastic Weight Averaging (SWA) can make your models generalize better at virtually no additional cost.
-This can be used with both non-trained and trained models. The SWA procedure smooths the loss landscape thus making
-it harder to end up in a local minimum during optimization.
+Weight averaging methods such as Stochastic Weight Averaging (SWA) and Exponential Moving Average (EMA) can make your
+models generalize better at virtually no additional cost. Averaging smooths the loss landscape, thus making it harder
+to end up in a local minimum during optimization.
 
-For a more detailed explanation of SWA and how it works,
-read `this post <https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging>`__ by the PyTorch team.
+Lightning provides two callbacks to facilitate weight averaging. :class:`~lightning.pytorch.callbacks.WeightAveraging`
+is a generic callback that wraps the
+`AveragedModel <https://pytorch.org/docs/stable/generated/torch.optim.swa_utils.AveragedModel.html>`__ class from
+PyTorch. It allows SWA, EMA, or a custom averaging strategy to be used, and it can be customized to run at specific
+steps or epochs.
 
-.. seealso:: The :class:`~lightning.pytorch.callbacks.StochasticWeightAveraging` callback
+The older :class:`~lightning.pytorch.callbacks.StochasticWeightAveraging` callback is specific to SWA. It starts the SWA
+procedure after a certain number of epochs and always runs on every epoch. Additionally, it switches to a constant
+learning rate schedule (`SWALR <https://pytorch.org/docs/stable/generated/torch.optim.swa_utils.SWALR.html>`__) when the
+procedure starts.
+
+.. seealso::
+    For a more detailed explanation of SWA and how it works, read
+    `this post <https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging>`__ by the PyTorch team.
 
 .. testcode::
 
-    # Enable Stochastic Weight Averaging using the callback
-    trainer = Trainer(callbacks=[StochasticWeightAveraging(swa_lrs=1e-2)])
+    from lightning.pytorch.callbacks import StochasticWeightAveraging, WeightAveraging
+    from torch.optim.swa_utils import get_ema_avg_fn
+
+
+    # Enable Exponential Moving Average after 100 steps
+    class EMAWeightAveraging(WeightAveraging):
+        def __init__(self):
+            super().__init__(avg_fn=get_ema_avg_fn())
+
+        def should_update(self, step_idx=None, epoch_idx=None):
+            # Start updating the averaged weights after 100 training steps
+            return (step_idx is not None) and (step_idx >= 100)
+
+
+    trainer = Trainer(callbacks=EMAWeightAveraging())
+
+    # Enable Stochastic Weight Averaging after 10 epochs with learning rate 0.01
+    trainer = Trainer(callbacks=StochasticWeightAveraging(swa_epoch_start=10, swa_lrs=0.01))
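+
+If the built-in SWA and EMA strategies are not enough, a custom averaging function can be plugged in as well. The sketch
+below is only illustrative: it assumes that :class:`~lightning.pytorch.callbacks.WeightAveraging` forwards extra keyword
+arguments such as ``avg_fn`` to ``AveragedModel`` (as the EMA example above does), and the ``warmup_ema_avg_fn`` name is
+made up for this example. The function follows the ``AveragedModel`` convention of receiving the current averaged
+parameter, the new model parameter, and the number of updates averaged so far.
+
+.. code-block:: python
+
+    from lightning.pytorch import Trainer
+    from lightning.pytorch.callbacks import WeightAveraging
+
+
+    def warmup_ema_avg_fn(averaged_param, current_param, num_averaged):
+        # Exponential moving average whose decay ramps up as more updates are averaged
+        decay = min(0.999, (1.0 + float(num_averaged)) / (10.0 + float(num_averaged)))
+        return decay * averaged_param + (1.0 - decay) * current_param
+
+
+    # Average the weights with the custom rule above
+    trainer = Trainer(callbacks=WeightAveraging(avg_fn=warmup_ema_avg_fn))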
 
 ----------
 