Commit c07c9c9

Merge branch 'master' into mlflow-logging-fix
2 parents 90c6f4f + 1ec459f commit c07c9c9

21 files changed: +1174 −59 lines changed


.github/workflows/ci-tests-pytorch.yml

Lines changed: 1 addition & 1 deletion
@@ -139,7 +139,7 @@ jobs:
       pip install ".[${EXTRA_PREFIX}extra,${EXTRA_PREFIX}test,${EXTRA_PREFIX}strategies]" \
         -U --upgrade-strategy=eager --prefer-binary \
         -r requirements/_integrations/accelerators.txt \
-        --extra-index-url="${TORCH_URL}" --find-links="${PYPI_CACHE_DIR}"
+        --extra-index-url="${TORCH_URL}" --find-links="${PYPI_CACHE_DIR}" --find-links="https://download.pytorch.org/whl/torch-tensorrt"
       pip list
 - name: Drop LAI from extensions
   if: ${{ matrix.pkg-name != 'lightning' }}

docs/source-pytorch/advanced/training_tricks.rst

Lines changed: 36 additions & 11 deletions
@@ -50,23 +50,48 @@ Read more about :ref:`Configuring Gradient Clipping <configure_gradient_clipping

 ----------

-***************************
-Stochastic Weight Averaging
-***************************
+****************
+Weight Averaging
+****************

-Stochastic Weight Averaging (SWA) can make your models generalize better at virtually no additional cost.
-This can be used with both non-trained and trained models. The SWA procedure smooths the loss landscape thus making
-it harder to end up in a local minimum during optimization.
+Weight averaging methods such as Stochastic Weight Averaging (SWA) and Exponential Moving Average (EMA) can make your
+models generalize better at virtually no additional cost. Averaging smooths the loss landscape thus making it harder to
+end up in a local minimum during optimization.

-For a more detailed explanation of SWA and how it works,
-read `this post <https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging>`__ by the PyTorch team.
+Lightning provides two callbacks to facilitate weight averaging. :class:`~lightning.pytorch.callbacks.WeightAveraging`
+is a generic callback that wraps the
+`AveragedModel <https://pytorch.org/docs/stable/generated/torch.optim.swa_utils.AveragedModel.html>`__ class from
+PyTorch. It allows SWA, EMA, or a custom averaging strategy to be used. By default, it updates the weights after every
+step, but it can be customized to update at specific steps or epochs by overriding the `should_update()` method.

-.. seealso:: The :class:`~lightning.pytorch.callbacks.StochasticWeightAveraging` callback
+The older :class:`~lightning.pytorch.callbacks.StochasticWeightAveraging` callback is specific to SWA. It starts the SWA
+procedure after a certain number of epochs and always runs on every epoch. Additionally, it switches to a constant
+learning rate schedule (`SWALR <https://pytorch.org/docs/stable/generated/torch.optim.swa_utils.SWALR.html>`__) when the
+procedure starts.
+
+.. seealso::
+   For a more detailed explanation of SWA and how it works, read
+   `this post <https://pytorch.org/blog/pytorch-1.6-now-includes-stochastic-weight-averaging>`__ by the PyTorch team.
+
+.. seealso::
+   The :class:`~lightning.pytorch.callbacks.WeightAveraging` callback and
+   :class:`~lightning.pytorch.callbacks.StochasticWeightAveraging` callback

 .. testcode::

-    # Enable Stochastic Weight Averaging using the callback
-    trainer = Trainer(callbacks=[StochasticWeightAveraging(swa_lrs=1e-2)])
+    from lightning.pytorch.callbacks import StochasticWeightAveraging, WeightAveraging
+    from torch.optim.swa_utils import get_ema_avg_fn
+
+    # Enable Exponential Moving Average after 100 steps
+    class EMAWeightAveraging(WeightAveraging):
+        def __init__(self):
+            super().__init__(avg_fn=get_ema_avg_fn())
+        def should_update(self, step_idx=None, epoch_idx=None):
+            return (step_idx is not None) and (step_idx >= 100)
+    trainer = Trainer(callbacks=EMAWeightAveraging())
+
+    # Enable Stochastic Weight Averaging after 10 epochs with learning rate 0.01
+    trainer = Trainer(callbacks=StochasticWeightAveraging(swa_epoch_start=10, swa_lrs=0.01))

----------
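
The updated docs also say a custom averaging strategy can be plugged into `WeightAveraging`. Below is a minimal sketch of what that could look like, assuming (as the EMA example above suggests) that the callback forwards `avg_fn` to `torch.optim.swa_utils.AveragedModel` and that `should_update()` receives an `epoch_idx` at epoch boundaries; the averaging coefficient and class name are illustrative, not part of this commit.

```python
from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import WeightAveraging


def custom_avg_fn(averaged_param, current_param, num_averaged):
    # Keep 99% of the running average and mix in 1% of the current weights;
    # this is the positional signature AveragedModel uses for a custom avg_fn.
    return 0.99 * averaged_param + 0.01 * current_param


class EpochwiseAveraging(WeightAveraging):
    def __init__(self):
        # avg_fn is forwarded to torch.optim.swa_utils.AveragedModel,
        # mirroring the get_ema_avg_fn() example in the hunk above.
        super().__init__(avg_fn=custom_avg_fn)

    def should_update(self, step_idx=None, epoch_idx=None):
        # Update the averaged weights once per epoch instead of after every step.
        return epoch_idx is not None


trainer = Trainer(callbacks=EpochwiseAveraging())
```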

docs/source-pytorch/api_references.rst

Lines changed: 1 addition & 0 deletions
@@ -48,6 +48,7 @@ callbacks
     ThroughputMonitor
     Timer
     TQDMProgressBar
+    WeightAveraging

 cli
 -----

docs/source-pytorch/extensions/callbacks.rst

Lines changed: 1 addition & 0 deletions
@@ -83,6 +83,7 @@ Lightning has a few built-in callbacks.
     StochasticWeightAveraging
     Timer
     TQDMProgressBar
+    WeightAveraging

 ----------

docs/source-pytorch/glossary/index.rst

Lines changed: 8 additions & 8 deletions
@@ -42,13 +42,13 @@
     Strategy registry <../advanced/strategy_registry>
     Strategy integrations <../integrations/strategies/index>
     Style guide <../starter/style_guide>
-    SWA <../advanced/training_tricks>
     SLURM <../clouds/cluster_advanced>
     Tensor Parallel <../advanced/model_parallel/tp>
     Transfer learning <../advanced/transfer_learning>
     Trainer <../common/trainer>
     TorchRun (TorchElastic) <../clouds/cluster_intermediate_2>
     Warnings <../advanced/warnings>
+    Weight averaging <../advanced/training_tricks>


 ########
@@ -326,13 +326,6 @@
    :button_link: ../starter/style_guide.html
    :height: 100

-.. displayitem::
-   :header: SWA
-   :description: Stochastic Weight Averaging (SWA) can make your models generalize better
-   :col_css: col-md-12
-   :button_link: ../advanced/training_tricks.html#stochastic-weight-averaging
-   :height: 100
-
 .. displayitem::
    :header: SLURM
    :description: Simple Linux Utility for Resource Management, or simply Slurm, is a free and open-source job scheduler for Linux clusters
@@ -375,6 +368,13 @@ Glossary
    :button_link: ../advanced/warnings.html
    :height: 100

+.. displayitem::
+   :header: Weight averaging
+   :description: Stochastic Weight Averaging (SWA) or Exponential Moving Average (EMA) can make your models generalize better
+   :col_css: col-md-12
+   :button_link: ../advanced/training_tricks.html#weight-averaging
+   :height: 100
+
 .. raw:: html

     </div>

docs/source-pytorch/model/build_model_intermediate.rst

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ Enable advanced training features using Trainer arguments. These are SOTA techni
     )

     # access the latest state of the art techniques
-    trainer = Trainer(callbacks=[StochasticWeightAveraging(...)])
+    trainer = Trainer(callbacks=[WeightAveraging(...)])

 ----

docs/source-pytorch/starter/introduction.rst

Lines changed: 1 addition & 1 deletion
@@ -252,7 +252,7 @@ Enable advanced training features using Trainer arguments. These are state-of-th
     )

     # access the latest state of the art techniques
-    trainer = L.Trainer(callbacks=[StochasticWeightAveraging(...)])
+    trainer = L.Trainer(callbacks=[WeightAveraging(...)])

 ----

requirements/pytorch/test.txt

Lines changed: 3 additions & 0 deletions
@@ -19,3 +19,6 @@ uvicorn # for `ServableModuleValidator` # not setting version as re-defined in

 tensorboard >=2.9.1, <2.21.0 # for `TensorBoardLogger`
 mlflow >=3.0.0, <4.0 # for `MLFlowLogger
+
+--find-links https://download.pytorch.org/whl/torch-tensorrt
+torch-tensorrt; platform_system == "Linux" and python_version >= "3.12"

src/lightning/fabric/CHANGELOG.md

Lines changed: 14 additions & 4 deletions
@@ -19,18 +19,28 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/).

 ### Changed

-- Raise ValueError when seed is `out-of-bounds` or `cannot be cast to int` ([#21029](https://github.com/Lightning-AI/pytorch-lightning/pull/21029))
+-


 ### Fixed

-- Fix XLA strategy to add support for `global_ordinal`, `local_ordinal`, `world_size` which came instead of deprecated methods ([#20852](https://github.com/Lightning-AI/pytorch-lightning/issues/20852))
+-


-- fix: remove extra `name` parameter in accelerator registry decorator ([#20975](https://github.com/Lightning-AI/pytorch-lightning/pull/20975))
+---

+## [2.5.3] - 2025-08-13
+
+### Changed
+
+- Enable "auto" for `devices` and `accelerator` as CLI arguments ([#20913](https://github.com/Lightning-AI/pytorch-lightning/pull/20913))
+- Raise ValueError when seed is `out-of-bounds` or `cannot be cast to int` ([#21029](https://github.com/Lightning-AI/pytorch-lightning/pull/21029))
+
+### Fixed
+
+- Fixed XLA strategy to add support for `global_ordinal`, `local_ordinal`, `world_size` which came instead of deprecated methods ([#20852](https://github.com/Lightning-AI/pytorch-lightning/issues/20852))
+- Fixed remove extra `name` parameter in accelerator registry decorator ([#20975](https://github.com/Lightning-AI/pytorch-lightning/pull/20975))

----

 ## [2.5.2] - 2025-3-20

src/lightning/fabric/utilities/distributed.py

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -319,7 +319,11 @@ def _destroy_dist_connection() -> None:
319319

320320

321321
def _get_default_process_group_backend_for_device(device: torch.device) -> str:
322-
return "nccl" if device.type == "cuda" else "gloo"
322+
"""Return corresponding distributed backend for a given device."""
323+
device_backend_map = torch.distributed.Backend.default_device_backend_map
324+
if device.type in device_backend_map:
325+
return device_backend_map[device.type]
326+
return "gloo"
323327

324328

325329
class _DatasetSamplerWrapper(Dataset):
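
For context, the new lookup in `_get_default_process_group_backend_for_device` can be exercised on its own. The sketch below copies the function from the hunk above and adds a tiny check; it assumes a PyTorch build that exposes `torch.distributed.Backend.default_device_backend_map`, and the map contents (for example "cpu" mapping to "gloo" and "cuda" to "nccl") vary with how PyTorch was built.

```python
import torch
import torch.distributed


def _get_default_process_group_backend_for_device(device: torch.device) -> str:
    """Return corresponding distributed backend for a given device."""
    device_backend_map = torch.distributed.Backend.default_device_backend_map
    if device.type in device_backend_map:
        return device_backend_map[device.type]
    return "gloo"


if __name__ == "__main__":
    # Inspect the default device-to-backend mapping shipped with this PyTorch build.
    print(torch.distributed.Backend.default_device_backend_map)
    print(_get_default_process_group_backend_for_device(torch.device("cpu")))
    # A device type missing from the map (e.g. "meta") falls back to "gloo".
    print(_get_default_process_group_backend_for_device(torch.device("meta")))
```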
