@@ -246,6 +246,27 @@ Example::
See also: :ref:`gradient_accumulation` to enable more fine-grained accumulation schedules.


barebones
^^^^^^^^^

Whether to run in "barebones mode", where all features that may impact raw speed are disabled. This is meant for
analyzing the Trainer overhead and is discouraged during regular training runs.

When enabled, the following features are automatically deactivated:

- Checkpointing: ``enable_checkpointing=False``
- Logging: ``logger=False``, ``log_every_n_steps=0``
- Progress bar: ``enable_progress_bar=False``
- Model summary: ``enable_model_summary=False``
- Sanity checking: ``num_sanity_val_steps=0``

.. testcode::

    # default used by the Trainer
    trainer = Trainer(barebones=False)

    # enable barebones mode for speed analysis
    trainer = Trainer(barebones=True)
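
Because barebones mode exists to expose the Trainer's own overhead, one way to use it is to time the same short run
with and without the flag and compare. The following is only a sketch of that pattern; ``MyModel`` and
``train_dataloader`` are hypothetical placeholders for your own LightningModule and DataLoader.

Example::

    import time

    def timed_fit(barebones):
        model = MyModel()  # hypothetical LightningModule defined elsewhere
        trainer = Trainer(max_epochs=1, barebones=barebones)
        start = time.perf_counter()
        trainer.fit(model, train_dataloader)  # hypothetical DataLoader defined elsewhere
        return time.perf_counter() - start

    # difference between a regular run and a barebones run approximates the Trainer overhead
    overhead = timed_fit(barebones=False) - timed_fit(barebones=True)
    print(f"approximate per-epoch Trainer overhead: {overhead:.2f}s")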

benchmark
^^^^^^^^^
@@ -364,6 +385,22 @@ will need to be set up to use remote filepaths.
    # default used by the Trainer
    trainer = Trainer(default_root_dir=os.getcwd())

detect_anomaly
^^^^^^^^^^^^^^

Enable anomaly detection for the autograd engine. This significantly slows down computation and is recommended
only for model debugging.

.. testcode::

    # default used by the Trainer
    trainer = Trainer(detect_anomaly=False)

    # enable anomaly detection for debugging
    trainer = Trainer(detect_anomaly=True)
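
This flag corresponds to PyTorch's autograd anomaly detection, which makes a backward pass that produces ``nan``
values raise an error pointing at the forward operation responsible. As a rough sketch of the underlying mechanism
in plain PyTorch, outside the Trainer:

Example::

    import torch

    # enable it globally, roughly what detect_anomaly=True does for the whole run
    torch.autograd.set_detect_anomaly(True)

    # or scope it to a block of code
    with torch.autograd.detect_anomaly():
        ...  # forward and backward of a single step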

devices
^^^^^^^
@@ -548,6 +585,24 @@ impact to subsequent runs. These are the changes enabled:
- If using the CLI, the configuration file is not saved.


gradient_clip_algorithm
^^^^^^^^^^^^^^^^^^^^^^^

The gradient clipping algorithm to use. Pass ``gradient_clip_algorithm="value"`` to clip by value, and
``gradient_clip_algorithm="norm"`` to clip by norm. The default is ``None``, which falls back to ``"norm"``
when ``gradient_clip_val`` is set.

.. testcode::

    # default used by the Trainer (defaults to "norm" when gradient_clip_val is set)
    trainer = Trainer(gradient_clip_algorithm=None)

    # clip by value
    trainer = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="value")

    # clip by norm
    trainer = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="norm")
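
To make the difference concrete, the sketch below shows roughly what the two algorithms do to the gradients,
expressed with the plain PyTorch clipping utilities. Lightning applies the equivalent clipping for you after
``backward()``; you would only call these yourself under manual optimization.

Example::

    import torch

    model = torch.nn.Linear(4, 2)
    model(torch.randn(8, 4)).sum().backward()

    # gradient_clip_algorithm="value": each gradient entry is clamped to [-0.5, 0.5]
    torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=0.5)

    # gradient_clip_algorithm="norm": gradients are rescaled so their total norm is at most 0.5
    # (in practice only one of the two algorithms is applied)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.5)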

gradient_clip_val
^^^^^^^^^^^^^^^^^
@@ -624,6 +679,26 @@ Example::
    # run through only 10 batches of the training set each epoch
    trainer = Trainer(limit_train_batches=10)


limit_predict_batches
^^^^^^^^^^^^^^^^^^^^^

How much of the prediction dataset to check (float = fraction, int = number of batches). The value is applied
per device.

.. testcode::

    # default used by the Trainer
    trainer = Trainer(limit_predict_batches=1.0)

    # run through only 25% of the prediction set
    trainer = Trainer(limit_predict_batches=0.25)

    # run for only 10 batches
    trainer = Trainer(limit_predict_batches=10)

In the case of multiple prediction dataloaders, the limit applies to each dataloader individually, as in the sketch below.
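
For instance, with ``limit_predict_batches=10`` and two prediction dataloaders, up to 10 batches are drawn from
each loader rather than 10 in total. A small sketch, where ``MyModel`` is a hypothetical LightningModule with a
``predict_step``:

Example::

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    loader_a = DataLoader(TensorDataset(torch.randn(1000, 32)), batch_size=16)
    loader_b = DataLoader(TensorDataset(torch.randn(500, 32)), batch_size=16)

    trainer = Trainer(limit_predict_batches=10)

    # each dataloader contributes at most 10 batches of predictions
    predictions = trainer.predict(MyModel(), dataloaders=[loader_a, loader_b])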

limit_test_batches
^^^^^^^^^^^^^^^^^^
@@ -801,6 +876,23 @@ For customizable options use the :class:`~lightning.pytorch.callbacks.timer.Time
In case ``max_time`` is used together with ``min_steps`` or ``min_epochs``, the ``min_*`` requirement
always has precedence.


model_registry
^^^^^^^^^^^^^^

If specified, the model will be uploaded to the Lightning Model Registry under the provided name.

.. testcode::

    # default used by the Trainer
    trainer = Trainer(model_registry=None)

    # specify the name to upload the model under in the model registry
    trainer = Trainer(model_registry="my-model-name")

See the `Lightning model registry docs <https://lightning.ai/docs/overview/finetune-models/model-registry>`_ for more info.

num_nodes
^^^^^^^^^
@@ -875,12 +967,25 @@ Useful for quickly debugging or trying to overfit on purpose.

    # debug using a single consistent train batch and a single consistent val batch

plugins
^^^^^^^

Plugins allow you to connect arbitrary backends, precision libraries, clusters, etc., and to modify core Lightning
logic. Examples of plugin types:

- :ref:`Checkpoint IO <checkpointing_expert>`
- `TorchElastic <https://pytorch.org/elastic/0.2.2/index.html>`_
- :ref:`Precision Plugins <precision_expert>`
- :class:`~lightning.pytorch.plugins.environments.ClusterEnvironment`

.. testcode::

    # default used by the Trainer
    trainer = Trainer(plugins=None)

    # example using the built-in SLURM plugin
    from lightning.fabric.plugins.environments import SLURMEnvironment

    trainer = Trainer(plugins=[SLURMEnvironment()])

To define your own behavior, subclass the relevant class and pass it in. Here's an example linking up your own
:class:`~lightning.pytorch.plugins.environments.ClusterEnvironment`.