diff --git a/docs/source-pytorch/common/trainer.rst b/docs/source-pytorch/common/trainer.rst
index d63bdeee1f5cd..369a33514c7e5 100644
--- a/docs/source-pytorch/common/trainer.rst
+++ b/docs/source-pytorch/common/trainer.rst
@@ -246,6 +246,27 @@ Example::

 See also: :ref:`gradient_accumulation` to enable more fine-grained accumulation schedules.

+barebones
+^^^^^^^^^
+
+Whether to run in "barebones mode", where all features that may impact raw speed are disabled. This is meant for
+analyzing the Trainer overhead and is discouraged during regular training runs.
+
+When enabled, the following features are automatically deactivated:
+
+- Checkpointing: ``enable_checkpointing=False``
+- Logging: ``logger=False``, ``log_every_n_steps=0``
+- Progress bar: ``enable_progress_bar=False``
+- Model summary: ``enable_model_summary=False``
+- Sanity checking: ``num_sanity_val_steps=0``
+
+.. testcode::
+
+    # default used by the Trainer
+    trainer = Trainer(barebones=False)
+
+    # enable barebones mode for speed analysis
+    trainer = Trainer(barebones=True)
+
 benchmark
 ^^^^^^^^^

@@ -364,6 +385,22 @@ will need to be set up to use remote filepaths.

     # default used by the Trainer
     trainer = Trainer(default_root_dir=os.getcwd())

+
+detect_anomaly
+^^^^^^^^^^^^^^
+
+Enable anomaly detection for the autograd engine. This will significantly slow down compute speed and is recommended
+only for model debugging.
+
+.. testcode::
+
+    # default used by the Trainer
+    trainer = Trainer(detect_anomaly=False)
+
+    # enable anomaly detection for debugging
+    trainer = Trainer(detect_anomaly=True)
+
+
 devices
 ^^^^^^^

@@ -548,6 +585,24 @@ impact to subsequent runs. These are the changes enabled:

 - If using the CLI, the configuration file is not saved.

+gradient_clip_algorithm
+^^^^^^^^^^^^^^^^^^^^^^^
+
+The gradient clipping algorithm to use. Pass ``gradient_clip_algorithm="value"`` to clip by value, and
+``gradient_clip_algorithm="norm"`` to clip by norm. By default, it will be set to ``"norm"``.
+
+.. testcode::
+
+    # default used by the Trainer ("norm" is applied when gradient_clip_val is set)
+    trainer = Trainer(gradient_clip_algorithm=None)
+
+    # clip by value
+    trainer = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="value")
+
+    # clip by norm
+    trainer = Trainer(gradient_clip_val=0.5, gradient_clip_algorithm="norm")
+
+
 gradient_clip_val
 ^^^^^^^^^^^^^^^^^

@@ -624,6 +679,26 @@ Example::

     # run through only 10 batches of the training set each epoch
     trainer = Trainer(limit_train_batches=10)

+
+limit_predict_batches
+^^^^^^^^^^^^^^^^^^^^^
+
+How much of the prediction dataset to check. The value is applied per device.
+
+.. testcode::
+
+    # default used by the Trainer
+    trainer = Trainer(limit_predict_batches=1.0)
+
+    # run through only 25% of the prediction set
+    trainer = Trainer(limit_predict_batches=0.25)
+
+    # run for only 10 batches
+    trainer = Trainer(limit_predict_batches=10)
+
+In the case of multiple prediction dataloaders, the limit applies to each dataloader individually.
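+
+For instance, here is a minimal sketch, assuming a hypothetical ``model`` and two hypothetical prediction dataloaders
+``predict_loader_a`` and ``predict_loader_b``; at most 5 batches would then be drawn from each of the two loaders::
+
+    trainer = Trainer(limit_predict_batches=5)
+
+    # the limit is applied to each of the two dataloaders separately
+    predictions = trainer.predict(model, dataloaders=[predict_loader_a, predict_loader_b])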
+
+
 limit_test_batches
 ^^^^^^^^^^^^^^^^^^

@@ -801,6 +876,23 @@ For customizable options use the :class:`~lightning.pytorch.callbacks.timer.Time

 In case ``max_time`` is used together with ``min_steps`` or ``min_epochs``, the ``min_*`` requirement always has
 precedence.

+
+model_registry
+^^^^^^^^^^^^^^
+
+If specified, the model will be uploaded to the Lightning Model Registry under the provided name.
+
+.. testcode::
+
+    # default used by the Trainer
+    trainer = Trainer(model_registry=None)
+
+    # specify the model name to upload to the registry
+    trainer = Trainer(model_registry="my-model-name")
+
+See the `Lightning model registry docs `_ for more info.
+
+
 num_nodes
 ^^^^^^^^^

@@ -875,12 +967,25 @@ Useful for quickly debugging or trying to overfit on purpose.

     # debug using a single consistent train batch and a single consistent val batch

+plugins
+^^^^^^^

-:ref:`Plugins` allow you to connect arbitrary backends, precision libraries, clusters etc. For example:
-
+Plugins allow you to connect arbitrary backends, precision libraries, clusters, etc., and to modify core Lightning
+logic. Examples of plugin types:
+
 - :ref:`Checkpoint IO `
 - `TorchElastic `_
 - :ref:`Precision Plugins `
+- :class:`~lightning.pytorch.plugins.environments.ClusterEnvironment`
+
+.. testcode::
+
+    # default used by the Trainer
+    trainer = Trainer(plugins=None)
+
+    # example using the built-in SLURM environment plugin
+    from lightning.fabric.plugins.environments import SLURMEnvironment
+    trainer = Trainer(plugins=[SLURMEnvironment()])
+
 To define your own behavior, subclass the relevant class and pass it in. Here's an example linking up your own
 :class:`~lightning.pytorch.plugins.environments.ClusterEnvironment`.
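+
+The sketch below is one possible minimal implementation. The ``MY_CLUSTER_*`` environment variables are hypothetical
+stand-ins for whatever rendezvous information your cluster manager exposes, and every abstract method of
+:class:`~lightning.pytorch.plugins.environments.ClusterEnvironment` is overridden so that the class can be
+instantiated:
+
+.. code-block:: python
+
+    import os
+
+    from lightning.pytorch import Trainer
+    from lightning.pytorch.plugins.environments import ClusterEnvironment
+
+
+    class MyClusterEnvironment(ClusterEnvironment):
+        """Reads rendezvous information from hypothetical ``MY_CLUSTER_*`` variables."""
+
+        @property
+        def creates_processes_externally(self) -> bool:
+            # the cluster manager launches one process per device for us
+            return True
+
+        @property
+        def main_address(self) -> str:
+            return os.environ.get("MY_CLUSTER_MAIN_ADDRESS", "127.0.0.1")
+
+        @property
+        def main_port(self) -> int:
+            return int(os.environ.get("MY_CLUSTER_MAIN_PORT", "12910"))
+
+        @staticmethod
+        def detect() -> bool:
+            # whether this environment is active for the current run
+            return "MY_CLUSTER_MAIN_ADDRESS" in os.environ
+
+        def world_size(self) -> int:
+            return int(os.environ.get("MY_CLUSTER_WORLD_SIZE", "1"))
+
+        def set_world_size(self, size: int) -> None:
+            pass  # fixed by the cluster manager, nothing to do
+
+        def global_rank(self) -> int:
+            return int(os.environ.get("MY_CLUSTER_RANK", "0"))
+
+        def set_global_rank(self, rank: int) -> None:
+            pass  # fixed by the cluster manager, nothing to do
+
+        def local_rank(self) -> int:
+            return int(os.environ.get("MY_CLUSTER_LOCAL_RANK", "0"))
+
+        def node_rank(self) -> int:
+            return int(os.environ.get("MY_CLUSTER_NODE_RANK", "0"))
+
+
+    trainer = Trainer(plugins=[MyClusterEnvironment()])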