diff --git a/docs/source-fabric/advanced/model_parallel/tp_fsdp.rst b/docs/source-fabric/advanced/model_parallel/tp_fsdp.rst
index f87645e0c11c6..454ebdacbb9d9 100644
--- a/docs/source-fabric/advanced/model_parallel/tp_fsdp.rst
+++ b/docs/source-fabric/advanced/model_parallel/tp_fsdp.rst
@@ -276,7 +276,7 @@ Next steps
 
 .. displayitem::
    :header: Pipeline Parallelism
-   :description: Coming sooon
+   :description: Coming soon
    :col_css: col-md-4
    :height: 160
    :tag: advanced
diff --git a/docs/source-pytorch/advanced/compile.rst b/docs/source-pytorch/advanced/compile.rst
index 16fe91ca282df..90a5a1f508189 100644
--- a/docs/source-pytorch/advanced/compile.rst
+++ b/docs/source-pytorch/advanced/compile.rst
@@ -262,7 +262,7 @@ Avoid graph breaks
 When ``torch.compile`` looks at the code in your model's ``forward()`` or ``*_step()`` method, it will try to compile as much of the code as possible.
 If there are regions in the code that it doesn't understand, it will introduce a so-called "graph break" that essentially splits the code in optimized and unoptimized parts.
 Graph breaks aren't a deal breaker, since the optimized parts should still run faster.
-But if you want to get the most out of ``torch.compile``, you might want to invest rewriting the problematic section of the code that produce the breaks.
+But if you want to get the most out of ``torch.compile``, you might want to invest rewriting the problematic section of the code that produces the breaks.
 
 You can check whether your model produces graph breaks by calling ``torch.compile`` with ``fullgraph=True``:
 
@@ -332,7 +332,7 @@ Enabling CUDA Graphs often results in a significant speedup, but sometimes also
 
 **Shape padding:** The specific shape/size of the tensors involved in the computation of your model (input, activations, weights, gradients, etc.) can have an impact on the performance.
 With shape padding enabled, ``torch.compile`` can extend the tensors by padding to a size that gives a better memory alignment.
-Naturally, the tradoff here is that it will consume a bit more memory.
+Naturally, the tradeoff here is that it will consume a bit more memory.
 
 .. code-block:: python
 
diff --git a/docs/source-pytorch/advanced/model_parallel/tp_fsdp.rst b/docs/source-pytorch/advanced/model_parallel/tp_fsdp.rst
index dae23bd4ee0c0..550a0a0fb26ae 100644
--- a/docs/source-pytorch/advanced/model_parallel/tp_fsdp.rst
+++ b/docs/source-pytorch/advanced/model_parallel/tp_fsdp.rst
@@ -282,7 +282,7 @@ Next steps
 
 .. displayitem::
    :header: Pipeline Parallelism
-   :description: Coming sooon
+   :description: Coming soon
    :col_css: col-md-4
    :height: 160
    :tag: advanced
diff --git a/docs/source-pytorch/advanced/post_training_quantization.rst b/docs/source-pytorch/advanced/post_training_quantization.rst
index f925c6ccd47b4..60755593f015e 100644
--- a/docs/source-pytorch/advanced/post_training_quantization.rst
+++ b/docs/source-pytorch/advanced/post_training_quantization.rst
@@ -106,7 +106,7 @@ The "approach" parameter in PostTrainingQuantConfig is defined by the user to ma
 Quantize the model
 ==================
 
-The model can be qutized by Intel® Neural Compressor with:
+The model can be quantized by Intel® Neural Compressor with:
 
 .. code-block:: python
 
@@ -126,7 +126,7 @@ At last, the quantized model can be saved by:
 Hands-on Examples
 *****************
 
-Based on the `given example code `_, we show how Intel Neural Compressor conduct model quantization on PyTorch Lightning. We first define the basic config of the quantization process.
+Based on the `given example code `_, we show how Intel Neural Compressor conducts model quantization on PyTorch Lightning. We first define the basic config of the quantization process.
 
 .. code-block:: python
 
diff --git a/docs/source-pytorch/advanced/pruning_quantization.rst b/docs/source-pytorch/advanced/pruning_quantization.rst
index f8b099652a381..5c703de20fe3e 100644
--- a/docs/source-pytorch/advanced/pruning_quantization.rst
+++ b/docs/source-pytorch/advanced/pruning_quantization.rst
@@ -32,7 +32,7 @@ You can also perform iterative pruning, apply the `lottery ticket hypothesis
-Read more about :ref:`Configuring Gradient Clipping ` for advanced use-cases.
+Read more about :ref:`Configuring Gradient Clipping ` for advanced use cases.
 
 ----------
diff --git a/src/lightning/fabric/connector.py b/src/lightning/fabric/connector.py
index 85d30a07ce207..0e0e86ee7c63e 100644
--- a/src/lightning/fabric/connector.py
+++ b/src/lightning/fabric/connector.py
@@ -239,7 +239,7 @@ def _check_config_and_set_final_flags(
                 else:
                     raise TypeError(
                         f"Found invalid type for plugin {plugin}. Expected one of: Precision, "
-                        "CheckpointIO, ClusterEnviroment."
+                        "CheckpointIO, ClusterEnvironment."
                     )
 
             duplicated_plugin_key = [k for k, v in plugins_flags_types.items() if v > 1]
diff --git a/src/lightning/fabric/plugins/precision/bitsandbytes.py b/src/lightning/fabric/plugins/precision/bitsandbytes.py
index b78157d1c4074..646df2028672e 100644
--- a/src/lightning/fabric/plugins/precision/bitsandbytes.py
+++ b/src/lightning/fabric/plugins/precision/bitsandbytes.py
@@ -403,7 +403,7 @@ class _NF4DQLinear(_Linear4bit):
         def __init__(self, *args: Any, **kwargs: Any) -> None:
             super().__init__(*args, quant_type="nf4", compress_statistics=True, **kwargs)
 
-    # these classes are defined programatically like this to avoid importing bitsandbytes in environments that have
+    # these classes are defined programmatically like this to avoid importing bitsandbytes in environments that have
     # it available but will not use it
     classes = {
         "_Linear8bitLt": _Linear8bitLt,
diff --git a/src/lightning/pytorch/trainer/connectors/accelerator_connector.py b/src/lightning/pytorch/trainer/connectors/accelerator_connector.py
index 40ee0eef4de33..603aedfc94589 100644
--- a/src/lightning/pytorch/trainer/connectors/accelerator_connector.py
+++ b/src/lightning/pytorch/trainer/connectors/accelerator_connector.py
@@ -248,7 +248,7 @@ def _check_config_and_set_final_flags(
                 else:
                     raise MisconfigurationException(
                         f"Found invalid type for plugin {plugin}. Expected one of: Precision, "
-                        "CheckpointIO, ClusterEnviroment, or LayerSync."
+                        "CheckpointIO, ClusterEnvironment, or LayerSync."
                     )
 
             duplicated_plugin_key = [k for k, v in plugins_flags_types.items() if v > 1]
diff --git a/tests/tests_fabric/utilities/test_data.py b/tests/tests_fabric/utilities/test_data.py
index faff6e182a06f..91b0a4e47b8b0 100644
--- a/tests/tests_fabric/utilities/test_data.py
+++ b/tests/tests_fabric/utilities/test_data.py
@@ -53,8 +53,9 @@ def test_has_len():
 def test_replace_dunder_methods_multiple_loaders_without_init():
     """In case of a class, that inherits from a class that we are patching, but doesn't define its own `__init__`
     method (the one we are wrapping), it can happen, that `hasattr(cls, "__old__init__")` is True because of parent
-    class, but it is impossible to delete, because that method is owned by parent class. Furthermore, the error occured
-    only sometimes because it depends on the order in which we are iterating over a set of classes we are patching.
+    class, but it is impossible to delete, because that method is owned by parent class. Furthermore, the error
+    occurred only sometimes because it depends on the order in which we are iterating over a set of classes we are
+    patching.
 
     This test simulates the behavior by generating sufficient number of dummy classes, which do not define `__init__`
     and are children of `DataLoader`. We are testing that a) context manager `_replace_dunder_method` exits cleanly, and