diff --git a/docs/source-pytorch/accelerators/tpu_advanced.rst b/docs/source-pytorch/accelerators/tpu_advanced.rst
index e410c6e82539f..d74f9b07374c9 100644
--- a/docs/source-pytorch/accelerators/tpu_advanced.rst
+++ b/docs/source-pytorch/accelerators/tpu_advanced.rst
@@ -52,7 +52,7 @@ Example:
     model = WeightSharingModule()
     trainer = Trainer(max_epochs=1, accelerator="tpu")
 
-See `XLA Documentation `_
+See `XLA Documentation `_
 
 ----
 
@@ -61,4 +61,4 @@ XLA
 XLA is the library that interfaces PyTorch with the TPUs.
 For more information check out `XLA `_.
 
-Guide for `troubleshooting XLA `_
+Guide for `troubleshooting XLA `_
diff --git a/docs/source-pytorch/accelerators/tpu_basic.rst b/docs/source-pytorch/accelerators/tpu_basic.rst
index fb4e2b7bde244..217b76106aea9 100644
--- a/docs/source-pytorch/accelerators/tpu_basic.rst
+++ b/docs/source-pytorch/accelerators/tpu_basic.rst
@@ -108,7 +108,7 @@ There are cases in which training on TPUs is slower when compared with GPUs, for
 - XLA Graph compilation during the initial steps `Reference `_
 - Some tensor ops are not fully supported on TPU, or not supported at all. These operations will be performed on CPU (context switch).
 
-The official PyTorch XLA `performance guide `_
+The official PyTorch XLA `performance guide `_
 has more detailed information on how PyTorch code can be optimized for TPU. In particular, the
-`metrics report `_ allows
+`metrics report `_ allows
 one to identify operations that lead to context switching.
diff --git a/docs/source-pytorch/accelerators/tpu_faq.rst b/docs/source-pytorch/accelerators/tpu_faq.rst
index f4b2c60633d26..109449ef2cc9a 100644
--- a/docs/source-pytorch/accelerators/tpu_faq.rst
+++ b/docs/source-pytorch/accelerators/tpu_faq.rst
@@ -78,7 +78,7 @@ A lot of PyTorch operations aren't lowered to XLA, which could lead to significa
 These operations are moved to the CPU memory and evaluated, and then the results are transferred back to the XLA device(s).
 By using the `xla_debug` Strategy, users could create a metrics report to diagnose issues.
 
-The report includes things like (`XLA Reference `_):
+The report includes things like (`XLA Reference `_):
 
 * how many times we issue XLA compilations and time spent on issuing.
 * how many times we execute and time spent on execution
diff --git a/src/lightning/fabric/strategies/deepspeed.py b/src/lightning/fabric/strategies/deepspeed.py
index 93a17f10c8998..e71b8e2db3d58 100644
--- a/src/lightning/fabric/strategies/deepspeed.py
+++ b/src/lightning/fabric/strategies/deepspeed.py
@@ -598,7 +598,7 @@ def _initialize_engine(
     ) -> Tuple["DeepSpeedEngine", Optimizer]:
         """Initialize one model and one optimizer with an optional learning rate scheduler.
 
-        This calls :func:`deepspeed.initialize` internally.
+        This calls ``deepspeed.initialize`` internally.
 
         """
         import deepspeed
diff --git a/src/lightning/fabric/strategies/xla_fsdp.py b/src/lightning/fabric/strategies/xla_fsdp.py
index 6da693bafb1c8..e4c080d8110db 100644
--- a/src/lightning/fabric/strategies/xla_fsdp.py
+++ b/src/lightning/fabric/strategies/xla_fsdp.py
@@ -56,7 +56,7 @@ class XLAFSDPStrategy(ParallelStrategy, _Sharded):
 
     .. warning:: This is an :ref:`experimental ` feature.
 
-    For more information check out https://github.com/pytorch/xla/blob/master/docs/fsdp.md
+    For more information check out https://github.com/pytorch/xla/blob/v2.5.0/docs/fsdp.md
 
     Args:
         auto_wrap_policy: Same as ``auto_wrap_policy`` parameter in
diff --git a/src/lightning/pytorch/strategies/deepspeed.py b/src/lightning/pytorch/strategies/deepspeed.py
index 382f8070898f8..1eaa5bab75fbe 100644
--- a/src/lightning/pytorch/strategies/deepspeed.py
+++ b/src/lightning/pytorch/strategies/deepspeed.py
@@ -414,7 +414,7 @@ def _setup_model_and_optimizer(
     ) -> Tuple["deepspeed.DeepSpeedEngine", Optimizer]:
         """Initialize one model and one optimizer with an optional learning rate scheduler.
 
-        This calls :func:`deepspeed.initialize` internally.
+        This calls ``deepspeed.initialize`` internally.
 
         """
         import deepspeed