diff --git a/beginner_source/dist_overview.rst b/beginner_source/dist_overview.rst
index 2c74bb51a04..9088434bf2f 100644
--- a/beginner_source/dist_overview.rst
+++ b/beginner_source/dist_overview.rst
@@ -1,6 +1,6 @@
 PyTorch Distributed Overview
 ============================
-**Author**: `Will Constable `_
+**Author**: `Will Constable `_, `Wei Feng `_
 
 .. note::
    |edit| View and edit this tutorial in `github `__.
@@ -26,7 +26,7 @@ Parallelism APIs
 These Parallelism Modules offer high-level functionality and compose with existing models:
 
 - `Distributed Data-Parallel (DDP) `__
-- `Fully Sharded Data-Parallel Training (FSDP) `__
+- `Fully Sharded Data-Parallel Training (FSDP2) `__
 - `Tensor Parallel (TP) `__
 - `Pipeline Parallel (PP) `__
 
@@ -74,11 +74,11 @@ When deciding what parallelism techniques to choose for your model, use these co
 
    * See also: `Getting Started with Distributed Data Parallel <../intermediate/ddp_tutorial.html>`__
 
-#. Use `FullyShardedDataParallel (FSDP) `__ when your model cannot fit on one GPU.
+#. Use `FullyShardedDataParallel (FSDP2) `__ when your model cannot fit on one GPU.
 
-   * See also: `Getting Started with FSDP `__
+   * See also: `Getting Started with FSDP2 `__
 
-#. Use `Tensor Parallel (TP) `__ and/or `Pipeline Parallel (PP) `__ if you reach scaling limitations with FSDP.
+#. Use `Tensor Parallel (TP) `__ and/or `Pipeline Parallel (PP) `__ if you reach scaling limitations with FSDP2.
 
    * Try our `Tensor Parallelism Tutorial `__
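
For context on the FSDP2 guidance this diff points readers to, here is a minimal sketch of the ``fully_shard`` API. It assumes a recent PyTorch (2.6 or later) where ``fully_shard`` is exported from ``torch.distributed.fsdp``; the ``ToyModel`` and launch details are illustrative only and are not taken from the tutorial being edited.

.. code:: python

   """Minimal FSDP2 sketch (illustrative; assumes PyTorch >= 2.6 and a
   multi-GPU host launched via torchrun)."""
   import os

   import torch
   import torch.distributed as dist
   import torch.nn as nn
   from torch.distributed.fsdp import fully_shard


   class ToyModel(nn.Module):
       """Stand-in for a real transformer; sharding is applied per layer."""

       def __init__(self) -> None:
           super().__init__()
           self.layers = nn.ModuleList(nn.Linear(1024, 1024) for _ in range(4))

       def forward(self, x: torch.Tensor) -> torch.Tensor:
           for layer in self.layers:
               x = layer(x)
           return x


   def main() -> None:
       dist.init_process_group(backend="nccl")
       torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

       model = ToyModel().cuda()
       # Shard each layer first, then the root module, so parameters are
       # gathered layer by layer during forward/backward rather than all at once.
       for layer in model.layers:
           fully_shard(layer)
       fully_shard(model)

       # Build the optimizer after sharding so it references the sharded parameters.
       optim = torch.optim.AdamW(model.parameters(), lr=1e-3)

       x = torch.randn(8, 1024, device="cuda")
       loss = model(x).sum()
       loss.backward()
       optim.step()

       dist.destroy_process_group()


   if __name__ == "__main__":
       main()

Run with, for example, ``torchrun --nproc_per_node=2 fsdp2_sketch.py`` (a hypothetical file name): each rank then holds only its shard of the parameters, and ``fully_shard`` gathers them on demand per layer.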