diff --git a/distributed.rst b/distributed.rst
index c9ed95a8a03..f392729d4b0 100644
--- a/distributed.rst
+++ b/distributed.rst
@@ -12,7 +12,7 @@ There are a few ways you can perform distributed training in
 PyTorch with each method having their advantages in certain use cases:
 
 * `DistributedDataParallel (DDP) <#learn-ddp>`__
-* `Fully Sharded Data Parallel (FSDP) <#learn-fsdp>`__
+* `Fully Sharded Data Parallel (FSDP2) <#learn-fsdp>`__
 * `Tensor Parallel (TP) <#learn-tp>`__
 * `Device Mesh <#device-mesh>`__
 * `Remote Procedure Call (RPC) distributed training <#learn-rpc>`__
@@ -60,28 +60,18 @@ Learn DDP
 
 .. _learn-fsdp:
 
-Learn FSDP
-----------
+Learn FSDP2
+-----------
 
 .. grid:: 3
 
    .. grid-item-card:: :octicon:`file-code;1em`
-      Getting Started with FSDP
+      Getting Started with FSDP2
       :link: https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html?utm_source=distr_landing&utm_medium=FSDP_getting_started
       :link-type: url
 
       This tutorial demonstrates how you can perform distributed training
-      with FSDP on a MNIST dataset.
-      +++
-      :octicon:`code;1em` Code
-
-   .. grid-item-card:: :octicon:`file-code;1em`
-      FSDP Advanced
-      :link: https://pytorch.org/tutorials/intermediate/FSDP_advanced_tutorial.html?utm_source=distr_landing&utm_medium=FSDP_advanced
-      :link-type: url
-
-      In this tutorial, you will learn how to fine-tune a HuggingFace (HF) T5
-      model with FSDP for text summarization.
+      with FSDP2 on a transformer model.
       +++
       :octicon:`code;1em` Code
 
@@ -196,7 +186,6 @@ Custom Extensions
    intermediate/ddp_tutorial
    intermediate/dist_tuto
    intermediate/FSDP_tutorial
-   intermediate/FSDP_advanced_tutorial
    intermediate/TCPStore_libuv_backend
    intermediate/TP_tutorial
    intermediate/pipelining_tutorial