From 29e626bd2e4cef8d65ad724641d464d93942edc8 Mon Sep 17 00:00:00 2001
From: "Wei (Will) Feng"
Date: Sun, 20 Jul 2025 14:53:38 -0700
Subject: [PATCH 1/4] [distributed] point to fsdp2 and remove fsdp1 in
 distributed overview

---
 beginner_source/dist_overview.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/beginner_source/dist_overview.rst b/beginner_source/dist_overview.rst
index 2c74bb51a04..e0ae49e30b4 100644
--- a/beginner_source/dist_overview.rst
+++ b/beginner_source/dist_overview.rst
@@ -1,6 +1,6 @@
 PyTorch Distributed Overview
 ============================
-**Author**: `Will Constable `_
+**Author**: `Will Constable `_, `Wei Feng `_
 
 .. note:: |edit| View and edit this tutorial in `github `__.
 
@@ -26,7 +26,7 @@ Parallelism APIs
 These Parallelism Modules offer high-level functionality and compose with existing models:
 
 - `Distributed Data-Parallel (DDP) `__
-- `Fully Sharded Data-Parallel Training (FSDP) `__
+- `Fully Sharded Data-Parallel Training (FSDP2) `__
 - `Tensor Parallel (TP) `__
 - `Pipeline Parallel (PP) `__
 
@@ -74,9 +74,9 @@ When deciding what parallelism techniques to choose for your model, use these co
 
    * See also: `Getting Started with Distributed Data Parallel <../intermediate/ddp_tutorial.html>`__
 
-#. Use `FullyShardedDataParallel (FSDP) `__ when your model cannot fit on one GPU.
+#. Use `FullyShardedDataParallel (FSDP2) `__ when your model cannot fit on one GPU.
 
-   * See also: `Getting Started with FSDP `__
+   * See also: `Getting Started with FSDP2 `__
 
 #. Use `Tensor Parallel (TP) `__ and/or `Pipeline Parallel (PP) `__ if you reach scaling limitations with FSDP.
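The decision checklist this patch updates (DDP when the model fits on one GPU, FSDP2 when it does not, TP and/or PP once FSDP2 hits scaling limits) can be sketched as a plain-Python helper. This is an illustration only, not part of the patch; the function name `choose_parallelism` and its parameters are hypothetical.

```python
def choose_parallelism(fits_on_one_gpu: bool, fsdp2_scaling_limited: bool = False) -> str:
    """Map the dist_overview checklist onto a recommended parallelism API."""
    if fits_on_one_gpu:
        # Step 1: DistributedDataParallel when the model fits on a single GPU.
        return "DDP"
    if fsdp2_scaling_limited:
        # Step 3: compose Tensor Parallel and/or Pipeline Parallel with FSDP2.
        return "FSDP2 + TP/PP"
    # Step 2: FSDP2 when the model cannot fit on one GPU.
    return "FSDP2"


print(choose_parallelism(True))   # DDP
print(choose_parallelism(False))  # FSDP2
```

In practice these techniques compose rather than exclude one another; the helper only mirrors the order in which the overview recommends trying them.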
From 957f0cb0dc12676598cfb04eda4ef55348820575 Mon Sep 17 00:00:00 2001
From: "Wei (Will) Feng"
Date: Sun, 20 Jul 2025 14:55:23 -0700
Subject: [PATCH 2/4] use fsdp2

---
 beginner_source/dist_overview.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/beginner_source/dist_overview.rst b/beginner_source/dist_overview.rst
index e0ae49e30b4..9088434bf2f 100644
--- a/beginner_source/dist_overview.rst
+++ b/beginner_source/dist_overview.rst
@@ -78,7 +78,7 @@ When deciding what parallelism techniques to choose for your model, use these co
 
    * See also: `Getting Started with FSDP2 `__
 
-#. Use `Tensor Parallel (TP) `__ and/or `Pipeline Parallel (PP) `__ if you reach scaling limitations with FSDP.
+#. Use `Tensor Parallel (TP) `__ and/or `Pipeline Parallel (PP) `__ if you reach scaling limitations with FSDP2.
 
    * Try our `Tensor Parallelism Tutorial `__

From df98461a859fdda6dce48cbb7f90c5dcdc5ba927 Mon Sep 17 00:00:00 2001
From: "Wei (Will) Feng"
Date: Sun, 20 Jul 2025 15:13:09 -0700
Subject: [PATCH 3/4] mark fsdp1 tutorial as deprecated and no page is linking
 to fsdp1

---
 intermediate_source/FSDP1_tutorial.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/intermediate_source/FSDP1_tutorial.rst b/intermediate_source/FSDP1_tutorial.rst
index b983879a449..093d1941089 100644
--- a/intermediate_source/FSDP1_tutorial.rst
+++ b/intermediate_source/FSDP1_tutorial.rst
@@ -1,4 +1,4 @@
-Getting Started with Fully Sharded Data Parallel(FSDP)
+[Deprecated] Getting Started with Fully Sharded Data Parallel (FSDP)
 ======================================================
 **Author**: `Hamid Shojanazeri `__, `Yanli Zhao `__, `Shen Li `__

From 7f42e47c32a45b1142de7336c44f8ecbf5275037 Mon Sep 17 00:00:00 2001
From: "Wei (Will) Feng"
Date: Sun, 20 Jul 2025 15:19:38 -0700
Subject: [PATCH 4/4] revert fsdp1 tutorial change

---
 intermediate_source/FSDP1_tutorial.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/intermediate_source/FSDP1_tutorial.rst b/intermediate_source/FSDP1_tutorial.rst
index 093d1941089..b983879a449 100644
--- a/intermediate_source/FSDP1_tutorial.rst
+++ b/intermediate_source/FSDP1_tutorial.rst
@@ -1,4 +1,4 @@
-[Deprecated] Getting Started with Fully Sharded Data Parallel (FSDP)
+Getting Started with Fully Sharded Data Parallel(FSDP)
 ======================================================
 **Author**: `Hamid Shojanazeri `__, `Yanli Zhao `__, `Shen Li `__