From 29e626bd2e4cef8d65ad724641d464d93942edc8 Mon Sep 17 00:00:00 2001
From: "Wei (Will) Feng"
Date: Sun, 20 Jul 2025 14:53:38 -0700
Subject: [PATCH 1/4] [distributed] point to fsdp2 and remove fsdp1 in
 distributed overview

---
 beginner_source/dist_overview.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/beginner_source/dist_overview.rst b/beginner_source/dist_overview.rst
index 2c74bb51a04..e0ae49e30b4 100644
--- a/beginner_source/dist_overview.rst
+++ b/beginner_source/dist_overview.rst
@@ -1,6 +1,6 @@
 PyTorch Distributed Overview
 ============================
-**Author**: `Will Constable `_
+**Author**: `Will Constable `_, `Wei Feng `_
 
 .. note:: |edit| View and edit this tutorial in `github `__.
 
@@ -26,7 +26,7 @@ Parallelism APIs
 These Parallelism Modules offer high-level functionality and compose with existing models:
 
 - `Distributed Data-Parallel (DDP) `__
-- `Fully Sharded Data-Parallel Training (FSDP) `__
+- `Fully Sharded Data-Parallel Training (FSDP2) `__
 - `Tensor Parallel (TP) `__
 - `Pipeline Parallel (PP) `__
 
@@ -74,9 +74,9 @@ When deciding what parallelism techniques to choose for your model, use these co
 
    * See also: `Getting Started with Distributed Data Parallel <../intermediate/ddp_tutorial.html>`__
 
-#. Use `FullyShardedDataParallel (FSDP) `__ when your model cannot fit on one GPU.
+#. Use `FullyShardedDataParallel (FSDP2) `__ when your model cannot fit on one GPU.
 
-   * See also: `Getting Started with FSDP `__
+   * See also: `Getting Started with FSDP2 `__
 
 #. Use `Tensor Parallel (TP) `__ and/or `Pipeline Parallel (PP) `__ if you reach scaling limitations with FSDP.
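The decision checklist this patch updates (DDP when the model fits on one GPU, FSDP2 when it does not, TP and/or PP once FSDP2 hits scaling limits) can be sketched as a plain-Python helper. This is an illustration only, not part of the patch; the function name `choose_parallelism` and its parameters are hypothetical.

```python
def choose_parallelism(fits_on_one_gpu: bool, fsdp2_scaling_limited: bool = False) -> str:
    """Map the dist_overview checklist onto a recommended parallelism API."""
    if fits_on_one_gpu:
        # Step 1: DistributedDataParallel when the model fits on a single GPU.
        return "DDP"
    if fsdp2_scaling_limited:
        # Step 3: compose Tensor Parallel and/or Pipeline Parallel with FSDP2.
        return "FSDP2 + TP/PP"
    # Step 2: FSDP2 when the model cannot fit on one GPU.
    return "FSDP2"


print(choose_parallelism(True))   # DDP
print(choose_parallelism(False))  # FSDP2
```

In practice these techniques compose rather than exclude one another; the helper only mirrors the order in which the overview recommends trying them.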
From 957f0cb0dc12676598cfb04eda4ef55348820575 Mon Sep 17 00:00:00 2001
From: "Wei (Will) Feng"
Date: Sun, 20 Jul 2025 14:55:23 -0700
Subject: [PATCH 2/4] use fsdp2

---
 beginner_source/dist_overview.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/beginner_source/dist_overview.rst b/beginner_source/dist_overview.rst
index e0ae49e30b4..9088434bf2f 100644
--- a/beginner_source/dist_overview.rst
+++ b/beginner_source/dist_overview.rst
@@ -78,7 +78,7 @@ When deciding what parallelism techniques to choose for your model, use these co
 
    * See also: `Getting Started with FSDP2 `__
 
-#. Use `Tensor Parallel (TP) `__ and/or `Pipeline Parallel (PP) `__ if you reach scaling limitations with FSDP.
+#. Use `Tensor Parallel (TP) `__ and/or `Pipeline Parallel (PP) `__ if you reach scaling limitations with FSDP2.
 
    * Try our `Tensor Parallelism Tutorial `__

From df98461a859fdda6dce48cbb7f90c5dcdc5ba927 Mon Sep 17 00:00:00 2001
From: "Wei (Will) Feng"
Date: Sun, 20 Jul 2025 15:13:09 -0700
Subject: [PATCH 3/4] mark fsdp1 tutorial as deprecated and no page is linking
 to fsdp1

---
 intermediate_source/FSDP1_tutorial.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/intermediate_source/FSDP1_tutorial.rst b/intermediate_source/FSDP1_tutorial.rst
index b983879a449..093d1941089 100644
--- a/intermediate_source/FSDP1_tutorial.rst
+++ b/intermediate_source/FSDP1_tutorial.rst
@@ -1,4 +1,4 @@
-Getting Started with Fully Sharded Data Parallel(FSDP)
+[Deprecated] Getting Started with Fully Sharded Data Parallel (FSDP)
 ======================================================
 **Author**: `Hamid Shojanazeri `__, `Yanli Zhao `__, `Shen Li `__

From 7f42e47c32a45b1142de7336c44f8ecbf5275037 Mon Sep 17 00:00:00 2001
From: "Wei (Will) Feng"
Date: Sun, 20 Jul 2025 15:19:38 -0700
Subject: [PATCH 4/4] revert fsdp1 tutorial change

---
 intermediate_source/FSDP1_tutorial.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/intermediate_source/FSDP1_tutorial.rst b/intermediate_source/FSDP1_tutorial.rst
index 093d1941089..b983879a449 100644
--- a/intermediate_source/FSDP1_tutorial.rst
+++ b/intermediate_source/FSDP1_tutorial.rst
@@ -1,4 +1,4 @@
-[Deprecated] Getting Started with Fully Sharded Data Parallel (FSDP)
+Getting Started with Fully Sharded Data Parallel(FSDP)
 ======================================================
 **Author**: `Hamid Shojanazeri `__, `Yanli Zhao `__, `Shen Li `__