diff --git a/beginner_source/dist_overview.rst b/beginner_source/dist_overview.rst
index 2c74bb51a04..9088434bf2f 100644
--- a/beginner_source/dist_overview.rst
+++ b/beginner_source/dist_overview.rst
@@ -1,6 +1,6 @@
PyTorch Distributed Overview
============================
-**Author**: `Will Constable `_
+**Author**: `Will Constable `_, `Wei Feng `_
.. note::
|edit| View and edit this tutorial in `github `__.
@@ -26,7 +26,7 @@ Parallelism APIs
These Parallelism Modules offer high-level functionality and compose with existing models:
- `Distributed Data-Parallel (DDP) `__
-- `Fully Sharded Data-Parallel Training (FSDP) `__
+- `Fully Sharded Data-Parallel Training (FSDP2) `__
- `Tensor Parallel (TP) `__
- `Pipeline Parallel (PP) `__
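As a quick illustration of how these modules compose with an existing model, here is a minimal DDP sketch. It is not part of the diff above; it assumes a CUDA machine and the environment variables set by ``torchrun``:

.. code-block:: python

   # Minimal DDP sketch: launch with ``torchrun --nproc_per_node=<num_gpus> script.py``.
   import os
   import torch
   import torch.distributed as dist
   from torch.nn.parallel import DistributedDataParallel as DDP

   dist.init_process_group("nccl")
   local_rank = int(os.environ["LOCAL_RANK"])
   torch.cuda.set_device(local_rank)

   model = torch.nn.Linear(1024, 1024).cuda(local_rank)
   ddp_model = DDP(model, device_ids=[local_rank])    # replicate model, sync gradients

   out = ddp_model(torch.randn(8, 1024, device=local_rank))
   out.sum().backward()                               # gradients are all-reduced across ranks
   dist.destroy_process_group()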
@@ -74,11 +74,11 @@ When deciding what parallelism techniques to choose for your model, use these co
* See also: `Getting Started with Distributed Data Parallel <../intermediate/ddp_tutorial.html>`__
-#. Use `FullyShardedDataParallel (FSDP) `__ when your model cannot fit on one GPU.
+#. Use `FullyShardedDataParallel (FSDP2) `__ when your model cannot fit on one GPU.
- * See also: `Getting Started with FSDP `__
+ * See also: `Getting Started with FSDP2 `__
-#. Use `Tensor Parallel (TP) `__ and/or `Pipeline Parallel (PP) `__ if you reach scaling limitations with FSDP.
+#. Use `Tensor Parallel (TP) `__ and/or `Pipeline Parallel (PP) `__ if you reach scaling limitations with FSDP2.
* Try our `Tensor Parallelism Tutorial `__
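The updated guidance above points to FSDP2's ``fully_shard`` API; the sketch below shows the intended usage pattern at a high level. It is illustrative only (a toy ``nn.Sequential`` stands in for a real transformer) and assumes a recent PyTorch where ``fully_shard`` is exported from ``torch.distributed.fsdp``:

.. code-block:: python

   # FSDP2 sketch: apply ``fully_shard`` to each block, then to the root module.
   # Launch with ``torchrun --nproc_per_node=<num_gpus> script.py``.
   import os
   import torch
   import torch.nn as nn
   import torch.distributed as dist
   from torch.distributed.fsdp import fully_shard

   dist.init_process_group("nccl")
   torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

   # Toy stack of blocks standing in for transformer layers.
   model = nn.Sequential(*[nn.Linear(1024, 1024) for _ in range(4)]).cuda()

   for block in model:
       fully_shard(block)   # each block's parameters are sharded across ranks
   fully_shard(model)       # shard any remaining parameters at the root

   optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
   out = model(torch.randn(8, 1024, device="cuda"))   # per-block all-gather in forward
   out.sum().backward()                               # gradients reduce-scattered into shards
   optim.step()
   dist.destroy_process_group()

Unlike the FSDP1 ``FullyShardedDataParallel`` wrapper class, ``fully_shard`` modifies the module in place and represents sharded parameters as DTensors, which is why the tutorial links in this diff now point at the FSDP2 pages.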