@@ -74,11 +74,11 @@ When deciding what parallelism techniques to choose for your model, use these co
 
   * See also: `Getting Started with Distributed Data Parallel <../intermediate/ddp_tutorial.html>`__
 
-#. Use `FullyShardedDataParallel (FSDP) <https://pytorch.org/docs/stable/fsdp.html>`__ when your model cannot fit on one GPU.
+#. Use `FullyShardedDataParallel (FSDP2) <https://pytorch.org/docs/stable/distributed.fsdp.fully_shard.html>`__ when your model cannot fit on one GPU.
 
-  * See also: `Getting Started with FSDP<https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
+  * See also: `Getting Started with FSDP2 <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
 
-#. Use `Tensor Parallel (TP) <https://pytorch.org/docs/stable/distributed.tensor.parallel.html>`__ and/or `Pipeline Parallel (PP) <https://pytorch.org/docs/main/distributed.pipelining.html>`__ if you reach scaling limitations with FSDP.
+#. Use `Tensor Parallel (TP) <https://pytorch.org/docs/stable/distributed.tensor.parallel.html>`__ and/or `Pipeline Parallel (PP) <https://pytorch.org/docs/main/distributed.pipelining.html>`__ if you reach scaling limitations with FSDP2.
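
For reference, a minimal sketch of the FSDP2 ``fully_shard`` API that the updated links point to. The model, per-block sharding choice, and launch setup (``torchrun`` with one CUDA device per rank) are illustrative assumptions, not part of this diff:

    # Minimal FSDP2 sketch; assumes a torchrun launch with one CUDA device per rank.
    # The model and sharding granularity are placeholders, not from the tutorial.
    import torch
    import torch.distributed as dist
    import torch.nn as nn
    from torch.distributed.fsdp import fully_shard

    dist.init_process_group("nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)]).cuda()

    # Shard each block first, then the root module, so each block's
    # parameters form their own communication group.
    for block in model:
        fully_shard(block)
    fully_shard(model)

    optim = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss = model(torch.randn(2, 4096, device="cuda")).sum()
    loss.backward()
    optim.step()

Unlike the FSDP1 ``FullyShardedDataParallel`` wrapper, ``fully_shard`` modifies the existing module in place rather than returning a wrapper object, which is why the links and names in this section change.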