Commit 0146260

Merge branch 'main' into sycl_extension/zzq
2 parents 52450c4 + 9bacf0b

9 files changed: 25 additions, 62 deletions

.ci/docker/requirements.txt

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,6 @@
 # --extra-index-url https://download.pytorch.org/whl/cu117/index.html # Use this to run/publish tutorials against the latest binaries during the RC stage. Comment out after the release. Each release verify the correct cuda version.
 # Refer to ./jenkins/build.sh for tutorial build instructions.
 
-
 # Sphinx dependencies
 sphinx==7.2.6
 sphinx-gallery==0.19.0
@@ -16,6 +15,7 @@ pypandoc==1.15
 pandocfilters==1.5.1
 markdown==3.8.2
 
+
 # PyTorch Theme
 -e git+https://github.com/pytorch/pytorch_sphinx_theme.git@pytorch_sphinx_theme2#egg=pytorch_sphinx_theme2
 

beginner_source/dist_overview.rst

Lines changed: 5 additions & 5 deletions
@@ -1,6 +1,6 @@
 PyTorch Distributed Overview
 ============================
-**Author**: `Will Constable <https://github.com/wconstab/>`_
+**Author**: `Will Constable <https://github.com/wconstab/>`_, `Wei Feng <https://github.com/weifengpy>`_
 
 .. note::
    |edit| View and edit this tutorial in `github <https://github.com/pytorch/tutorials/blob/main/beginner_source/dist_overview.rst>`__.
@@ -26,7 +26,7 @@ Parallelism APIs
 These Parallelism Modules offer high-level functionality and compose with existing models:
 
 - `Distributed Data-Parallel (DDP) <https://pytorch.org/docs/stable/generated/torch.nn.parallel.DistributedDataParallel.html>`__
-- `Fully Sharded Data-Parallel Training (FSDP) <https://pytorch.org/docs/stable/fsdp.html>`__
+- `Fully Sharded Data-Parallel Training (FSDP2) <https://pytorch.org/docs/stable/distributed.fsdp.fully_shard.html>`__
 - `Tensor Parallel (TP) <https://pytorch.org/docs/stable/distributed.tensor.parallel.html>`__
 - `Pipeline Parallel (PP) <https://pytorch.org/docs/main/distributed.pipelining.html>`__
 
@@ -74,11 +74,11 @@ When deciding what parallelism techniques to choose for your model, use these co
 
    * See also: `Getting Started with Distributed Data Parallel <../intermediate/ddp_tutorial.html>`__
 
-#. Use `FullyShardedDataParallel (FSDP) <https://pytorch.org/docs/stable/fsdp.html>`__ when your model cannot fit on one GPU.
+#. Use `FullyShardedDataParallel (FSDP2) <https://pytorch.org/docs/stable/distributed.fsdp.fully_shard.html>`__ when your model cannot fit on one GPU.
 
-   * See also: `Getting Started with FSDP <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
+   * See also: `Getting Started with FSDP2 <https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html>`__
 
-#. Use `Tensor Parallel (TP) <https://pytorch.org/docs/stable/distributed.tensor.parallel.html>`__ and/or `Pipeline Parallel (PP) <https://pytorch.org/docs/main/distributed.pipelining.html>`__ if you reach scaling limitations with FSDP.
+#. Use `Tensor Parallel (TP) <https://pytorch.org/docs/stable/distributed.tensor.parallel.html>`__ and/or `Pipeline Parallel (PP) <https://pytorch.org/docs/main/distributed.pipelining.html>`__ if you reach scaling limitations with FSDP2.
 
    * Try our `Tensor Parallelism Tutorial <https://pytorch.org/tutorials/intermediate/TP_tutorial.html>`__
 

deep-dive.rst

Lines changed: 1 addition & 0 deletions
@@ -34,6 +34,7 @@ and speed.
    :header: Profiling PyTorch
    :card_description: Learn how to profile a PyTorch application
    :link: beginner/profiler.html
+   :image: _static/img/thumbnails/cropped/pytorch-logo.png
    :tags: Profiling
 
 .. customcarditem::

distributed.rst

Lines changed: 4 additions & 15 deletions
@@ -12,7 +12,7 @@ There are a few ways you can perform distributed training in
 PyTorch with each method having their advantages in certain use cases:
 
 * `DistributedDataParallel (DDP) <#learn-ddp>`__
-* `Fully Sharded Data Parallel (FSDP) <#learn-fsdp>`__
+* `Fully Sharded Data Parallel (FSDP2) <#learn-fsdp>`__
 * `Tensor Parallel (TP) <#learn-tp>`__
 * `Device Mesh <#device-mesh>`__
 * `Remote Procedure Call (RPC) distributed training <#learn-rpc>`__
@@ -60,28 +60,18 @@ Learn DDP
 
 .. _learn-fsdp:
 
-Learn FSDP
+Learn FSDP2
 ----------
 
 .. grid:: 3
 
    .. grid-item-card:: :octicon:`file-code;1em`
-      Getting Started with FSDP
+      Getting Started with FSDP2
      :link: https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html?utm_source=distr_landing&utm_medium=FSDP_getting_started
      :link-type: url
 
      This tutorial demonstrates how you can perform distributed training
-     with FSDP on a MNIST dataset.
-      +++
-      :octicon:`code;1em` Code
-
-   .. grid-item-card:: :octicon:`file-code;1em`
-      FSDP Advanced
-      :link: https://pytorch.org/tutorials/intermediate/FSDP_advanced_tutorial.html?utm_source=distr_landing&utm_medium=FSDP_advanced
-      :link-type: url
-
-      In this tutorial, you will learn how to fine-tune a HuggingFace (HF) T5
-      model with FSDP for text summarization.
+     with FSDP2 on a transformer model
      +++
      :octicon:`code;1em` Code
 
@@ -196,7 +186,6 @@ Custom Extensions
    intermediate/ddp_tutorial
    intermediate/dist_tuto
    intermediate/FSDP_tutorial
-   intermediate/FSDP_advanced_tutorial
    intermediate/TCPStore_libuv_backend
    intermediate/TP_tutorial
    intermediate/pipelining_tutorial

ecosystem.rst

Lines changed: 1 addition & 1 deletion
@@ -54,7 +54,7 @@ to production deployment.
    :card_description: This tutorial covers how to run quantized and fused models on a Raspberry Pi 4 at 30 fps.
    :image: _static/img/thumbnails/cropped/realtime_rpi.png
    :link: intermediate/realtime_rpi.html
-   :tags: TorchScript,Model-Optimization,Image/Video,Quantization,Ecosystem
+   :tags: Model-Optimization,Image/Video,Quantization,Ecosystem
 
 .. End of tutorial card section
 .. -----------------------------------------

index.rst

Lines changed: 0 additions & 7 deletions
@@ -666,13 +666,6 @@ Welcome to PyTorch Tutorials
    :link: intermediate/FSDP_tutorial.html
    :tags: Parallel-and-Distributed-Training
 
-.. customcarditem::
-   :header: Advanced Model Training with Fully Sharded Data Parallel (FSDP1)
-   :card_description: Explore advanced model training with Fully Sharded Data Parallel package.
-   :image: _static/img/thumbnails/cropped/Getting-Started-with-FSDP.png
-   :link: intermediate/FSDP_advanced_tutorial.html
-   :tags: Parallel-and-Distributed-Training
-
 .. customcarditem::
    :header: Introduction to Libuv TCPStore Backend
    :card_description: TCPStore now uses a new server backend for faster connection and better scalability.

intermediate_source/FSDP_tutorial.rst

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ Getting Started with Fully Sharded Data Parallel (FSDP2)
 **Author**: `Wei Feng <https://github.com/weifengpy>`__, `Will Constable <https://github.com/wconstab>`__, `Yifan Mao <https://github.com/mori360>`__
 
 .. note::
-   |edit| Check out the code in this tutorial from `pytorch/examples <https://github.com/pytorch/examples/tree/main/distributed/FSDP2>`_. FSDP1 will be deprecated. The old tutorial can be found `here <https://docs.pytorch.org/tutorials/intermediate/FSDP1_tutorial.html>`_.
+   |edit| Check out the code in this tutorial from `pytorch/examples <https://github.com/pytorch/examples/tree/main/distributed/FSDP2>`_. FSDP1 is deprecated. FSDP1 tutorials are archived in `[1] <https://docs.pytorch.org/tutorials/intermediate/FSDP1_tutorial.html>`_ and `[2] <https://docs.pytorch.org/tutorials/intermediate/FSDP_advanced_tutorial.html>`_
 
 How FSDP2 works
 --------------
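Context for this rename: the tutorial's `fully_shard` API (FSDP2) shards each parameter individually along dim 0 across ranks, rather than flattening all parameters into one buffer as FSDP1 did. As a rough, torch-free sketch of that sizing arithmetic only (the helper `dim0_shard_sizes` is our illustration, not a PyTorch API):

```python
# Illustration of FSDP2-style dim-0 sharding: each parameter's leading
# dimension is split evenly across ranks, with the last shard(s) padded
# or shortened when the size does not divide evenly.
def dim0_shard_sizes(dim0: int, world_size: int) -> list[int]:
    """Number of dim-0 rows of a parameter held by each rank."""
    per_rank = -(-dim0 // world_size)  # ceil division: rows per rank
    sizes = []
    remaining = dim0
    for _ in range(world_size):
        sizes.append(max(0, min(per_rank, remaining)))
        remaining -= per_rank
    return sizes

print(dim0_shard_sizes(10, 4))  # a 10-row parameter on 4 ranks: [3, 3, 3, 1]
```

Small parameters can leave trailing ranks with zero rows, which is why FSDP2 pads shards internally for even collectives.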

prototype_source/prototype_index.rst

Lines changed: 5 additions & 16 deletions
@@ -1,5 +1,6 @@
 Unstable
---------
+========
+
 API unstable features are not available as part of binary distributions
 like PyPI or Conda (except maybe behind run-time flags). To test these
 features we would, depending on the feature, recommend building PyTorch
@@ -14,10 +15,7 @@ decide if we want to upgrade the level of commitment or to fail fast.
 
 .. raw:: html
 
-   </div>
-   </div>
-
-   <div id="tutorial-cards-container">
+   <div id="tutorial-cards-container">
 
    <nav class="navbar navbar-expand-lg navbar-light tutorials-nav col-12">
      <div class="tutorial-tags-container">
@@ -43,7 +41,7 @@ decide if we want to upgrade the level of commitment or to fail fast.
 .. customcarditem::
    :header: (prototype) Accelerating BERT with semi-structured (2:4) sparsity
    :card_description: Prune BERT to be 2:4 sparse and accelerate for inference.
-   :image: _static/img/thumbnails/cropped/generic-pytorch-logo.png
+   :image: ../_static/img/thumbnails/cropped/generic-pytorch-logo.png
    :link: prototype/semi_structured_sparse.html
    :tags: Model-Optimiziation
@@ -160,21 +158,12 @@ decide if we want to upgrade the level of commitment or to fail fast.
 
 .. End of tutorial card section
 
-.. raw:: html
-
-   </div>
-
-   <div class="pagination d-flex justify-content-center"></div>
-
-   </div>
-
-   </div>
-
 .. -----------------------------------------
 .. Page TOC
 .. -----------------------------------------
 
 .. toctree::
+   :maxdepth: 2
    :hidden:
 
    /prototype/context_parallel

recipes_source/recipes_index.rst

Lines changed: 7 additions & 16 deletions
@@ -1,13 +1,13 @@
 Recipes
----------------------------------------------
-Recipes are bite-sized, actionable examples of how to use specific PyTorch features, different from our full-length tutorials.
+========
 
-.. raw:: html
+Recipes are bite-sized, actionable examples of
+how to use specific PyTorch features, different
+from our full-length tutorials.
 
-   </div>
-   </div>
+.. raw:: html
 
-   <div id="tutorial-cards-container">
+   <div id="tutorial-cards-container">
 
    <nav class="navbar navbar-expand-lg navbar-light tutorials-nav col-12">
      <div class="tutorial-tags-container">
@@ -335,20 +335,11 @@ Recipes are bite-sized, actionable examples of how to use specific PyTorch featu
 
 .. End of tutorial card section
 
-.. raw:: html
-
-   </div>
-
-   <div class="pagination d-flex justify-content-center"></div>
-
-   </div>
-
-   </div>
-
 .. -----------------------------------------
 .. Page TOC
 .. -----------------------------------------
 .. toctree::
+   :maxdepth: 2
    :hidden:
 
    /recipes/recipes/defining_a_neural_network
