Commit f2538e9

Merge branch 'main' into 2.6-RC-TEST
2 parents b08b70d + 2a30921

16 files changed: +556 −23 lines

.lycheeignore
Lines changed: 6 additions & 0 deletions

@@ -12,3 +12,9 @@ https://pytorch.org/tutorials/beginner/colab/n
 
 # Ignore local host link from intermediate_source/tensorboard_tutorial.rst
 http://localhost:6006
+
+# Ignore local host link from recipes_source/deployment_with_flask.rst
+http://localhost:5000/predict
+
+# Ignore local host link from advanced_source/cpp_frontend.rst
+https://www.uber.com/blog/deep-neuroevolution/

CONTRIBUTING.md
Lines changed: 2 additions & 3 deletions

@@ -218,9 +218,8 @@ described in the preceding sections:
 - [NLP From Scratch: Generating Names with a Character-Level RNN
   Tutorial](https://pytorch.org/tutorials/intermediate/char_rnn_generation_tutorial.html)
 
-If you are creating a recipe, we recommend that you use [this
-template](https://github.com/pytorch/tutorials/blob/tutorials_refresh/recipes_source/recipes/example_recipe.py)
-as a guide.
+If you are creating a recipe, [this is a good
+example.](https://github.com/pytorch/tutorials/blob/main/recipes_source/recipes/what_is_state_dict.py)
 
 
 # Submission Process #

advanced_source/cpp_autograd.rst
Lines changed: 4 additions & 4 deletions

@@ -255,9 +255,9 @@ Out:
  [ CPUFloatType{3,4} ]
 
 Please see the documentation for ``torch::autograd::backward``
-(`link <https://pytorch.org/cppdocs/api/function_namespacetorch_1_1autograd_1afa9b5d4329085df4b6b3d4b4be48914b.html>`_)
+(`link <https://pytorch.org/cppdocs/api/function_namespacetorch_1_1autograd_1a1403bf65b1f4f8c8506a9e6e5312d030.html>`_)
 and ``torch::autograd::grad``
-(`link <https://pytorch.org/cppdocs/api/function_namespacetorch_1_1autograd_1a1e03c42b14b40c306f9eb947ef842d9c.html>`_)
+(`link <https://pytorch.org/cppdocs/api/function_namespacetorch_1_1autograd_1ab9fa15dc09a8891c26525fb61d33401a.html>`_)
 for more information on how to use them.
 
 Using custom autograd function in C++

@@ -394,9 +394,9 @@ C++ using the following table:
 +--------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | Python                         | C++                                                                                                                                                                    |
 +================================+========================================================================================================================================================================+
-| ``torch.autograd.backward``    | ``torch::autograd::backward`` (`link <https://pytorch.org/cppdocs/api/function_namespacetorch_1_1autograd_1afa9b5d4329085df4b6b3d4b4be48914b.html>`_)                  |
+| ``torch.autograd.backward``    | ``torch::autograd::backward`` (`link <https://pytorch.org/cppdocs/api/function_namespacetorch_1_1autograd_1a1403bf65b1f4f8c8506a9e6e5312d030.html>`_)                  |
 +--------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
-| ``torch.autograd.grad``        | ``torch::autograd::grad`` (`link <https://pytorch.org/cppdocs/api/function_namespacetorch_1_1autograd_1a1e03c42b14b40c306f9eb947ef842d9c.html>`_)                      |
+| ``torch.autograd.grad``        | ``torch::autograd::grad`` (`link <https://pytorch.org/cppdocs/api/function_namespacetorch_1_1autograd_1ab9fa15dc09a8891c26525fb61d33401a.html>`_)                      |
 +--------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
 | ``torch.Tensor.detach``        | ``torch::Tensor::detach`` (`link <https://pytorch.org/cppdocs/api/classat_1_1_tensor.html#_CPPv4NK2at6Tensor6detachEv>`_)                                              |
 +--------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
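For readers comparing the two entries above: ``torch.autograd.grad`` returns gradients directly, while ``torch.autograd.backward`` accumulates them into each leaf's ``.grad`` attribute. A minimal Python sketch (the tensors and shapes are illustrative, not taken from the tutorial):

    import torch

    x = torch.ones(2, 2, requires_grad=True)

    # torch.autograd.grad returns the gradients without touching x.grad
    y = (x * x).sum()
    (dy_dx,) = torch.autograd.grad(y, x)
    print(dy_dx)   # d(sum(x^2))/dx = 2x -> a 2x2 tensor of 2s

    # torch.autograd.backward accumulates gradients into x.grad
    z = (x * x * x).sum()
    torch.autograd.backward([z])
    print(x.grad)  # d(sum(x^3))/dx = 3x^2 -> a 2x2 tensor of 3s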

advanced_source/cpp_frontend.rst
Lines changed: 2 additions & 2 deletions

@@ -57,7 +57,7 @@ the right tool for the job. Examples for such environments include:
 Multiprocessing is an alternative, but not as scalable and has significant
 shortcomings. C++ has no such constraints and threads are easy to use and
 create. Models requiring heavy parallelization, like those used in `Deep
-Neuroevolution <https://eng.uber.com/deep-neuroevolution/>`_, can benefit from
+Neuroevolution <https://www.uber.com/blog/deep-neuroevolution/>`_, can benefit from
 this.
 - **Existing C++ Codebases**: You may be the owner of an existing C++
   application doing anything from serving web pages in a backend server to

@@ -662,7 +662,7 @@ Defining the DCGAN Modules
 We now have the necessary background and introduction to define the modules for
 the machine learning task we want to solve in this post. To recap: our task is
 to generate images of digits from the `MNIST dataset
-<http://yann.lecun.com/exdb/mnist/>`_. We want to use a `generative adversarial
+<https://huggingface.co/datasets/ylecun/mnist>`_. We want to use a `generative adversarial
 network (GAN)
 <https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf>`_ to solve
 this task. In particular, we'll use a `DCGAN architecture
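For context on the task recapped in this hunk: a DCGAN generator maps a latent noise vector to an image through a stack of transposed convolutions. A minimal Python sketch for 28x28 MNIST digits (the layer sizes are illustrative assumptions, not the tutorial's exact configuration):

    import torch
    import torch.nn as nn

    generator = nn.Sequential(
        nn.ConvTranspose2d(100, 256, 4, bias=False),        # 1x1   -> 4x4
        nn.BatchNorm2d(256), nn.ReLU(True),
        nn.ConvTranspose2d(256, 128, 3, 2, 1, bias=False),  # 4x4   -> 7x7
        nn.BatchNorm2d(128), nn.ReLU(True),
        nn.ConvTranspose2d(128, 64, 4, 2, 1, bias=False),   # 7x7   -> 14x14
        nn.BatchNorm2d(64), nn.ReLU(True),
        nn.ConvTranspose2d(64, 1, 4, 2, 1, bias=False),     # 14x14 -> 28x28
        nn.Tanh(),
    )

    noise = torch.randn(8, 100, 1, 1)
    fake_digits = generator(noise)  # shape: (8, 1, 28, 28)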

beginner_source/pytorch_with_examples.rst
Lines changed: 1 addition & 1 deletion

@@ -149,7 +149,7 @@ which will be optimized during learning.
 
 In TensorFlow, packages like
 `Keras <https://github.com/fchollet/keras>`__,
-`TensorFlow-Slim <https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim>`__,
+`TensorFlow-Slim <https://github.com/google-research/tf-slim>`__,
 and `TFLearn <http://tflearn.org/>`__ provide higher-level abstractions
 over raw computational graphs that are useful for building neural
 networks.
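PyTorch's ``nn`` package fills the same role this paragraph describes for Keras, TensorFlow-Slim, and TFLearn: composing layers into a model without building the computational graph by hand. A minimal sketch (the layer sizes are illustrative):

    import torch
    import torch.nn as nn

    # A two-layer network expressed as composed modules rather than
    # raw tensor operations.
    model = nn.Sequential(
        nn.Linear(1000, 100),
        nn.ReLU(),
        nn.Linear(100, 10),
    )
    y = model(torch.randn(64, 1000))  # shape: (64, 10)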

en-wordlist.txt
Lines changed: 5 additions & 1 deletion

@@ -81,6 +81,8 @@ FX
 FX's
 FairSeq
 Fastpath
+FakeTensor
+FakeTensors
 FFN
 FloydHub
 FloydHub's

@@ -368,6 +370,8 @@ downsample
 downsamples
 dropdown
 dtensor
+dtype
+dtypes
 duration
 elementwise
 embeddings

@@ -615,6 +619,7 @@ triton
 uint
 UX
 umap
+unbacked
 uncomment
 uncommented
 underflowing

@@ -651,7 +656,6 @@ RecSys
 TorchRec
 sharding
 TBE
-dtype
 EBC
 sharder
 hyperoptimized

intermediate_source/FSDP_tutorial.rst
Lines changed: 1 addition & 1 deletion

@@ -11,7 +11,7 @@ It also comes with considerable engineering complexity to handle the training of
 `PyTorch FSDP <https://pytorch.org/blog/introducing-pytorch-fully-sharded-data-parallel-api/>`__, released in PyTorch 1.11 makes this easier.
 
 In this tutorial, we show how to use `FSDP APIs <https://pytorch.org/docs/stable/fsdp.html>`__, for simple MNIST models that can be extended to other larger models such as `HuggingFace BERT models <https://huggingface.co/blog/zero-deepspeed-fairscale>`__,
-`GPT 3 models up to 1T parameters <https://pytorch.medium.com/training-a-1-trillion-parameter-model-with-pytorch-fully-sharded-data-parallel-on-aws-3ac13aa96cff>`__ . The sample DDP MNIST code has been borrowed from `here <https://github.com/yqhu/mnist_examples>`__.
+`GPT 3 models up to 1T parameters <https://pytorch.medium.com/training-a-1-trillion-parameter-model-with-pytorch-fully-sharded-data-parallel-on-aws-3ac13aa96cff>`__ . The sample DDP MNIST code courtesy of `Patrick Hu <https://github.com/yqhu/>`_.
 
 
 How FSDP works
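The FSDP API referenced in this hunk boils down to wrapping a module so its parameters are sharded across ranks. A minimal sketch, assuming a process group has already been initialized (for example via ``torchrun``) and each rank has a GPU; the model and optimizer choices are illustrative:

    import torch
    import torch.nn as nn
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    # A small MNIST-sized classifier; wrapping it in FSDP shards its
    # parameters across the participating ranks.
    model = nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 128),
        nn.ReLU(),
        nn.Linear(128, 10),
    ).cuda()
    model = FSDP(model)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)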

intermediate_source/ddp_series_minGPT.rst
Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@ training <ddp_series_multinode.html>`__ \|\| **minGPT Training**
 Training “real-world” models with DDP
 =====================================
 
-Authors: `Suraj Subramanian <https://github.com/suraj813>`__
+Authors: `Suraj Subramanian <https://github.com/subramen>`__
 
 .. grid:: 2

intermediate_source/ddp_series_multinode.rst
Lines changed: 1 addition & 1 deletion

@@ -6,7 +6,7 @@ training** \|\| `minGPT Training <ddp_series_minGPT.html>`__
 Multinode Training
 ==================
 
-Authors: `Suraj Subramanian <https://github.com/suraj813>`__
+Authors: `Suraj Subramanian <https://github.com/subramen>`__
 
 .. grid:: 2

intermediate_source/dynamic_quantization_bert_tutorial.rst
Lines changed: 3 additions & 3 deletions

@@ -138,7 +138,7 @@ the following helper functions: one for converting the text examples
 into the feature vectors; The other one for measuring the F1 score of
 the predicted result.
 
-The `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_ function converts the texts into input features:
+The `glue_convert_examples_to_features <https://github.com/huggingface/transformers/blob/main/src/transformers/data/datasets/glue.py>`_ function converts the texts into input features:
 
 - Tokenize the input sequences;
 - Insert [CLS] in the beginning;

@@ -147,7 +147,7 @@ The `glue_convert_examples_to_features <https://github.com/huggingface/transform
 - Generate token type ids to indicate whether a token belongs to the
   first sequence or the second sequence.
 
-The `glue_compute_metrics <https://github.com/huggingface/transformers/blob/master/transformers/data/processors/glue.py>`_ function has the compute metrics with
+The `glue_compute_metrics <https://github.com/huggingface/transformers/blob/main/src/transformers/data/metrics/__init__.py#L60>`_ function has the compute metrics with
 the `F1 score <https://scikit-learn.org/stable/modules/generated/sklearn.metrics.f1_score.html>`_, which
 can be interpreted as a weighted average of the precision and recall,
 where an F1 score reaches its best value at 1 and worst score at 0. The

@@ -273,7 +273,7 @@ We load the tokenizer and fine-tuned BERT sequence classifier model
 2.3 Define the tokenize and evaluation function
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
-We reuse the tokenize and evaluation function from `HuggingFace <https://github.com/huggingface/transformers/blob/master/examples/run_glue.py>`_.
+We reuse the tokenize and evaluation function from `HuggingFace <https://github.com/huggingface/transformers/blob/main/examples/legacy/pytorch-lightning/run_glue.py>`_.
 
 .. code:: python
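A quick illustration of the F1 metric discussed in the second hunk above, using the scikit-learn function the tutorial links to (the labels here are made up):

    from sklearn.metrics import f1_score

    # F1 = 2 * (precision * recall) / (precision + recall)
    y_true = [0, 1, 1, 0, 1, 1]
    y_pred = [0, 1, 0, 0, 1, 1]
    # precision = 3/3 = 1.0, recall = 3/4 = 0.75
    print(f1_score(y_true, y_pred))  # ~0.857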
