
Commit 8819481

Merge branch 'main' into master
2 parents 29d6ee5 + f2d5a6a commit 8819481

11 files changed: +108 −11 lines changed

_static/css/custom2.css

Lines changed: 5 additions & 2 deletions
@@ -67,8 +67,11 @@ input[type="radio"] {
 .gsc-control-cse {
     padding: 0 !important;
     border-radius: 0px !important;
-    border: none !important;;
-    overflow: hidden;
+    border: none !important;
+}
+
+.gsc-overflow-hidden {
+    overflow: visible !important;
 }
 
 #___gcse_0 {

advanced_source/cpp_export.rst

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Loading a TorchScript Model in C++
 =====================================
 
-.. note:: TorchScript is no longer in active development.
+.. warning:: TorchScript is no longer in active development.
 
 As its name suggests, the primary interface to PyTorch is the Python
 programming language. While Python is a suitable and preferred language for

advanced_source/torch-script-parallelism.rst

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Dynamic Parallelism in TorchScript
 ==================================
 
-.. note:: TorchScript is no longer in active development.
+.. warning:: TorchScript is no longer in active development.
 
 In this tutorial, we introduce the syntax for doing *dynamic inter-op parallelism*
 in TorchScript. This parallelism has the following properties:

advanced_source/torch_script_custom_classes.rst

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Extending TorchScript with Custom C++ Classes
 ===============================================
 
-.. note:: TorchScript is no longer in active development.
+.. warning:: TorchScript is no longer in active development.
 
 This tutorial is a follow-on to the
 :doc:`custom operator <torch_script_custom_ops>`

beginner_source/Intro_to_TorchScript_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 
 **Authors:** James Reed ([email protected]), Michael Suo ([email protected]), rev2
 
-.. note:: TorchScript is no longer in active development.
+.. warning:: TorchScript is no longer in active development.
 
 This tutorial is an introduction to TorchScript, an intermediate
 representation of a PyTorch model (subclass of ``nn.Module``) that

beginner_source/chatbot_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -1128,7 +1128,7 @@ def forward(self, input_seq, input_length, max_length):
         # Forward input through encoder model
         encoder_outputs, encoder_hidden = self.encoder(input_seq, input_length)
         # Prepare encoder's final hidden layer to be first hidden input to the decoder
-        decoder_hidden = encoder_hidden[:decoder.n_layers]
+        decoder_hidden = encoder_hidden[:self.decoder.n_layers]
         # Initialize decoder input with SOS_token
         decoder_input = torch.ones(1, 1, device=device, dtype=torch.long) * SOS_token
         # Initialize tensors to append decoded words to
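
Why this one-line fix matters: this ``forward`` belongs to the tutorial's greedy-search wrapper module, which stores the decoder as a submodule, so the hidden-state slice has to go through ``self.decoder`` rather than a bare global name. The sketch below is a minimal, self-contained approximation of that pattern; the stub classes and their names are hypothetical, not the tutorial's actual code.

    import torch
    import torch.nn as nn

    class TinyDecoder(nn.Module):
        """Hypothetical stand-in for the tutorial's decoder; only n_layers matters here."""
        def __init__(self, n_layers: int = 2):
            super().__init__()
            self.n_layers = n_layers

    class GreedySearchWrapper(nn.Module):
        """Minimal sketch of a wrapper that holds the decoder as a submodule."""
        def __init__(self, decoder: nn.Module):
            super().__init__()
            self.decoder = decoder

        def forward(self, encoder_hidden: torch.Tensor) -> torch.Tensor:
            # Correct: reach the decoder through ``self``; a bare ``decoder``
            # would depend on a module-level global being in scope.
            return encoder_hidden[:self.decoder.n_layers]

    hidden = torch.zeros(4, 1, 8)  # (num_layers, batch, hidden_size)
    wrapper = GreedySearchWrapper(TinyDecoder(n_layers=2))
    print(wrapper(hidden).shape)   # torch.Size([2, 1, 8])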

beginner_source/deploy_seq2seq_hybrid_frontend_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 ==================================================
 **Author:** `Matthew Inkawhich <https://github.com/MatthewInkawhich>`_
 
-.. note:: TorchScript is no longer in active development.
+.. warning:: TorchScript is no longer in active development.
 """
 
 
prototype_source/flight_recorder_tutorial.rst

Lines changed: 94 additions & 0 deletions
@@ -202,6 +202,100 @@ Caveat: tabulate module is needed, so you might need pip install it first.
    python fr_trace.py <dump dir containing trace files> -j [--selected-ranks i j k ...] [--pg-filters tp dp]
    torchfrtrace <dump dir containing trace files> -j [--selected-ranks i j k ...] [--pg-filters 0 2]
 
+An End-to-End Example
+------------------------------------
+To demonstrate the use of Flight Recorder, we will use a small program where we induce mismatched collectives.
+In this example, ``rank0`` is programmed to do an additional collective.
+The Flight Recorder dump files are saved to the ``/tmp`` directory.
+For demonstration purposes, we named this program ``crash.py``.
+
+.. note::
+   This is a simplified example. In real-world scenarios, the process would involve more
+   complexities.
+
+.. code:: python
+   :caption: A crashing example
+
+   import torch
+   import torch.distributed as dist
+   import os
+   from datetime import timedelta
+
+   local_rank = int(os.environ["LOCAL_RANK"])
+   world_size = int(os.environ["WORLD_SIZE"])
+   assert world_size <= 8, "world size must be less than or equal to 8"
+   os.environ["TORCH_NCCL_DEBUG_INFO_TEMP_FILE"] = "/tmp/trace_"
+   os.environ["TORCH_NCCL_DUMP_ON_TIMEOUT"] = "1"
+   os.environ["TORCH_NCCL_TRACE_BUFFER_SIZE"] = "2000"
+   device = torch.device(f"cuda:{local_rank}")
+   print(f"{local_rank=} {world_size=} master addr: {os.environ['MASTER_ADDR']} master port: {os.environ['MASTER_PORT']} {device=}")
+
+   # Initialize the process group with a small timeout so that jobs fail quickly
+   dist.init_process_group("nccl", world_size=world_size, rank=local_rank, timeout=timedelta(seconds=1))
+
+   a = torch.full((3, 4), float(local_rank), device=device)
+   # Write some collectives to populate Flight Recorder data
+   for i in range(2):
+       print(f"calling allreduce on {local_rank=}")
+       f = dist.all_reduce(a)
+
+   # rank0 is doing an additional collective
+   if local_rank == 0:
+       print("rank0 is doing an allreduce on tensor b, but other ranks forgot")
+       b = torch.full((4, 5), float(local_rank), device=device)
+       f = dist.all_reduce(b)
+
+   for i in range(2):
+       print(f"calling allreduce on {local_rank=}")
+       f = dist.all_reduce(a)
+
+   torch.cuda.synchronize(device=device)
+   print(f"{local_rank=} exiting")
+
+
+To run this program, use ``torchrun``:
+
+.. code:: bash
+
+   torchrun --nnodes=1 --nproc_per_node=2 crash.py
+
+You should see two files in the ``/tmp`` directory:
+
+.. code:: bash
+
+   $ ls /tmp/trace*
+   # Expected output
+   /tmp/trace_0 /tmp/trace_1
+
+Finally, to analyze these two files, we use the ``torchfrtrace`` command:
+
+.. code:: bash
+
+   torchfrtrace --prefix "trace_" /tmp/
+
+The output from the trace command is meant to be human-readable. It includes information about the
+set of collectives that caused a failure.
+The output for the command above is shown below.
+We can clearly see that rank 1 did not join the "all_reduce" collective.
+
+.. code-block:: bash
+
+   $ torchfrtrace --prefix "trace_" /tmp/
+   Not all ranks joining collective 5 at entry 4
+   group info: 0:default_pg
+   collective: nccl:all_reduce
+   missing ranks: {1}
+   input sizes: [[3, 4]]
+   output sizes: [[3, 4]]
+   expected ranks: 2
+   collective state: scheduled
+   collective stack trace:
+        all_reduce at /home/cpio/local/pytorch/torch/distributed/distributed_c10d.py:2696
+        wrapper at /home/cpio/local/pytorch/torch/distributed/c10d_logger.py:83
+        <module> at /home/cpio/test/crash.py:44
+
+
 Conclusion
 ----------
 In this tutorial, we have learned about a new PyTorch diagnostic tool called Flight Recorder.
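
A small aside on the workflow added above: before pointing ``torchfrtrace`` at the dump directory, it can be useful to confirm that every rank actually produced a dump file under the configured prefix. The helper below is a stdlib-only sketch, not part of the tutorial; it assumes the dumps are named ``<prefix><rank>``, as configured by ``TORCH_NCCL_DEBUG_INFO_TEMP_FILE="/tmp/trace_"`` in the example.

    # Sanity-check helper (assumption: one dump named <prefix><rank> per rank).
    import glob
    import os

    def check_flight_recorder_dumps(prefix: str = "/tmp/trace_", world_size: int = 2) -> None:
        found = sorted(glob.glob(prefix + "*"))
        expected = [f"{prefix}{rank}" for rank in range(world_size)]
        missing = [path for path in expected if path not in found]
        if missing:
            raise RuntimeError(f"missing Flight Recorder dumps: {missing}")
        for path in found:
            print(f"{path}: {os.path.getsize(path)} bytes")

    if __name__ == "__main__":
        check_flight_recorder_dumps()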

prototype_source/torchscript_freezing.py

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 Model Freezing in TorchScript
 =============================
 
-.. note:: TorchScript is no longer in active development.
+.. warning:: TorchScript is no longer in active development.
 
 In this tutorial, we introduce the syntax for *model freezing* in TorchScript.
 Freezing is the process of inlining Pytorch module parameters and attributes

recipes_source/distributed_optim_torchscript.rst

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Distributed Optimizer with TorchScript support
 ==============================================================
 
-.. note:: TorchScript is no longer in active development.
+.. warning:: TorchScript is no longer in active development.
 
 In this recipe, you will learn:
 