
Commit 8819481

Merge branch 'main' into master
2 parents 29d6ee5 + f2d5a6a commit 8819481

11 files changed: +108 −11 lines changed

_static/css/custom2.css

Lines changed: 5 additions & 2 deletions
@@ -67,8 +67,11 @@ input[type="radio"] {
 .gsc-control-cse {
     padding: 0 !important;
     border-radius: 0px !important;
-    border: none !important;;
-    overflow: hidden;
+    border: none !important;
+}
+
+.gsc-overflow-hidden {
+    overflow: visible !important;
 }
 
 #___gcse_0 {

advanced_source/cpp_export.rst

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Loading a TorchScript Model in C++
 =====================================
 
-.. note:: TorchScript is no longer in active development.
+.. warning:: TorchScript is no longer in active development.
 
 As its name suggests, the primary interface to PyTorch is the Python
 programming language. While Python is a suitable and preferred language for

advanced_source/torch-script-parallelism.rst

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Dynamic Parallelism in TorchScript
 ==================================
 
-.. note:: TorchScript is no longer in active development.
+.. warning:: TorchScript is no longer in active development.
 
 In this tutorial, we introduce the syntax for doing *dynamic inter-op parallelism*
 in TorchScript. This parallelism has the following properties:

advanced_source/torch_script_custom_classes.rst

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Extending TorchScript with Custom C++ Classes
 ===============================================
 
-.. note:: TorchScript is no longer in active development.
+.. warning:: TorchScript is no longer in active development.
 
 This tutorial is a follow-on to the
 :doc:`custom operator <torch_script_custom_ops>`

beginner_source/Intro_to_TorchScript_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 
 **Authors:** James Reed ([email protected]), Michael Suo ([email protected]), rev2
 
-.. note:: TorchScript is no longer in active development.
+.. warning:: TorchScript is no longer in active development.
 
 This tutorial is an introduction to TorchScript, an intermediate
 representation of a PyTorch model (subclass of ``nn.Module``) that

beginner_source/chatbot_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -1128,7 +1128,7 @@ def forward(self, input_seq, input_length, max_length):
         # Forward input through encoder model
         encoder_outputs, encoder_hidden = self.encoder(input_seq, input_length)
         # Prepare encoder's final hidden layer to be first hidden input to the decoder
-        decoder_hidden = encoder_hidden[:decoder.n_layers]
+        decoder_hidden = encoder_hidden[:self.decoder.n_layers]
         # Initialize decoder input with SOS_token
         decoder_input = torch.ones(1, 1, device=device, dtype=torch.long) * SOS_token
         # Initialize tensors to append decoded words to
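
Why this one-line fix matters: this ``forward`` belongs to the tutorial's greedy-search wrapper module, which stores the decoder as a submodule, so the hidden-state slice has to go through ``self.decoder`` rather than a bare global name. The sketch below is a minimal, self-contained approximation of that pattern; the stub classes and their names are hypothetical, not the tutorial's actual code.

    import torch
    import torch.nn as nn

    class TinyDecoder(nn.Module):
        """Hypothetical stand-in for the tutorial's decoder; only n_layers matters here."""
        def __init__(self, n_layers: int = 2):
            super().__init__()
            self.n_layers = n_layers

    class GreedySearchWrapper(nn.Module):
        """Minimal sketch of a wrapper that holds the decoder as a submodule."""
        def __init__(self, decoder: nn.Module):
            super().__init__()
            self.decoder = decoder

        def forward(self, encoder_hidden: torch.Tensor) -> torch.Tensor:
            # Correct: reach the decoder through ``self``; a bare ``decoder``
            # would depend on a module-level global being in scope.
            return encoder_hidden[:self.decoder.n_layers]

    hidden = torch.zeros(4, 1, 8)  # (num_layers, batch, hidden_size)
    wrapper = GreedySearchWrapper(TinyDecoder(n_layers=2))
    print(wrapper(hidden).shape)   # torch.Size([2, 1, 8])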

beginner_source/deploy_seq2seq_hybrid_frontend_tutorial.py

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 ==================================================
 **Author:** `Matthew Inkawhich <https://github.com/MatthewInkawhich>`_
 
-.. note:: TorchScript is no longer in active development.
+.. warning:: TorchScript is no longer in active development.
 """
 
 
prototype_source/flight_recorder_tutorial.rst

Lines changed: 94 additions & 0 deletions
@@ -202,6 +202,100 @@ Caveat: tabulate module is needed, so you might need pip install it first.
    python fr_trace.py <dump dir containing trace files> -j [--selected-ranks i j k ...] [--pg-filters tp dp]
    torchfrtrace <dump dir containing trace files> -j [--selected-ranks i j k ...] [--pg-filters 0 2]
 
+An End-to-End Example
+------------------------------------
+To demonstrate the use of Flight Recorder, we will use a small program where we induce mismatched collectives.
+In this example, ``rank0`` is programmed to do an additional collective.
+The Flight Recorder dump files are saved to the ``/tmp`` directory.
+For demonstration purposes, we named this program ``crash.py``.
+
+.. note::
+   This is a simplified example. In real-world scenarios, the process would involve more
+   complexities.
+
+.. code:: python
+   :caption: A crashing example
+
+   import torch
+   import torch.distributed as dist
+   import os
+   from datetime import timedelta
+
+   local_rank = int(os.environ["LOCAL_RANK"])
+   world_size = int(os.environ["WORLD_SIZE"])
+   assert world_size <= 8, "world size must be less than or equal to 8"
+   os.environ["TORCH_NCCL_DEBUG_INFO_TEMP_FILE"] = "/tmp/trace_"
+   os.environ["TORCH_NCCL_DUMP_ON_TIMEOUT"] = "1"
+   os.environ["TORCH_NCCL_TRACE_BUFFER_SIZE"] = "2000"
+   device = torch.device(f"cuda:{local_rank}")
+   print(f"{local_rank=} {world_size=} master addr: {os.environ['MASTER_ADDR']} master port: {os.environ['MASTER_PORT']} {device=}")
+
+   # Initialize the process group with a small timeout so that jobs fail quickly
+   dist.init_process_group("nccl", world_size=world_size, rank=local_rank, timeout=timedelta(seconds=1))
+
+   a = torch.full((3, 4), float(local_rank), device=device)
+   # Write some collectives to populate Flight Recorder data
+   for i in range(2):
+       print(f"calling allreduce on {local_rank=}")
+       f = dist.all_reduce(a)
+
+   # rank0 is doing an additional collective
+   if local_rank == 0:
+       print("rank0 is doing an allreduce on tensor b, but other ranks forgot")
+       b = torch.full((4, 5), float(local_rank), device=device)
+       f = dist.all_reduce(b)
+
+   for i in range(2):
+       print(f"calling allreduce on {local_rank=}")
+       f = dist.all_reduce(a)
+
+   torch.cuda.synchronize(device=device)
+   print(f"{local_rank=} exiting")
+
+
+To run this program, use ``torchrun``:
+
+.. code:: bash
+
+   torchrun --nnodes=1 --nproc_per_node=2 crash.py
+
+You should see two files in the ``/tmp`` directory:
+
+.. code:: bash
+
+   $ ls /tmp/trace*
+   # Expected output
+   /tmp/trace_0 /tmp/trace_1
+
+Finally, to analyze these two files, we use the ``torchfrtrace`` command:
+
+.. code:: bash
+
+   torchfrtrace --prefix "trace_" /tmp/
+
+The output from the trace command is meant to be human-readable. It includes information about the
+set of collectives that caused a failure.
+The output for the command above is shown below.
+We can clearly see that rank 1 did not join the "all_reduce" collective.
+
+.. code-block:: bash
+
+   $ torchfrtrace --prefix "trace_" /tmp/
+   Not all ranks joining collective 5 at entry 4
+   group info: 0:default_pg
+   collective: nccl:all_reduce
+   missing ranks: {1}
+   input sizes: [[3, 4]]
+   output sizes: [[3, 4]]
+   expected ranks: 2
+   collective state: scheduled
+   collective stack trace:
+        all_reduce at /home/cpio/local/pytorch/torch/distributed/distributed_c10d.py:2696
+        wrapper at /home/cpio/local/pytorch/torch/distributed/c10d_logger.py:83
+        <module> at /home/cpio/test/crash.py:44
+
+
 Conclusion
 ----------
 In this tutorial, we have learned about a new PyTorch diagnostic tool called Flight Recorder.
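
A small aside on the workflow added above: before pointing ``torchfrtrace`` at the dump directory, it can be useful to confirm that every rank actually produced a dump file under the configured prefix. The helper below is a stdlib-only sketch, not part of the tutorial; it assumes the dumps are named ``<prefix><rank>``, as configured by ``TORCH_NCCL_DEBUG_INFO_TEMP_FILE="/tmp/trace_"`` in the example.

    # Sanity-check helper (assumption: one dump named <prefix><rank> per rank).
    import glob
    import os

    def check_flight_recorder_dumps(prefix: str = "/tmp/trace_", world_size: int = 2) -> None:
        found = sorted(glob.glob(prefix + "*"))
        expected = [f"{prefix}{rank}" for rank in range(world_size)]
        missing = [path for path in expected if path not in found]
        if missing:
            raise RuntimeError(f"missing Flight Recorder dumps: {missing}")
        for path in found:
            print(f"{path}: {os.path.getsize(path)} bytes")

    if __name__ == "__main__":
        check_flight_recorder_dumps()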

prototype_source/torchscript_freezing.py

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 Model Freezing in TorchScript
 =============================
 
-.. note:: TorchScript is no longer in active development.
+.. warning:: TorchScript is no longer in active development.
 
 In this tutorial, we introduce the syntax for *model freezing* in TorchScript.
 Freezing is the process of inlining Pytorch module parameters and attributes

recipes_source/distributed_optim_torchscript.rst

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 Distributed Optimizer with TorchScript support
 ==============================================================
 
-.. note:: TorchScript is no longer in active development.
+.. warning:: TorchScript is no longer in active development.
 
 In this recipe, you will learn:
 