
Commit e59b741

Enhance distributed tutorial: integrate Accelerator API for backend selection and add XCCL backend details
Signed-off-by: jafraustro <[email protected]>
1 parent: d54656f

File tree: 1 file changed, +19 -8 lines


intermediate_source/dist_tuto.rst

Lines changed: 19 additions & 8 deletions
@@ -470,9 +470,10 @@ Communication Backends
 
 One of the most elegant aspects of ``torch.distributed`` is its ability
 to abstract and build on top of different backends. As mentioned before,
-there are multiple backends implemented in PyTorch.
-Some of the most popular ones are Gloo, NCCL, and MPI.
-They each have different specifications and tradeoffs, depending
+there are multiple backends implemented in PyTorch. These backends can be easily selected
+using the `Accelerator API <https://pytorch.org/docs/stable/torch.html#accelerators>`__,
+which provides an interface for working with different accelerator types.
+Some of the most popular backends are Gloo, NCCL, and MPI. They each have different specifications and tradeoffs, depending
 on the desired use case. A comparative table of supported functions can
 be found
 `here <https://pytorch.org/docs/stable/distributed.html#module-torch.distributed>`__.
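For context on how the Accelerator API ties into backend selection, here is a minimal sketch (not part of this commit) of choosing a process-group backend from the detected accelerator. The ``pick_backend`` helper and its cuda-to-nccl / xpu-to-xccl mapping are illustrative assumptions, with Gloo as the CPU fallback.

.. code:: python

    import os
    import torch
    import torch.distributed as dist

    def pick_backend():
        """Hypothetical helper: map the active accelerator to a backend.

        Assumption: "cuda" -> "nccl", "xpu" -> "xccl"; fall back to "gloo"
        when no accelerator is available.
        """
        acc = torch.accelerator.current_accelerator()
        if acc is None:
            return "gloo"
        return {"cuda": "nccl", "xpu": "xccl"}.get(acc.type, "gloo")

    def init_process(rank, size):
        # Same rendezvous settings the tutorial uses elsewhere.
        os.environ["MASTER_ADDR"] = "127.0.0.1"
        os.environ["MASTER_PORT"] = "29500"
        dist.init_process_group(pick_backend(), rank=rank, world_size=size)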
@@ -492,12 +493,13 @@ distributed SGD example does not work if you put ``model`` on the GPU.
 In order to use multiple GPUs, let us also make the following
 modifications:
 
-1. Use ``device = torch.device("cuda:{}".format(rank))``
-2. ``model = Net()`` :math:`\rightarrow` ``model = Net().to(device)``
-3. Use ``data, target = data.to(device), target.to(device)``
+1. Use the Accelerator API: ``device_type = torch.accelerator.current_accelerator()``
+2. Use ``torch.device(f"{device_type}:{rank}")``
+3. ``model = Net()`` :math:`\rightarrow` ``model = Net().to(device)``
+4. Use ``data, target = data.to(device), target.to(device)``
 
-With the above modifications, our model is now training on two GPUs and
-you can monitor their utilization with ``watch nvidia-smi``.
+With these modifications, your model will now train across two GPUs.
+You can monitor GPU utilization using ``watch nvidia-smi`` if you are running on NVIDIA hardware.
 
 **MPI Backend**
 
@@ -553,6 +555,7 @@ more <https://www.open-mpi.org/faq/?category=running#mpirun-hostfile>`__)
 Doing so, you should obtain the same familiar output as with the other
 communication backends.
 
+
 **NCCL Backend**
 
 The `NCCL backend <https://github.com/nvidia/nccl>`__ provides an
@@ -561,6 +564,14 @@ tensors. If you only use CUDA tensors for your collective operations,
 consider using this backend for the best in class performance. The
 NCCL backend is included in the pre-built binaries with CUDA support.
 
+**XCCL Backend**
+
+The XCCL backend offers an optimized implementation of collective operations for XPU tensors.
+If your workload uses only XPU tensors for collective operations,
+this backend provides best-in-class performance.
+The XCCL backend is included in the pre-built binaries with XPU support.
+
+
 Initialization Methods
 ~~~~~~~~~~~~~~~~~~~~~~
 
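As a rough illustration of why these device-specific backends matter, the sketch below runs an ``all_reduce`` directly on accelerator tensors. It assumes the process group was already initialised with the matching backend (``nccl`` for CUDA tensors, ``xccl`` for XPU tensors); the function name is hypothetical.

.. code:: python

    import torch
    import torch.distributed as dist

    def allreduce_on_accelerator(rank, size):
        # Assumes init_process_group() was called with "nccl" (CUDA) or "xccl" (XPU).
        device_type = torch.accelerator.current_accelerator()
        device = torch.device(f"{device_type}:{rank}")

        # NCCL/XCCL collectives operate directly on device tensors,
        # so no copy back to host memory is needed.
        tensor = torch.ones(1, device=device)
        dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
        print(f"Rank {rank} has {tensor.item()} after all_reduce")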
