@@ -470,9 +470,10 @@ Communication Backends
One of the most elegant aspects of ``torch.distributed`` is its ability
to abstract and build on top of different backends. As mentioned before,
- there are multiple backends implemented in PyTorch.
- Some of the most popular ones are Gloo, NCCL, and MPI.
- They each have different specifications and tradeoffs, depending
+ there are multiple backends implemented in PyTorch. These backends can be easily selected
+ using the `Accelerator API <https://pytorch.org/docs/stable/torch.html#accelerators>`__,
+ which provides an interface for working with different accelerator types.
+ Some of the most popular backends are Gloo, NCCL, and MPI. They each have different specifications and tradeoffs, depending
on the desired use case. A comparative table of supported functions can
be found
`here <https://pytorch.org/docs/stable/distributed.html#module-torch.distributed>`__.
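+
+ For example, a minimal sketch of picking a backend through the Accelerator API
+ (assuming a recent PyTorch build that exposes ``torch.accelerator.current_accelerator``
+ and ``torch.distributed.get_default_backend_for_device``) could look like this:
+
+ .. code:: python
+
+     import os
+     import torch
+     import torch.distributed as dist
+
+     def init_process(rank, size):
+         os.environ["MASTER_ADDR"] = "127.0.0.1"
+         os.environ["MASTER_PORT"] = "29500"
+         # Device type of the available accelerator (e.g. "cuda" or "xpu"),
+         # falling back to CPU when no accelerator is present.
+         acc = torch.accelerator.current_accelerator()
+         device_type = acc.type if acc is not None else "cpu"
+         # Ask torch.distributed for the matching backend
+         # (e.g. NCCL for CUDA tensors, Gloo for CPU tensors).
+         backend = dist.get_default_backend_for_device(device_type)
+         dist.init_process_group(backend, rank=rank, world_size=size)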
@@ -492,12 +493,13 @@ distributed SGD example does not work if you put ``model`` on the GPU.
In order to use multiple GPUs, let us also make the following
modifications:
- 1. Use ``device = torch.device("cuda:{}".format(rank))``
- 2. ``model = Net()`` :math:`\rightarrow` ``model = Net().to(device)``
- 3. Use ``data, target = data.to(device), target.to(device)``
+ 1. Use the Accelerator API to get the device type: ``device_type = torch.accelerator.current_accelerator()``
+ 2. Use ``device = torch.device(f"{device_type}:{rank}")``
+ 3. ``model = Net()`` :math:`\rightarrow` ``model = Net().to(device)``
+ 4. Use ``data, target = data.to(device), target.to(device)``
- With the above modifications, our model is now training on two GPUs and
- you can monitor their utilization with ``watch nvidia-smi``.
+ With these modifications, your model will now train across two GPUs.
+ You can monitor GPU utilization using ``watch nvidia-smi`` if you are running on NVIDIA hardware.
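+
+ Roughly, the modified training function could then look like the sketch below
+ (it reuses the ``Net``, ``partition_dataset``, and ``average_gradients`` helpers
+ and the imports defined in the earlier sections of this tutorial):
+
+ .. code:: python
+
+     def run(rank, size):
+         torch.manual_seed(1234)
+         train_set, bsz = partition_dataset()
+         # 1. and 2.: build the per-rank device from the current accelerator type.
+         device_type = torch.accelerator.current_accelerator()
+         device = torch.device(f"{device_type}:{rank}")
+         # 3.: move the model to that device.
+         model = Net().to(device)
+         optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)
+         num_batches = ceil(len(train_set.dataset) / float(bsz))
+         for epoch in range(10):
+             epoch_loss = 0.0
+             for data, target in train_set:
+                 # 4.: move each batch to the same device as the model.
+                 data, target = data.to(device), target.to(device)
+                 optimizer.zero_grad()
+                 output = model(data)
+                 loss = F.nll_loss(output, target)
+                 epoch_loss += loss.item()
+                 loss.backward()
+                 average_gradients(model)
+                 optimizer.step()
+             print('Rank ', dist.get_rank(), ', epoch ', epoch, ': ', epoch_loss / num_batches)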
**MPI Backend**
@@ -553,6 +555,7 @@ more <https://www.open-mpi.org/faq/?category=running#mpirun-hostfile>`__)
Doing so, you should obtain the same familiar output as with the other
communication backends.
+
**NCCL Backend**
The `NCCL backend <https://github.com/nvidia/nccl>`__ provides an
@@ -561,6 +564,14 @@ tensors. If you only use CUDA tensors for your collective operations,
consider using this backend for the best in class performance. The
NCCL backend is included in the pre-built binaries with CUDA support.
+ **XCCL Backend**
+
+ The XCCL backend offers an optimized implementation of collective operations for XPU tensors.
+ If your workload uses only XPU tensors for collective operations,
+ this backend provides best-in-class performance.
+ The XCCL backend is included in the pre-built binaries with XPU support.
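+
+ As a minimal sketch (the ``"xccl"`` backend string and the XPU device handling
+ are assumptions about builds with XPU support), initializing the group and
+ running a collective on XPU tensors could look like:
+
+ .. code:: python
+
+     import os
+     import torch
+     import torch.distributed as dist
+
+     def init_process(rank, size):
+         os.environ["MASTER_ADDR"] = "127.0.0.1"
+         os.environ["MASTER_PORT"] = "29500"
+         # "xccl" is the assumed backend name; requires a PyTorch build with XPU support.
+         dist.init_process_group("xccl", rank=rank, world_size=size)
+         # Each rank holds a tensor on its own XPU device and sums them across ranks.
+         tensor = torch.ones(1, device=torch.device("xpu", rank))
+         dist.all_reduce(tensor, op=dist.ReduceOp.SUM)
+         print(f"Rank {rank} has {tensor}")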
+
+
Initialization Methods
~~~~~~~~~~~~~~~~~~~~~~