Commit 8f15929

Update docs/software/communication/nccl.md
1 parent 59f5ba2 commit 8f15929

1 file changed: +7 -0 lines changed

docs/software/communication/nccl.md

Lines changed: 7 additions & 0 deletions
@@ -18,5 +18,12 @@ export NCCL_NET_PLUGIN="ofi"
 This forces NCCL to use the libfabric plugin, enabling full use of the Slingshot network.
 Conversely, if the plugin cannot be found, applications will fail to start instead of falling back to e.g. TCP, which would be significantly slower than with the plugin.
 
+!!! warning "GPU-aware MPI with NCCL"
+    Using GPU-aware MPI together with NCCL [can easily lead to deadlocks](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/mpi.html#inter-gpu-communication-with-cuda-aware-mpi).
+    Unless care is taken to ensure that the two methods of communication are not used concurrently, we recommend not using GPU-aware MPI with NCCL.
+    To disable GPU-aware MPI with Cray MPICH, explicitly set `MPICH_GPU_SUPPORT_ENABLED=0`.
+    Note that this option may be set to `1` by default on some Alps clusters.
+    See [the Cray MPICH documentation][ref-communication-cray-mpich] for more details on GPU-aware MPI with Cray MPICH.
+
 !!! todo
     More options?
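
As a rough illustration of the two settings this change touches, a job script's environment setup might look like the sketch below. It only combines the `NCCL_NET_PLUGIN` export from the hunk context with the `MPICH_GPU_SUPPORT_ENABLED=0` setting from the new warning. The commented-out `srun` line and the `./my_app` name are hypothetical placeholders and not part of the commit.

```shell
#!/bin/sh
# Force NCCL to use the libfabric (OFI) network plugin, as in the documented export.
export NCCL_NET_PLUGIN="ofi"

# Explicitly disable GPU-aware MPI in Cray MPICH, since the default
# may be 1 on some clusters according to the warning added in this commit.
export MPICH_GPU_SUPPORT_ENABLED=0

# Print the effective settings so they are recorded in the job log.
echo "NCCL_NET_PLUGIN=${NCCL_NET_PLUGIN}"
echo "MPICH_GPU_SUPPORT_ENABLED=${MPICH_GPU_SUPPORT_ENABLED}"

# Hypothetical launch line (assumption: a Slurm job; ./my_app is a placeholder):
# srun ./my_app
```

Setting the variable explicitly, rather than relying on the cluster default, keeps the behaviour the same across systems where the default differs.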
