Commit 0cd764b

Update hooks
1 parent 3adeba2 commit 0cd764b

1 file changed: +4 -3 lines changed

docs/software/container-engine.md

Lines changed: 4 additions & 3 deletions
@@ -529,10 +529,11 @@ At the moment of writing, 4 plugin variants are configured: `cuda11`, `cuda12`
 com.hooks.aws_ofi_nccl.variant = "cuda11"
 ```
 
-The AWS OFI NCCL hook also takes care of the following aspects:
+!!! tip
+It implicitly enables the [CXI hook][ref-ce-cxi-hook], therefore exposing the Slingshot interconnect to container applications. In other words, when enabling the AWS OFI NCCL hook, it's unnecessary to also enable the CXI hook separately in the EDF.
 
-* It implicitly enables the [CXI hook][ref-ce-cxi-hook], therefore exposing the Slingshot interconnect to container applications. In other words, when enabling the AWS OFI NCCL hook, it's unnecessary to also enable the CXI hook separately in the EDF.
-* It sets environment variables to control the behavior of NCCL and the libfabric CXI provider for Slingshot. In particular, the `NCCL_NET_PLUGIN` variable ([link](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/env.html#nccl-net-plugin)) is set to force NCCL to load the specific network plugin mounted by the hook. This is useful because certain container images (for example, those from NGC repositories) might already ship with a default NCCL plugin. Other environment variables help prevent application stalls and improve performance when using GPUDirect for RDMA communication.
+!!! note
+It sets environment variables to control the behavior of NCCL and the libfabric CXI provider for Slingshot. In particular, the `NCCL_NET_PLUGIN` variable ([link](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/env.html#nccl-net-plugin)) is set to force NCCL to load the specific network plugin mounted by the hook. This is useful because certain container images (for example, those from NGC repositories) might already ship with a default NCCL plugin. Other environment variables help prevent application stalls and improve performance when using GPUDirect for RDMA communication.
 
 [](){#ref-ce-ssh-hook}
 ### SSH Hook
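
Reviewer note on the `NCCL_NET_PLUGIN` text in this change: one way to confirm that the hook has set the variable is to inspect the environment from inside the container. The sketch below is illustrative only and not part of the commit. The helper name `nccl_plugin_report` is hypothetical, and the script makes no assumption about which value the hook actually assigns.

```python
import os


def nccl_plugin_report(env=None):
    """Describe the NCCL_NET_PLUGIN setting found in `env`.

    `env` is any mapping of environment variables; it defaults to the
    environment of the current process (hypothetical helper, for illustration).
    """
    env = os.environ if env is None else env
    value = env.get("NCCL_NET_PLUGIN")
    if not value:
        # Unset or empty: the hook has not forced a specific plugin here.
        return "NCCL_NET_PLUGIN is not set"
    return f"NCCL_NET_PLUGIN={value}"


if __name__ == "__main__":
    # Run inside the container to see what the application would inherit.
    print(nccl_plugin_report())
```

Passing an explicit mapping instead of reading `os.environ` directly keeps the check easy to exercise outside a container.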
