docs/software/container-engine/resource-hook.md
8 additions & 8 deletions
@@ -33,19 +33,19 @@ This can be done in multiple ways in TOML: for example, both of the following us
* Attributes can be added to a table in only one place in the TOML file. In other words, each table must be defined in a single square-bracket section. For example, in the invalid example below, the `ssh` table is defined twice: once in the `[annotations]` section and once in the `[annotations.com.hooks.ssh]` section. See the [TOML format](https://toml.io/en/) spec for more details.
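
The single-definition rule above can be sketched as follows. This is a minimal illustration, not an excerpt from the documentation: `some_option` is a hypothetical key used only to show a second attribute on the same table.

```toml
# Valid: every attribute of the ssh table is set in one place.
[annotations.com.hooks.ssh]
enabled = "true"
some_option = "value"   # hypothetical key, for illustration only

# Invalid (kept as comments so this fragment still parses):
# the dotted key under [annotations] already defines the ssh table,
# so the later header attempts to define the same table a second time.
#
# [annotations]
# com.hooks.ssh.some_option = "value"
#
# [annotations.com.hooks.ssh]
# enabled = "true"
```

A TOML 1.0 parser is expected to reject the commented-out form, because a table created through dotted keys cannot be reopened with a `[table]` header.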
@@ -69,7 +69,7 @@ Container hooks let you customize container behavior to fit system-specific need
[](){#ref-ce-cxi-hook}
### HPE Slingshot interconnect
-
```bash
+
```toml
com.hooks.cxi.enabled = "true"
```
@@ -167,7 +167,7 @@ The hook is activated by setting the `com.hooks.cxi.enabled` annotation, which
[](){#ref-ce-aws-ofi-hook}
### AWS OFI NCCL Hook
-
```bash
+
```toml
com.hooks.aws_ofi_nccl.enabled = "true"
com.hooks.aws_ofi_nccl.variant = "cuda12" # (1)
```
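
For context, the two annotations above could sit in an EDF in either of the following layouts. This is a hedged sketch: it assumes the annotations live under the `[annotations]` table, as in the `ssh` example earlier in the file, and it omits every other EDF field. The `cuda11` variant is picked only because it is one of the variants the text lists.

```toml
# Layout 1: dotted keys under a single [annotations] table.
[annotations]
com.hooks.aws_ofi_nccl.enabled = "true"
com.hooks.aws_ofi_nccl.variant = "cuda11"

# Layout 2 (alternative, shown as comments because the table
# may be defined only once per file):
#
# [annotations.com.hooks.aws_ofi_nccl]
# enabled = "true"
# variant = "cuda11"
```

Both layouts should parse to the same nested structure. Which one to use is a matter of readability, provided each table is defined in only one section.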
@@ -187,7 +187,7 @@ At the moment of writing, 4 plugin variants are configured: `cuda11`, `cuda12`
It sets environment variables to control the behavior of NCCL and the libfabric CXI provider for Slingshot. In particular, the `NCCL_NET_PLUGIN` variable ([link](https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/env.html#nccl-net-plugin)) is set to force NCCL to load the specific network plugin mounted by the hook. This is useful because certain container images (for example, those from NGC repositories) might already ship with a default NCCL plugin. Other environment variables help prevent application stalls and improve performance when using GPUDirect for RDMA communication.
!!! example "EDF for the NGC PyTorch 22.12 image with CUDA 11"