In principle, the `--environment` option can also be used within batch scripts as an `#SBATCH` option.
-It is important to note that in such a case, all the contents of the script are executed within the containerized environment: the CE toolset gives access to the Slurm workload manager within containers via the Slurm hook, see section [Container Hooks](#container-hooks) (controlled by the `ENROOT_SLURM_HOOK` environment variable and activated by default on most vClusters). Only with it, calls to Slurm commands (for example `srun` or `scontrol`) within the batch script will work.
+It is important to note that in such a case, all the contents of the script are executed within the containerized environment: the CE toolset gives access to the Slurm workload manager within containers via the Slurm hook, see section [Container Hooks][ref-ce-container-hooks] (controlled by the `ENROOT_SLURM_HOOK` environment variable and activated by default on most vClusters). Only with it, calls to Slurm commands (for example `srun` or `scontrol`) within the batch script will work.
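As a minimal sketch of the batch-script usage described above, the script below passes `--environment` as an `#SBATCH` option and checks that Slurm commands are reachable before calling them. The EDF name `my-edf` and the job name are hypothetical placeholders, and the `command -v` check is an illustrative safeguard, not part of the Container Engine.

```bash
#!/bin/bash
#SBATCH --job-name=ce-batch-example
#SBATCH --nodes=1
# Hypothetical EDF name; replace it with one of your own environments.
#SBATCH --environment=my-edf

# With --environment given as an #SBATCH option, every command below
# runs inside the containerized environment.
# Slurm commands are only usable there when the Slurm hook is active,
# so check for srun before relying on it.
if command -v srun >/dev/null 2>&1; then
    slurm_available=yes
    srun hostname
else
    slurm_available=no
    echo "srun not found: the Slurm hook may be disabled in this environment" >&2
fi
```

Submitted with `sbatch`, the script either launches `hostname` through `srun` or reports that Slurm commands are not visible from inside the container.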
!!! tip
@@ -429,8 +429,9 @@ The Container Engine provides a hook to allow containers relying on [libfabric]
The hook leverages bind-mounting the custom host libfabric library into the container (in addition to all the required dependency libraries and devices as well).
If a libfabric library is already present in the container filesystem (for example, it's provided by the image), it is replaced with its host counterpart, otherwise the host libfabric is just added to the container.

-> **NOTE**: Due to the nature of Slingshot and the mechanism implemented by the CXI hook, container applications need to use a communication library which supports libfabric in order to benefit from usage of the hook.
-> Libfabric support might have to be defined at compilation time (as is the case for some MPI implementations, like MPICH and OpenMPI) or could be dynamically available at runtime (as is the case with NCCL - see also [this](#aws-ofi-hook) section for more details).
+!!! note
+Due to the nature of Slingshot and the mechanism implemented by the CXI hook, container applications need to use a communication library which supports libfabric in order to benefit from usage of the hook.
+> Libfabric support might have to be defined at compilation time (as is the case for some MPI implementations, like MPICH and OpenMPI) or could be dynamically available at runtime (as is the case with NCCL - see also [this][ref-ce-aws-ofi-hook] section for more details).

The hook is activated by setting the `com.hooks.cxi.enabled` annotation, which can be defined in the EDF, as shown in the following example:
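A minimal sketch of an EDF carrying this annotation, written from the shell for convenience. The file name `cxi-example.toml` and the image reference are hypothetical, and the string value `"true"` for the annotation is an assumption to check against the EDF reference for your vCluster.

```bash
# Write a minimal EDF (TOML) that enables the CXI hook through an annotation.
# Hypothetical: the file name and the image reference below.
edf_file="cxi-example.toml"

cat > "$edf_file" <<'EOF'
image = "ubuntu:24.04"

[annotations]
com.hooks.cxi.enabled = "true"
EOF

# Show the resulting file for inspection.
cat "$edf_file"
```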
-> **TIP**: On several vClusters, the CXI hook for Slingshot connectivity is enabled implicitly by default or by other hooks. Therefore, entering the enabling annotation in the EDF is unnecessary in many cases.
+!!! tip
+On several vClusters, the CXI hook for Slingshot connectivity is enabled implicitly by default or by other hooks.
+Therefore, entering the enabling annotation in the EDF is unnecessary in many cases.

-## <a name="container-hooks"></a> Container Hooks
+[](){#ref-ce-container-hooks}
+## Container Hooks

Container hooks let you customize container behavior to fit system-specific needs, making them especially valuable for High-Performance Computing.

* *What they do*: Hooks extend container runtime functionality by enabling custom actions during a container's lifecycle.
* *Use for HPC*: HPC systems rely on specialized hardware and fine-tuned software, unlike generic containers. Hooks bridge this gap by allowing containers to access these system-specific resources or enable custom features.

->**INFO**: This section outlines all hooks supported in production by the Container Engine. However, specific Alps vClusters may support only a subset or use custom configurations. For details about available features in individual vClusters, consult platform documentation or contact CSCS support.
+!!! info
+This section outlines all hooks supported in production by the Container Engine.
+However, specific Alps vClusters may support only a subset or use custom configurations.
+For details about available features in individual vClusters, consult platform documentation or contact CSCS support.

-### <a name="aws-ofi-hook"></a> AWS OFI NCCL Hook
+[](){#ref-ce-aws-ofi-hook}
+### AWS OFI NCCL Hook

The [AWS OFI NCCL plugin](https://github.com/aws/aws-ofi-nccl) is a software extension that allows the [NCCL](https://developer.nvidia.com/nccl) and [RCCL](https://rocm.docs.amd.com/projects/rccl/en/latest/) libraries to use libfabric as a network provider and, through libfabric, to access the Slingshot high-speed interconnect.
->**INFO**: When using the NVIDIA CUDA MPS hook it is not necessary to use other wrappers or scripts to manage the Multi-Process Service, as is documented for native jobs on some vClusters.
+!!! info
+When using the NVIDIA CUDA MPS hook it is not necessary to use other wrappers or scripts to manage the Multi-Process Service, as is documented for native jobs on some vClusters.