
Commit 6464f38

Add documentation for Slurm THP and vboost features (#286)

Currently I haven't linked to the features from other pages, but would be happy to hear if you have suggestions. @lukasgd @boeschf @henrique do you think it makes sense to add a link and/or the `--constraint nvidia_vboost_enable` flag to any of the ML ~~tutorials~~ pages?

1 parent dd1b526 commit 6464f38

File tree

4 files changed (+90 −2 lines changed)


.github/actions/spelling/allow.txt

Lines changed: 4 additions & 0 deletions
```diff
@@ -106,6 +106,7 @@ SSHService
 STMV
 Scopi
 Signalkuppe
+THP
 TOTP
 UANs
 UIs
@@ -174,6 +175,8 @@ gromos
 groundstate
 gsl
 hdf
+hugepages
+hugetlbfs
 hotmail
 huggingface
 hwloc
@@ -194,6 +197,7 @@ lapackpp
 lexer
 lexers
 libfabric
+libhugetlbfs
 libint
 libtree
 libxc
```

docs/clusters/daint.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -151,7 +151,7 @@ Daint can also be accessed using [FirecREST][ref-firecrest] at the `https://api.
 The [access-counter-based memory migration feature](https://developer.nvidia.com/blog/cuda-toolkit-12-4-enhances-support-for-nvidia-grace-hopper-and-confidential-computing/#access-counter-based_migration_for_nvidia_grace_hopper_memory) in the NVIDIA driver for Grace Hopper is disabled to address performance issues affecting NCCL-based workloads (e.g. LLM training)
 
 ??? note "NVIDIA boost slider"
-    Added an option to enable the NVIDIA boost slider (vboost) via Slurm using the `-C nvidia_vboost_enabled` flag.
+    Added [an option to enable the NVIDIA boost slider (vboost)][ref-slurm-features-vboost] via Slurm using the `-C nvidia_vboost_enabled` flag.
     This feature, disabled by default, may increase GPU frequency and performance while staying within the power budget
 
 ??? note "Enroot update"
```

docs/running/slurm.md

Lines changed: 84 additions & 0 deletions
````diff
@@ -237,6 +237,90 @@ The build generates the following executables:
 
 You can also check GPU affinity by inspecting the value of the `CUDA_VISIBLE_DEVICES` environment variable.
 
+[](){#ref-slurm-features}
+## Slurm features
+
+Slurm allows specifying [constraints](https://slurm.schedmd.com/sbatch.html#OPT_constraint) for jobs, which can be used to change features available on nodes in a job.
+CSCS implements a few custom features, described below, that can be selected on certain clusters.
+To check which features are available on a cluster, for example on the `normal` partition, use `sinfo`:
+
+```console
+$ sinfo --partition normal --format %b
+ACTIVE_FEATURES
+gh,gpu,thp_never,thp_always,thp_madvise,nvidia_vboost_enabled,nvidia_vboost_disabled
+```
+
+One or more constraints can be selected using the `--constraint`/`-C` flag of `sbatch` or `srun` (note that `&` must be quoted to stop the shell from interpreting it as the background operator):
+
+```bash
+sbatch --constraint 'thp_never&nvidia_vboost_enabled' batch.sh
+```
+
+[](){#ref-slurm-features-thp}
+### Transparent hugepages
+
+!!! info "The THP Slurm feature is only available on [GH200 nodes][ref-alps-gh200-node]"
+
+[Transparent hugepages (THP)](https://www.kernel.org/doc/html/v6.17/admin-guide/mm/transhuge.html) are a Linux kernel feature that automatically coalesces regular pages into huge pages, without the application explicitly asking for hugepages:
+
+> Performance critical computing applications dealing with large memory working sets are already running on top of libhugetlbfs and in turn hugetlbfs.
+> Transparent HugePage Support (THP) is an alternative mean of using huge pages for the backing of virtual memory with huge pages that supports the automatic promotion and demotion of page sizes and without the shortcomings of hugetlbfs.
+
+While this feature generally improves performance, we have observed degraded application performance with THP enabled, because the page coalescing can block progress on certain operations.
+An example of this is ICON, a latency-sensitive application where small delays can cause large performance drops.
+
+THP support is enabled by default, and the current setting can be checked with:
+
+```console
+$ cat /sys/kernel/mm/transparent_hugepage/enabled
+[always] madvise never
+```
+
+A detailed explanation of how the different options behave can be found in the [THP documentation](https://www.kernel.org/doc/html/v6.17/admin-guide/mm/transhuge.html#global-thp-controls).
+
+The available Slurm features to select the THP mode are listed below:
+
+| Kernel setting | Slurm constraint       |
+|----------------|------------------------|
+| `always`       | `thp_always` (default) |
+| `madvise`      | `thp_madvise`          |
+| `never`        | `thp_never`            |
+
+[](){#ref-slurm-features-vboost}
+### NVIDIA vboost
+
+!!! info "The NVIDIA vboost Slurm feature is only available on [GH200 nodes][ref-alps-gh200-node]"
+
+The [NVIDIA NeMo documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/performance/performance-guide.html#gpu-core-clock-optimization) describes the vboost feature as:
+
+> NVIDIA GPUs support a CPU core clock boost mode, which increases the core clock rate by reducing the off-chip memory clock rate.
+> This is particularly beneficial for LLMs, which are typically compute throughput-bound.
+
+The vboost slider is at `0` by default, and the current value can be checked with `nvidia-smi`:
+
+```console
+$ nvidia-smi boost-slider --list
++-------------------------------------------------+
+| GPU Boost Slider                                |
+| GPU    Slider        Max Value    Current Value |
+|=================================================|
+| 0      vboost        4            0             |
++-------------------------------------------------+
+| 1      vboost        4            0             |
++-------------------------------------------------+
+| 2      vboost        4            0             |
++-------------------------------------------------+
+| 3      vboost        4            0             |
++-------------------------------------------------+
+```
+
+The slider can be set to `1` using the `nvidia_vboost_enabled` feature:
+
+| vboost setting | Slurm constraint                   |
+|----------------|------------------------------------|
+| `0`            | `nvidia_vboost_disabled` (default) |
+| `1`            | `nvidia_vboost_enabled`            |
+
 [](){#ref-slurm-gh200}
 ## NVIDIA GH200 GPU Nodes
 
````
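One detail worth noting about the `sbatch --constraint` example in this diff: `&` is Slurm's AND operator for combining constraints, but it is also the shell's background operator, so the combined constraint needs quoting on the command line (inside an `#SBATCH` directive, sbatch parses the line itself, so quoting is unnecessary there). A minimal sketch of safe quoting; the `batch.sh` name is just a placeholder:

```shell
# '&' combines Slurm constraints (logical AND), but to the shell it
# would background the command, so store and quote the constraint.
constraint='thp_never&nvidia_vboost_enabled'

# Print the submission command that would be run; single quotes keep
# the '&' intact when the command is actually executed.
printf "sbatch --constraint '%s' batch.sh\n" "${constraint}"
```
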

docs/software/ml/pytorch.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -185,7 +185,7 @@ For further details on execution logic, job monitoring and data management, plea
 
 * Extensively evaluate all possible parallelization dimensions, including data-, tensor- and pipeline parallelism (including virtual pipeline parallelism) and more, when available. Identify storage-related bottlenecks by isolating data loading/generation operations into a separate benchmark.
 
-* Disabling transparent huge pages and enabling the Nvidia [vboost](https://docs.nvidia.com/nemo-framework/user-guide/latest/performance/performance-guide.html#gpu-core-clock-optimization) feature has been observed to improve performance in large-scale LLM training in Megatron-LM. This can be achieved by adding these constraints to the sbatch script:
+* [Disabling transparent huge pages][ref-slurm-features-thp] and [enabling the Nvidia vboost feature][ref-slurm-features-vboost] has been observed to improve performance in large-scale LLM training in Megatron-LM. This can be achieved by adding these constraints to the sbatch script:
    ```bash
    #SBATCH -C thp_never&nvidia_vboost_enabled
    ```
````
