
Commit 6464f38

Add documentation for Slurm THP and vboost features (#286)

Currently I haven't linked to the features from other pages, but would be happy to hear if you have suggestions. @lukasgd @boeschf @henrique do you think it makes sense to add a link and/or the `--constraint nvidia_vboost_enable` flag to any of the ML ~~tutorials~~ pages?

1 parent dd1b526 commit 6464f38

File tree

4 files changed (+90 −2 lines changed)


.github/actions/spelling/allow.txt

Lines changed: 4 additions & 0 deletions
```diff
@@ -106,6 +106,7 @@ SSHService
 STMV
 Scopi
 Signalkuppe
+THP
 TOTP
 UANs
 UIs
@@ -174,6 +175,8 @@ gromos
 groundstate
 gsl
 hdf
+hugepages
+hugetlbfs
 hotmail
 huggingface
 hwloc
@@ -194,6 +197,7 @@ lapackpp
 lexer
 lexers
 libfabric
+libhugetlbfs
 libint
 libtree
 libxc
```

docs/clusters/daint.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -151,7 +151,7 @@ Daint can also be accessed using [FirecREST][ref-firecrest] at the `https://api.
 The [access-counter-based memory migration feature](https://developer.nvidia.com/blog/cuda-toolkit-12-4-enhances-support-for-nvidia-grace-hopper-and-confidential-computing/#access-counter-based_migration_for_nvidia_grace_hopper_memory) in the NVIDIA driver for Grace Hopper is disabled to address performance issues affecting NCCL-based workloads (e.g. LLM training)
 
 ??? note "NVIDIA boost slider"
-    Added an option to enable the NVIDIA boost slider (vboost) via Slurm using the `-C nvidia_vboost_enabled` flag.
+    Added [an option to enable the NVIDIA boost slider (vboost)][ref-slurm-features-vboost] via Slurm using the `-C nvidia_vboost_enabled` flag.
     This feature, disabled by default, may increase GPU frequency and performance while staying within the power budget
 
 ??? note "Enroot update"
```

docs/running/slurm.md

Lines changed: 84 additions & 0 deletions
````diff
@@ -237,6 +237,90 @@ The build generates the following executables:
 
 You can also check GPU affinity by inspecting the value of the `CUDA_VISIBLE_DEVICES` environment variable.
 
+[](){#ref-slurm-features}
+## Slurm features
+
+Slurm allows specifying [constraints](https://slurm.schedmd.com/sbatch.html#OPT_constraint) for jobs, which can be used to change features available on nodes in a job.
+CSCS implements a few custom features, described below, that can be selected on certain clusters.
+To check which features are available on a cluster, for example on the `normal` partition, use `sinfo`:
+
+```console
+$ sinfo --partition normal --format %b
+ACTIVE_FEATURES
+gh,gpu,thp_never,thp_always,thp_madvise,nvidia_vboost_enabled,nvidia_vboost_disabled
+```
+
+One or more constraints can be selected using the `--constraint`/`-C` flag of `sbatch` or `srun` (note that `&` must be quoted to stop the shell from interpreting it as the background operator):
+
+```bash
+sbatch --constraint 'thp_never&nvidia_vboost_enabled' batch.sh
+```
+
+[](){#ref-slurm-features-thp}
+### Transparent hugepages
+
+!!! info "The THP Slurm feature is only available on [GH200 nodes][ref-alps-gh200-node]"
+
+[Transparent hugepages (THP)](https://www.kernel.org/doc/html/v6.17/admin-guide/mm/transhuge.html) are a Linux kernel feature that automatically coalesces regular pages into huge pages, without the application explicitly asking for hugepages:
+
+> Performance critical computing applications dealing with large memory working sets are already running on top of libhugetlbfs and in turn hugetlbfs.
+> Transparent HugePage Support (THP) is an alternative mean of using huge pages for the backing of virtual memory with huge pages that supports the automatic promotion and demotion of page sizes and without the shortcomings of hugetlbfs.
+
+While this feature generally improves performance, we have observed degraded application performance with THP enabled, because the page coalescing can block progress on certain operations.
+An example of this is ICON, a latency-sensitive application where small delays can cause large performance drops.
+
+THP support is enabled by default, and the current setting can be checked with:
+
+```console
+$ cat /sys/kernel/mm/transparent_hugepage/enabled
+[always] madvise never
+```
+
+A detailed explanation of how the different options behave can be found in the [THP documentation](https://www.kernel.org/doc/html/v6.17/admin-guide/mm/transhuge.html#global-thp-controls).
+
+The available Slurm features to select the THP mode are listed below:
+
+| Kernel setting | Slurm constraint       |
+|----------------|------------------------|
+| `always`       | `thp_always` (default) |
+| `madvise`      | `thp_madvise`          |
+| `never`        | `thp_never`            |
+
+[](){#ref-slurm-features-vboost}
+### NVIDIA vboost
+
+!!! info "The NVIDIA vboost Slurm feature is only available on [GH200 nodes][ref-alps-gh200-node]"
+
+The [NVIDIA NeMo documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/performance/performance-guide.html#gpu-core-clock-optimization) describes the vboost feature as:
+
+> NVIDIA GPUs support a CPU core clock boost mode, which increases the core clock rate by reducing the off-chip memory clock rate.
+> This is particularly beneficial for LLMs, which are typically compute throughput-bound.
+
+The vboost slider is at `0` by default, and the current value can be checked with `nvidia-smi`:
+
+```console
+$ nvidia-smi boost-slider --list
++-------------------------------------------------+
+| GPU Boost Slider                                |
+| GPU    Slider        Max Value    Current Value |
+|=================================================|
+| 0      vboost        4            0             |
++-------------------------------------------------+
+| 1      vboost        4            0             |
++-------------------------------------------------+
+| 2      vboost        4            0             |
++-------------------------------------------------+
+| 3      vboost        4            0             |
++-------------------------------------------------+
+```
+
+The slider can be set to `1` using the `nvidia_vboost_enabled` feature:
+
+| vboost setting | Slurm constraint                   |
+|----------------|------------------------------------|
+| `0`            | `nvidia_vboost_disabled` (default) |
+| `1`            | `nvidia_vboost_enabled`            |
+
 [](){#ref-slurm-gh200}
 ## NVIDIA GH200 GPU Nodes
 
````
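One detail worth noting about the `sbatch --constraint` example in this diff: `&` is Slurm's AND operator for combining constraints, but it is also the shell's background operator, so the combined constraint needs quoting on the command line (inside an `#SBATCH` directive, sbatch parses the line itself, so quoting is unnecessary there). A minimal sketch of safe quoting; the `batch.sh` name is just a placeholder:

```shell
# '&' combines Slurm constraints (logical AND), but to the shell it
# would background the command, so store and quote the constraint.
constraint='thp_never&nvidia_vboost_enabled'

# Print the submission command that would be run; single quotes keep
# the '&' intact when the command is actually executed.
printf "sbatch --constraint '%s' batch.sh\n" "${constraint}"
```
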

docs/software/ml/pytorch.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -185,7 +185,7 @@ For further details on execution logic, job monitoring and data management, plea
 
 * Extensively evaluate all possible parallelization dimensions, including data-, tensor- and pipeline parallelism (including virtual pipeline parallelism) and more, when available. Identify storage-related bottlenecks by isolating data loading/generation operations into a separate benchmark.
 
-* Disabling transparent huge pages and enabling the Nvidia [vboost](https://docs.nvidia.com/nemo-framework/user-guide/latest/performance/performance-guide.html#gpu-core-clock-optimization) feature has been observed to improve performance in large-scale LLM training in Megatron-LM. This can be achieved by adding these constraints to the sbatch script:
+* [Disabling transparent huge pages][ref-slurm-features-thp] and [enabling the Nvidia vboost feature][ref-slurm-features-vboost] has been observed to improve performance in large-scale LLM training in Megatron-LM. This can be achieved by adding these constraints to the sbatch script:
    ```bash
    #SBATCH -C thp_never&nvidia_vboost_enabled
    ```
````
