docs/tools/slurm.md (4 additions, 2 deletions)
@@ -23,7 +23,7 @@ The following sections will provide detailed guidance on how to use SLURM to req
The [GH200 nodes on Alps][gh200-node] have four GPUs per node, and SLURM job submissions must be configured appropriately to best make use of the resources.
Applications that can saturate the GPUs with a single process per GPU should generally prefer this mode.
[Configuring SLURM jobs to use a single GPU per rank][gh200-slurm-single-rank-per-gpu] is also the most straightforward setup.
-Some applications perform badly with a single rank per GPU, and require use of [NVIDIA's Multi-Process-Service (MPS)](https://docs.nvidia.com/deploy/mps/index.html) to oversubscribe GPUs with multiple ranks per GPU.
+Some applications perform badly with a single rank per GPU, and require use of [NVIDIA's Multi-Process Service (MPS)] to oversubscribe GPUs with multiple ranks per GPU.

The best SLURM configuration is application- and workload-specific, so it is worth testing which works best in your particular case.
See [Scientific Applications][sciapps] for information about recommended application-specific SLURM configurations.
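The one-rank-per-GPU setup described above is typically requested with the `--gpus-per-task` flag that also appears in the next hunk's context. As a minimal, hypothetical sbatch sketch only (the resource values and application name are illustrative, not taken from the documentation):

```bash
#!/bin/bash
# Hypothetical sketch: one rank per GPU on a four-GPU GH200 node.
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4   # one rank for each of the node's four GPUs
#SBATCH --gpus-per-task=1     # give each rank its own GPU

srun ./my_gpu_application     # placeholder application name
```

Each of the four ranks then gets its own GPU, which is the straightforward configuration recommended for applications that can saturate a GPU with a single process.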
@@ -62,7 +62,7 @@ Omitting the `--gpus-per-task` flag will lead to all ranks on the node using the

Using multiple ranks per GPU can improve performance for applications that don't generate enough work for a GPU with a single rank, or that scale badly to all 72 cores of the Grace CPU.
In these cases SLURM jobs must be configured to assign multiple ranks to a single GPU.
-This is best done using [MPS](https://docs.nvidia.com/deploy/mps/index.html).
+This is best done using [NVIDIA's Multi-Process Service (MPS)].
To use MPS, launch your application using the following wrapper script, which will start MPS on one rank per node and assign GPUs to ranks according to the CPU mask of a rank, ensuring the closest GPU is used:

```bash
@@ -123,6 +123,8 @@ Note that in the example job above:

The configuration that is optimal for your application may be different.

+
+[NVIDIA's Multi-Process Service (MPS)]: https://docs.nvidia.com/deploy/mps/index.html
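The wrapper script that the second hunk refers to is not part of this diff. As a rough, hypothetical sketch of the pattern only, and not the script from the documentation, an MPS wrapper can look like the following; it starts MPS from one rank per node but, as a simplification, selects a GPU by local rank rather than by the CPU affinity mask the docs describe:

```bash
#!/bin/bash
# Hedged sketch of an MPS wrapper, NOT the script from the docs: that script
# picks the GPU from each rank's CPU affinity mask, while this simplified
# version maps consecutive local ranks onto the same GPU.

export CUDA_MPS_PIPE_DIRECTORY=/tmp/nvidia-mps-${SLURM_JOB_ID}
export CUDA_MPS_LOG_DIRECTORY=/tmp/nvidia-mps-log-${SLURM_JOB_ID}

# Start the MPS control daemon once per node, from local rank 0 only.
if [ "${SLURM_LOCALID}" -eq 0 ]; then
    nvidia-cuda-mps-control -d
fi
# Give the daemon a moment to come up before the other ranks connect to it.
sleep 5

# Assign this rank one of the node's four GPUs (assumes the number of ranks
# per node is a multiple of four).
NGPUS=4
RANKS_PER_NODE=${SLURM_NTASKS_PER_NODE:-4}
export CUDA_VISIBLE_DEVICES=$(( SLURM_LOCALID * NGPUS / RANKS_PER_NODE ))

# Run the application that was passed as arguments to the wrapper.
exec "$@"
```

A job would then launch the application through the wrapper, e.g. `srun ./mps-wrapper.sh ./my_gpu_application`, where both names are placeholders.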