
Commit 636d4cd

committed: document affinity program
1 parent a5e0cef commit 636d4cd

2 files changed: +124 -9 lines changed


docs/alps/hardware.md

Lines changed: 10 additions & 7 deletions
@@ -40,13 +40,13 @@ Alps was installed in phases, starting with the installation of 1024 AMD Rome du
There are currently five node types in Alps:

-| type | abbreviation | blades | nodes | CPU sockets | GPU devices |
-| ---- | ------- | ------:| -----:| -----------:| -----------:|
-| NVIDIA GH200 | gh200 | 1344 | 2688 | 10,752 | 10,752 |
-| AMD Rome | zen2 | 256 | 1024 | 2,048 | -- |
-| NVIDIA A100 | a100 | 72 | 144 | 144 | 576 |
-| AMD MI250x | mi200 | 12 | 24 | 24 | 96 |
-| AMD MI300A | mi300 | 64 | 128 | 512 | 512 |
+| type | abbreviation | blades | nodes | CPU sockets | GPU devices |
+| ---- | ------- | ------:| -----:| -----------:| -----------:|
+| [NVIDIA GH200][ref-alps-gh200-node] | gh200 | 1344 | 2688 | 10,752 | 10,752 |
+| [AMD Rome][ref-alps-zen2-node] | zen2 | 256 | 1024 | 2,048 | -- |
+| [NVIDIA A100][ref-alps-a100-node] | a100 | 72 | 144 | 144 | 576 |
+| [AMD MI250x][ref-alps-mi200-node] | mi200 | 12 | 24 | 24 | 96 |
+| [AMD MI300A][ref-alps-mi300-node] | mi300 | 64 | 128 | 512 | 512 |

[](){#ref-alps-gh200-node}
### NVIDIA GH200 GPU Nodes
@@ -81,6 +81,9 @@ Each node contains four Grace-Hopper modules and four corresponding network inte
### AMD Rome CPU Nodes

!!! todo
    [confluence link 1](https://confluence.cscs.ch/spaces/KB/pages/850199545/Compute+node+configuration)

    [confluence link 2](https://confluence.cscs.ch/spaces/KB/pages/850199543/CPU+configuration)

EX425

docs/running/slurm.md

Lines changed: 114 additions & 2 deletions
@@ -9,6 +9,8 @@ SLURM is an open-source, highly scalable job scheduler that allocates computing
!!! todo
    document `--account`, `--constraint` and other generic flags.

    [Confluence link](https://confluence.cscs.ch/spaces/KB/pages/794296413/How+to+run+jobs+on+Eiger)
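Until those flags are documented here, the sketch below shows how `--account` and `--constraint` are typically set in a batch script. The `<project>` and `<feature>` values and the application name are placeholders, not values from this documentation; the accepted values depend on the cluster.

```bash title="Example job script (placeholder values)"
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --account=<project>     # project the job is charged to (placeholder)
#SBATCH --constraint=<feature>  # request a node feature (placeholder, cluster-specific)
#SBATCH --nodes=1
#SBATCH --time=00:10:00

srun ./my_application
```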

[](){#ref-slurm-partitions}
## Partitions

@@ -27,7 +29,6 @@ Each type of node has different resource constraints and capabilities, which SLU
```
The last column shows the number of nodes that have been allocated to currently running jobs (`A`) and the number of nodes that are idle (`I`).

[](){#ref-slurm-partition-debug}
### Debug partition
The SLURM `debug` partition is useful for quick-turnaround workflows. The partition has a short maximum time limit (the `TIMELIMIT` shown by `sinfo -p debug`) and a low maximum node count (the `MaxNodes` value shown by `scontrol show partition=debug`).
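For example, both limits can be checked directly; the node names, counts, and limits below are illustrative and will differ between clusters:

```console
$ sinfo -p debug
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
debug        up      30:00     24   idle nid[002000-002023]
$ scontrol show partition=debug | grep -o 'MaxNodes=[0-9]*'
MaxNodes=2
```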
@@ -38,6 +39,116 @@ This is the default partition, and will be used when you do not explicitly set a

The following sections will provide detailed guidance on how to use SLURM to request and manage CPU cores, memory, and GPUs in jobs. These instructions will help users optimize their workload execution and ensure efficient use of CSCS computing resources.

## Affinity

The following sections will document how to use Slurm on the different compute nodes available on Alps.
To demonstrate the effects of different Slurm parameters, we will use a small command line tool, [affinity](https://github.com/bcumming/affinity), that prints the CPU cores and GPUs assigned to each MPI rank in a job, and the node that each rank runs on.

We strongly recommend using a tool like affinity to understand and test the Slurm configuration for jobs, because the behavior of Slurm is highly dependent on the system configuration.
Parameters that worked on a different cluster -- or with a different Slurm version or configuration on the same cluster -- are not guaranteed to give the same results.

It is straightforward to build the affinity tool to experiment with Slurm configurations.

```console title="Compiling affinity"
$ uenv start prgenv-gnu/24.11:v2 --view=default #(1)
$ git clone https://github.com/bcumming/affinity.git
$ cd affinity; mkdir build; cd build;
$ CC=gcc CXX=g++ cmake .. #(2)
$ CC=gcc CXX=g++ cmake .. -DAFFINITY_GPU=cuda #(3)
$ CC=gcc CXX=g++ cmake .. -DAFFINITY_GPU=rocm #(4)
```

1. Affinity can be built using [`prgenv-gnu`][ref-uenv-prgenv-gnu] on all clusters.

2. By default affinity is built with MPI support and no GPU support: configure with no additional arguments on a CPU-only system like [Eiger][ref-cluster-eiger].

3. Enable CUDA support on systems that provide NVIDIA GPUs.

4. Enable ROCm support on systems that provide AMD GPUs.

The build generates the following executables:

* `affinity.omp`: tests thread affinity with no MPI (always built).
* `affinity.mpi`: tests thread affinity with MPI (built by default).
* `affinity.cuda`: tests thread and GPU affinity with MPI (built with `-DAFFINITY_GPU=cuda`).
* `affinity.rocm`: tests thread and GPU affinity with MPI (built with `-DAFFINITY_GPU=rocm`).

??? example "Testing CPU affinity"
77+
Test CPU affinity (this can be used on both CPU and GPU enabled nodes).
78+
```console
79+
$ uenv start prgenv-gnu/24.11:v2 --view=default
80+
$ srun -n8 -N2 -c72 ./affinity.mpi
81+
affinity test for 8 MPI ranks
82+
rank 0 @ nid006363: threads [ 0:71] -> cores [ 0: 71]
83+
rank 1 @ nid006363: threads [ 0:71] -> cores [ 72:143]
84+
rank 2 @ nid006363: threads [ 0:71] -> cores [144:215]
85+
rank 3 @ nid006363: threads [ 0:71] -> cores [216:287]
86+
rank 4 @ nid006375: threads [ 0:71] -> cores [ 0: 71]
87+
rank 5 @ nid006375: threads [ 0:71] -> cores [ 72:143]
88+
rank 6 @ nid006375: threads [ 0:71] -> cores [144:215]
89+
rank 7 @ nid006375: threads [ 0:71] -> cores [216:287]
90+
```
91+
92+
In this example there are 8 MPI ranks:
93+
94+
* ranks `0:3` are on node `nid006363`;
95+
* ranks `4:7` are on node `nid006375`;
96+
* each rank has 72 threads numbered `0:71`;
97+
* all threads on each rank have affinity with the same 72 cores;
98+
* each rank gets 72 cores, e.g. rank 1 gets cores `72:143` on node `nid006363`.
99+
100+
101+
102+
??? example "Testing GPU affinity"
103+
Use `affinity.cuda` or `affinity.rocm` to test on GPU-enabled systems.
104+
105+
```console
106+
$ srun -n4 -N1 ./affinity.cuda #(1)
107+
GPU affinity test for 4 MPI ranks
108+
rank 0 @ nid005555
109+
cores : [0:7]
110+
gpu 0 : GPU-2ae325c4-b542-26c2-d10f-c4d84847f461
111+
gpu 1 : GPU-5923dec6-288f-4418-f485-666b93f5f244
112+
gpu 2 : GPU-170b8198-a3e1-de6a-ff82-d440f71c05da
113+
gpu 3 : GPU-0e184efb-1d1f-f278-b96d-15bc8e5f17be
114+
rank 1 @ nid005555
115+
cores : [72:79]
116+
gpu 0 : GPU-2ae325c4-b542-26c2-d10f-c4d84847f461
117+
gpu 1 : GPU-5923dec6-288f-4418-f485-666b93f5f244
118+
gpu 2 : GPU-170b8198-a3e1-de6a-ff82-d440f71c05da
119+
gpu 3 : GPU-0e184efb-1d1f-f278-b96d-15bc8e5f17be
120+
rank 2 @ nid005555
121+
cores : [144:151]
122+
gpu 0 : GPU-2ae325c4-b542-26c2-d10f-c4d84847f461
123+
gpu 1 : GPU-5923dec6-288f-4418-f485-666b93f5f244
124+
gpu 2 : GPU-170b8198-a3e1-de6a-ff82-d440f71c05da
125+
gpu 3 : GPU-0e184efb-1d1f-f278-b96d-15bc8e5f17be
126+
rank 3 @ nid005555
127+
cores : [216:223]
128+
gpu 0 : GPU-2ae325c4-b542-26c2-d10f-c4d84847f461
129+
gpu 1 : GPU-5923dec6-288f-4418-f485-666b93f5f244
130+
gpu 2 : GPU-170b8198-a3e1-de6a-ff82-d440f71c05da
131+
gpu 3 : GPU-0e184efb-1d1f-f278-b96d-15bc8e5f17be
132+
$ srun -n4 -N1 --gpus-per-task=1 ./affinity.cuda #(2)
133+
GPU affinity test for 4 MPI ranks
134+
rank 0 @ nid005675
135+
cores : [0:7]
136+
gpu 0 : GPU-a16a8dac-7661-a44b-c6f8-f783f6e812d3
137+
rank 1 @ nid005675
138+
cores : [72:79]
139+
gpu 0 : GPU-ca5160ac-2c1e-ff6c-9cec-e7ce5c9b2d09
140+
rank 2 @ nid005675
141+
cores : [144:151]
142+
gpu 0 : GPU-496a2216-8b3c-878e-e317-36e69af11161
143+
rank 3 @ nid005675
144+
cores : [216:223]
145+
gpu 0 : GPU-766e3b8b-fa19-1480-b02f-0dfd3f2c87ff
146+
```
147+
148+
1. Test GPU affinity: note how all 4 ranks see the same 4 GPUs.
149+
150+
2. Test GPU affinity: note how the `--gpus-per-task=1` parameter assings a unique GPU to each rank.
151+
[](){#ref-slurm-gh200}
## NVIDIA GH200 GPU Nodes

@@ -144,7 +255,8 @@ The configuration that is optimal for your application may be different.
[NVIDIA's Multi-Process Service (MPS)]: https://docs.nvidia.com/deploy/mps/index.html

[](){#ref-slurm-amdcpu}
-## AMD CPU
+## AMD CPU Nodes

Alps has nodes with two AMD Epyc Rome CPU sockets per node for CPU-only workloads, most notably in the [Eiger][ref-cluster-eiger] cluster provided by the [HPC Platform][ref-platform-hpcp].
!!! todo
    document how slurm is configured on AMD CPU nodes (e.g. eiger)
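Until that documentation is in place, a rough sketch of checking placement with the affinity tool described above, assuming an Eiger-like node with two 64-core sockets; the node name and core ranges shown are illustrative, and the actual binding depends on the cluster's Slurm configuration:

```console
$ srun -n2 -N1 -c64 ./affinity.mpi
affinity test for 2 MPI ranks
rank 0 @ nid001234: threads [ 0:63] -> cores [  0: 63]
rank 1 @ nid001234: threads [ 0:63] -> cores [ 64:127]
```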
