# Gordon Bell and HPL runs 2025

For Gordon Bell and HPL runs in March-April 2025, CSCS has created a reservation on Santis with 1333 nodes (12 cabinets).

For the runs, CSCS has applied updates and changes that aim to improve performance and scaling, particularly for NCCL.
If you are already familiar with running on Daint, you might have to make some small changes to your current job scripts and parameters, which are documented here.

## Santis

### Connecting

Connecting to Santis via SSH is the same as for Daint and Clariden; see the [ssh guide][ref-ssh] for more information.

Add the following to your [SSH configuration][ref-ssh-config] to connect directly to Santis using `ssh santis`.
```
Host santis
    HostName santis.alps.cscs.ch
    ProxyJump ela
    # change cscsusername to your CSCS username
    User cscsusername
    IdentityFile ~/.ssh/cscs-key
    IdentitiesOnly yes
```
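
With this entry in place, `ssh` and `scp` can use the `santis` alias directly; the file name below is only a placeholder.

```bash
# log in to Santis through the ela jump host configured above
ssh santis

# copy a job script to your home directory on Santis (placeholder file name)
scp job.sh santis:
```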

### Reservations

The `normal` partition is used with no reservation, which means that jobs can be submitted without the `--partition` and `--reservation` flags.
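
For example, a batch submission needs neither flag; the script name below is a placeholder.

```bash
# submit to the default (normal) partition, no reservation required
sbatch job.sh
```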

### Storage

Your data sets from Daint are available on Santis (a quick check is sketched after this list):

* the same Home is shared between Daint, Clariden and Santis
* the same Scratch is mounted on both Santis and Daint
* Store/Project are also mounted
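
As a minimal sanity check, the mounts can be listed from a login shell on Santis; this sketch assumes the usual CSCS environment variables (`$HOME`, `$SCRATCH`) are defined in your shell.

```bash
# confirm that the shared file systems are visible from Santis
ls $HOME
ls $SCRATCH
```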

## Low Noise Mode

Low noise mode (LNM) is now enabled.
This confines system processes and operations to the first core of each of the four NUMA regions in a node (i.e., cores 0, 72, 144, 216).

The consequence of this setting is that only 71 cores per socket can be requested by an application (for a total of 284 instead of 288 cores per node).
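
One way to see the effect is to inspect the CPU mask of a task inside an allocation; this is only a sketch, and the resource requests are example values.

```bash
# show which CPUs a single task may run on under low noise mode
srun -N1 -n1 -c71 bash -c 'grep Cpus_allowed_list /proc/self/status'
```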

!!! warning "Unable to allocate resources: Requested node configuration is not available"
    If you try to use all 72 cores on each socket, SLURM will give a hard error, because only 71 are available:

    ```
    # try to run 4 ranks per node, with 72 cores each
    > srun -n4 -N1 -c72 --reservation=reshuffling ./build/affinity.mpi
    srun: error: Unable to allocate resources: Requested node configuration is not available
    ```

One consequence of this change is that thread affinity and OpenMP settings that worked on Daint might cause a large slowdown in the new configuration.

### SLURM

Explicitly set the number of cores per task using the `--cpus-per-task/-c` flag, e.g.:
```
#SBATCH --cpus-per-task=71
```
or
```
srun -N1 -n4 -c71 ...
```
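
Putting this together, a minimal batch script might look like the sketch below; the job name, time limit and application name are placeholders, and the core count per task respects the 71-core limit.

```bash
#!/bin/bash
#SBATCH --job-name=gb-run        # placeholder job name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4      # one rank per NUMA region
#SBATCH --cpus-per-task=71       # at most 71 of the 72 cores per socket
#SBATCH --time=00:30:00          # placeholder time limit

# pass -c explicitly to srun as well, so the setting does not depend on inheritance
srun --cpus-per-task=71 ./my_app # placeholder application
```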

**Do not** use the `--cpu-bind` flag to control affinity:

* this can cause a large slowdown, particularly with `--cpu-bind=socket`. We are investigating how to fix this.

If you see a significant slowdown and you want to report it, please provide the output generated with the `--cpu-bind=verbose` flag.
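
For example (the application name is a placeholder), `--cpu-bind=verbose` only reports the binding chosen for each task at launch:

```bash
# print the CPU binding of each task without forcing a particular binding policy
srun -N1 -n4 -c71 --cpu-bind=verbose ./my_app
```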

### OpenMP

If your application uses OpenMP, try setting the following in your job script:

```bash
export OMP_PLACES=cores
export OMP_PROC_BIND=close
```

Without these settings, we have observed application slowdowns due to poor thread placement.
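
To confirm what the OpenMP runtime actually picked up, the standard `OMP_DISPLAY_ENV` variable can be added; the application name and the `srun` parameters below are placeholders.

```bash
export OMP_PLACES=cores
export OMP_PROC_BIND=close
export OMP_DISPLAY_ENV=true   # print the OpenMP ICVs at program start

srun -N1 -n4 -c71 ./my_openmp_app
```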

## NCCL

!!! todo
    write a guide on which versions to use, environment variables to set, etc.