Commit 38929e7

committed
pr tweaks
1 parent d0be829 commit 38929e7

1 file changed: +8 -7 lines changed

docs/running/slurm.md

Lines changed: 8 additions & 7 deletions
@@ -437,12 +437,9 @@ The 256 GB of a standard-memory node are divided into 8 NUMA nodes of 32 GB, wit
 
 Note that this command was run on a large-memory node that has 8 x 64 GB NUMA regions, for a total of 512 GB.
 
-The examples above placed one rank per socket, which is not optimal for NUMA access.
-To constrain
+The examples above placed one rank per socket, which is not optimal for NUMA access, because cores assigned to each rank are spread over the 4 NUMA nodes on the socket.
+To constrain tasks to NUMA nodes, use 16 cores per task:
 
-!!! Note "Always test"
-It might still be optimal for applications that have high threading efficiency and benefit from using fewer MPI ranks to have one rank per socket or even one one rank per node.
-Always test!
 
 ```console title="One MPI rank per NUMA region"
 $ srun -n8 -N1 -c16 --hint=nomultithread ./affinity.mpi
@@ -457,11 +454,15 @@ rank 6 @ nid002199: thread 0 -> cores [ 48: 63]
 rank 7 @ nid002199: thread 0 -> cores [112:127]
 ```
 
-In the above examples all threads on each -- we are effectively allowing the OS to schedule the threads on the available set of cores as it sees fit.
-This often gives the best performance, however sometimes it is beneficial to bind threads to explicit cores.
+!!! Note "Always test"
+It might still be optimal for applications that have high threading efficiency and benefit from using fewer MPI ranks to have one rank per socket or even one one rank per node.
+Always test!
 
 ### OpenMP
 
+In the above examples all threads on each -- we are effectively allowing the OS to schedule the threads on the available set of cores as it sees fit.
+This often gives the best performance, however sometimes it is beneficial to bind threads to explicit cores.
+
 The OpenMP threading runtime provides additional options for controlling the pinning of threads to the cores assinged to each MPI rank.
 
 Use the `--omp` flag with `affinity.mpi` to get more detailed information about OpenMP thread affinity.