docs/running/slurm.md
8 additions & 7 deletions
@@ -437,12 +437,9 @@ The 256 GB of a standard-memory node are divided into 8 NUMA nodes of 32 GB, wit
437
437
438
438
Note that this command was run on a large-memory node that has 8 x 64 GB NUMA regions, for a total of 512 GB.
439
439
440
-
The examples above placed one rank per socket, which is not optimal for NUMA access.
441
-
To constrain
440
+
The examples above placed one rank per socket, which is not optimal for NUMA access, because cores assigned to each rank are spread over the 4 NUMA nodes on the socket.
441
+
To constrain tasks to NUMA nodes, use 16 cores per task:
442
442
443
-
!!! Note "Always test"
444
-
It might still be optimal for applications that have high threading efficiency and benefit from using fewer MPI ranks to have one rank per socket or even one rank per node.
In the above examples all threads on each -- we are effectively allowing the OS to schedule the threads on the available set of cores as it sees fit.
461
-
This often gives the best performance, however sometimes it is beneficial to bind threads to explicit cores.
457
+
!!! Note "Always test"
458
+
For applications that have high threading efficiency and benefit from using fewer MPI ranks, it might still be optimal to have one rank per socket, or even one rank per node.
459
+
Always test!
462
460
463
461
### OpenMP
464
462
463
+
In the above examples the threads on each rank are not bound to specific cores: we are effectively allowing the OS to schedule the threads on the available set of cores as it sees fit.
464
+
This often gives the best performance; however, it is sometimes beneficial to bind threads to specific cores.
465
+
465
466
The OpenMP threading runtime provides additional options for controlling the pinning of threads to the cores assigned to each MPI rank.
466
467
467
468
Use the `--omp` flag with `affinity.mpi` to get more detailed information about OpenMP thread affinity.
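As a minimal sketch of explicit thread pinning, the standard OpenMP environment variables `OMP_PROC_BIND` and `OMP_PLACES` can be set before launching the job. The values chosen here, and the `srun` invocation shown in the comment, are assumptions for illustration (8 ranks on one node with 16 cores each, matching the NUMA layout described above), not a recommendation from this page.

```shell
# Sketch only: pin one OpenMP thread to each core assigned to the rank.
# The thread count of 16 assumes one rank per 16-core NUMA node.
export OMP_NUM_THREADS=16
export OMP_PROC_BIND=close   # keep threads on places close to the parent thread
export OMP_PLACES=cores      # one place per physical core

echo "OMP_NUM_THREADS=$OMP_NUM_THREADS OMP_PROC_BIND=$OMP_PROC_BIND OMP_PLACES=$OMP_PLACES"

# On the cluster, the affinity could then be inspected with, for example
# (assumed layout, not executed here):
#   srun -n8 -N1 -c16 ./affinity.mpi --omp
```

Comparing the reported affinity with and without these variables set is one way to check whether pinning changes the placement, and timing the application both ways shows whether it helps.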