For psiflow configurations where multiple apps run in parallel on the same worker node, independent MPI launches appear to bind their processes to the same cores, leading to heavy oversubscription and very poor performance.
Default behaviour
Launching two GPAW evaluations on the same node
[CONFIG]
cores_per_worker: 8
launch_command: 'apptainer exec -e --no-init oras://ghcr.io/molmod/gpaw:24.1 /opt/entry.sh mpirun -np 8 --map-by CORE --bind-to CORE --display-map gpaw python DUMMY.py'
slurm:
nodes_per_block: 1
cores_per_node: 16
[OUTPUT JOB 1]
User: ???@node3521.doduo.os
pid: 771665, CPU affinity: {75}
pid: 771662, CPU affinity: {2}
pid: 771667, CPU affinity: {76}
pid: 771673, CPU affinity: {79}
pid: 771675, CPU affinity: {80}
pid: 771671, CPU affinity: {78}
pid: 771660, CPU affinity: {0}
pid: 771669, CPU affinity: {77}
======================== JOB MAP ========================
Data for node: node3521 Num slots: 16 Max slots: 0 Num procs: 8
Process OMPI jobid: [44168,1] App: 0 Process rank: 0 Bound: socket 0[core 0[hwt 0]]:[B/.][./././././././././././././.]
Process OMPI jobid: [44168,1] App: 0 Process rank: 1 Bound: socket 0[core 1[hwt 0]]:[./B][./././././././././././././.]
Process OMPI jobid: [44168,1] App: 0 Process rank: 2 Bound: socket 1[core 2[hwt 0]]:[./.][B/././././././././././././.]
Process OMPI jobid: [44168,1] App: 0 Process rank: 3 Bound: socket 1[core 3[hwt 0]]:[./.][./B/./././././././././././.]
Process OMPI jobid: [44168,1] App: 0 Process rank: 4 Bound: socket 1[core 4[hwt 0]]:[./.][././B/././././././././././.]
Process OMPI jobid: [44168,1] App: 0 Process rank: 5 Bound: socket 1[core 5[hwt 0]]:[./.][./././B/./././././././././.]
Process OMPI jobid: [44168,1] App: 0 Process rank: 6 Bound: socket 1[core 6[hwt 0]]:[./.][././././B/././././././././.]
Process OMPI jobid: [44168,1] App: 0 Process rank: 7 Bound: socket 1[core 7[hwt 0]]:[./.][./././././B/./././././././.]
=============================================================
[OUTPUT JOB 2]
User: ???@node3521.doduo.os
pid: 771664, CPU affinity: {75}
pid: 771663, CPU affinity: {2}
pid: 771666, CPU affinity: {76}
pid: 771661, CPU affinity: {0}
pid: 771670, CPU affinity: {78}
pid: 771674, CPU affinity: {80}
pid: 771672, CPU affinity: {79}
pid: 771668, CPU affinity: {77}
======================== JOB MAP ========================
Data for node: node3521 Num slots: 16 Max slots: 0 Num procs: 8
Process OMPI jobid: [44169,1] App: 0 Process rank: 0 Bound: socket 0[core 0[hwt 0]]:[B/.][./././././././././././././.]
Process OMPI jobid: [44169,1] App: 0 Process rank: 1 Bound: socket 0[core 1[hwt 0]]:[./B][./././././././././././././.]
Process OMPI jobid: [44169,1] App: 0 Process rank: 2 Bound: socket 1[core 2[hwt 0]]:[./.][B/././././././././././././.]
Process OMPI jobid: [44169,1] App: 0 Process rank: 3 Bound: socket 1[core 3[hwt 0]]:[./.][./B/./././././././././././.]
Process OMPI jobid: [44169,1] App: 0 Process rank: 4 Bound: socket 1[core 4[hwt 0]]:[./.][././B/././././././././././.]
Process OMPI jobid: [44169,1] App: 0 Process rank: 5 Bound: socket 1[core 5[hwt 0]]:[./.][./././B/./././././././././.]
Process OMPI jobid: [44169,1] App: 0 Process rank: 6 Bound: socket 1[core 6[hwt 0]]:[./.][././././B/././././././././.]
Process OMPI jobid: [44169,1] App: 0 Process rank: 7 Bound: socket 1[core 7[hwt 0]]:[./.][./././././B/./././././././.]
=============================================================
Both calculations run on the same 8 cores; they do not crash, but they are extremely slow.
This is not unexpected: independent mpirun calls have no way of coordinating which cores they use. It can be avoided by setting cores_per_worker = cores_per_node in the psiflow config, but that defeats the purpose of parsl's blocks.
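The collision is easy to see by intersecting the affinity sets reported in the two job outputs above; a minimal sketch (the core IDs are copied verbatim from the [OUTPUT JOB 1] and [OUTPUT JOB 2] logs):

```python
# Affinity sets copied from the two job logs above (default behaviour).
job1 = {0, 2, 75, 76, 77, 78, 79, 80}
job2 = {0, 2, 75, 76, 77, 78, 79, 80}

# Every rank of job 2 lands on a core that job 1 is already using.
shared = job1 & job2
print(f"shared cores: {sorted(shared)} ({len(shared)} of {len(job1 | job2)})")
```

With the srun wrapper below, the same check on the two affinity sets yields an empty intersection.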
A hacky workaround I might have found is to wrap the MPI call inside a Slurm job step:
srun -n 1 -c 8 -v --cpu-bind=v,cores apptainer exec [...] mpirun -np 8 --map-by CORE --bind-to CORE --display-map gpaw [...]
where we have to ask for 1 task with 8 cores, because otherwise mpirun complains about the available slots. Very elegant.
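One quick way to confirm that the job step really restricts the visible cores is to print the process affinity from inside it; a minimal check, assuming Linux (os.sched_getaffinity is Linux-only):

```python
import os

# Run inside `srun -n 1 -c 8 --cpu-bind=cores ...`: this should report 8
# cores. Run outside any step, it reports everything the allocation sees.
visible = os.sched_getaffinity(0)
print(f"{len(visible)} visible cores: {sorted(visible)}")
```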
Hacky behaviour
Launching two GPAW evaluations on the same node
[CONFIG]
cores_per_worker: 8
launch_command: 'srun -n 1 -c 8 -v --cpu-bind=v,cores apptainer exec -e --no-init oras://ghcr.io/molmod/gpaw:24.1 /opt/entry.sh mpirun -np 8 --map-by CORE --bind-to CORE --display-map gpaw python DUMMY.py'
slurm:
nodes_per_block: 1
cores_per_node: 16
[OUTPUT JOB 1]
User: ???@node4217.shinx.os
pid: 1935915, CPU affinity: {102}
pid: 1935897, CPU affinity: {94}
pid: 1935922, CPU affinity: {104}
pid: 1935857, CPU affinity: {93}
pid: 1935902, CPU affinity: {95}
pid: 1935907, CPU affinity: {99}
pid: 1935918, CPU affinity: {103}
pid: 1935911, CPU affinity: {100}
======================== JOB MAP ========================
Data for node: node4217 Num slots: 8 Max slots: 0 Num procs: 8
Process OMPI jobid: [64657,1] App: 0 Process rank: 0 Bound: socket 0[core 0[hwt 0]]:[B/./.][././././.]
Process OMPI jobid: [64657,1] App: 0 Process rank: 1 Bound: socket 0[core 1[hwt 0]]:[./B/.][././././.]
Process OMPI jobid: [64657,1] App: 0 Process rank: 2 Bound: socket 0[core 2[hwt 0]]:[././B][././././.]
Process OMPI jobid: [64657,1] App: 0 Process rank: 3 Bound: socket 1[core 3[hwt 0]]:[././.][B/./././.]
Process OMPI jobid: [64657,1] App: 0 Process rank: 4 Bound: socket 1[core 4[hwt 0]]:[././.][./B/././.]
Process OMPI jobid: [64657,1] App: 0 Process rank: 5 Bound: socket 1[core 5[hwt 0]]:[././.][././B/./.]
Process OMPI jobid: [64657,1] App: 0 Process rank: 6 Bound: socket 1[core 6[hwt 0]]:[././.][./././B/.]
Process OMPI jobid: [64657,1] App: 0 Process rank: 7 Bound: socket 1[core 7[hwt 0]]:[././.][././././B]
=============================================================
[OUTPUT JOB 2]
User: ???@node4217.shinx.os
pid: 1935906, CPU affinity: {114}
pid: 1935912, CPU affinity: {116}
pid: 1935908, CPU affinity: {115}
pid: 1935901, CPU affinity: {112}
pid: 1935903, CPU affinity: {113}
pid: 1935896, CPU affinity: {111}
pid: 1935914, CPU affinity: {117}
pid: 1935864, CPU affinity: {105}
======================== JOB MAP ========================
Data for node: node4217 Num slots: 8 Max slots: 0 Num procs: 8
Process OMPI jobid: [64663,1] App: 0 Process rank: 0 Bound: socket 1[core 0[hwt 0]]:[][B/././././././.]
Process OMPI jobid: [64663,1] App: 0 Process rank: 1 Bound: socket 1[core 1[hwt 0]]:[][./B/./././././.]
Process OMPI jobid: [64663,1] App: 0 Process rank: 2 Bound: socket 1[core 2[hwt 0]]:[][././B/././././.]
Process OMPI jobid: [64663,1] App: 0 Process rank: 3 Bound: socket 1[core 3[hwt 0]]:[][./././B/./././.]
Process OMPI jobid: [64663,1] App: 0 Process rank: 4 Bound: socket 1[core 4[hwt 0]]:[][././././B/././.]
Process OMPI jobid: [64663,1] App: 0 Process rank: 5 Bound: socket 1[core 5[hwt 0]]:[][./././././B/./.]
Process OMPI jobid: [64663,1] App: 0 Process rank: 6 Bound: socket 1[core 6[hwt 0]]:[][././././././B/.]
Process OMPI jobid: [64663,1] App: 0 Process rank: 7 Bound: socket 1[core 7[hwt 0]]:[][./././././././B]
=============================================================
Here, the JOB MAP info seems to contradict the CPU affinity logs, but I suspect this is because srun restricts which cores MPI can see (note Num slots: 8 instead of the 16 that are physically available on the node). Also, both calculations appear to run fine.
Generally, I think this issue would present itself for most bash apps running on the same node (all reference calculations, but also i-PI simulations). Why does psiflow not use srun (or an equivalent) to separate app resources, e.g. through one of parsl's launchers?
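For reference, parsl does ship an SrunLauncher; a sketch of what that would look like in a plain parsl config (whether psiflow's YAML config exposes the launcher option is an assumption I haven't checked). One caveat: parsl launchers wrap the launch of a block's worker pool, not each individual app, so this alone may not give per-app core separation, which is perhaps why wrapping the app's own launch_command in srun (as above) is still needed:

```python
# Sketch only: plain-parsl equivalent of running workers inside a job step.
# Whether psiflow exposes `launcher` in its YAML config is an assumption.
from parsl.config import Config
from parsl.executors import HighThroughputExecutor
from parsl.launchers import SrunLauncher
from parsl.providers import SlurmProvider

config = Config(
    executors=[
        HighThroughputExecutor(
            cores_per_worker=8,
            provider=SlurmProvider(
                nodes_per_block=1,
                cores_per_node=16,
                # SrunLauncher prefixes the worker launch with `srun`, so the
                # whole worker pool runs inside one Slurm job step; apps
                # launched by those workers still share that step's cores.
                launcher=SrunLauncher(),
            ),
        )
    ]
)
```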
On a side note, it is apparently possible to launch GPAW directly with srun (see), but that does not seem to work with the container sandwiched in between. That is an issue for the GPAW repo, however.