We have an application where we launch a single process per node but want to use all of the threads (processing elements) on a dual-socket machine within that single process. The application uses OpenMP within each process for shared-memory parallelism, and we want OpenMP to run across both sockets.
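For context, the launch pattern we are aiming for can be sketched as follows (the application name and thread count are illustrative, not our actual binary):

```shell
# One MPI rank per node; OpenMP handles on-node parallelism.
# OMP_NUM_THREADS sizes the thread team to span both sockets
# (112 PEs on our dual-socket machine).
export OMP_NUM_THREADS=112
mpirun -np 1 ./app   # ./app is a placeholder for our hybrid MPI+OpenMP binary
```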
In Open MPI 4.1.4, we had been using the following command, and it runs fine:
```
/openmpi-4.1.4/bin/mpirun -np 1 --report-bindings --map-by ppr:1:node:PE=112 ls
[b200-003:1981388] MCW rank 0 is not bound (or bound to all available processors)
```
However, in Open MPI 5.0.5 (and 5.0.7), we get this behavior:
```
/openmpi-5.0.5/bin/mpirun -np 1 --report-bindings --map-by ppr:1:node:PE=112 ls
--------------------------------------------------------------------------
Your job failed to map because the resulting process placement
would cause the process to be bound to CPUs in more than one
package:

  Mapping policy: BYNODE:NOOVERSUBSCRIBE
  Binding policy: CORE:IF-SUPPORTED
  CPUs/rank: 112

This configuration almost always results in a loss of performance
that can significantly impact applications. Either alter the
mapping, binding, and/or cpus/rank policies so that each process
can fit into a single package, or consider using an alternative
mapper that can handle this configuration (e.g., the rankfile mapper).
--------------------------------------------------------------------------
```
Has this behavior changed between Open MPI 4.x and 5.x? If so, how should I map my threads in Open MPI 5.0.5 so that all of the threads in the node are assigned to a single rank?
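For reference, one variant I am considering (I have not confirmed that it behaves as intended on 5.0.5, so this is an assumption on my part) is to keep one rank per node but skip binding entirely, letting the OS schedule the OpenMP threads across both sockets:

```shell
# Hypothetical workaround (unverified on 5.0.5): map one rank per node
# without a PE count, and disable binding so the rank is free to run
# threads on all cores of both packages.
mpirun -np 1 --report-bindings --map-by ppr:1:node --bind-to none ./app
```

I would still prefer to understand the intended 5.x way to express the old `PE=112` mapping, if one exists.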