Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
4.1.2
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Installed with apt-get
If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.
N/A (installed from a package, not a git clone).
Please describe the system on which you are running
- Operating system/version: Ubuntu 22.04.5 LTS
- Computer hardware: AMD Ryzen Threadripper PRO 5995WX 64-Cores (SuperMicro motherboard)
- Network type: Ethernet, but these runs used only on-node shared-memory parallelism
Details of the problem
I am prototyping an application that combines MPI with native C++ multithreading. For now I'm running on my workstation, though the application will eventually move to an HPC platform.
When I launch with mpirun using 1 or 2 MPI ranks, the application behaves as if I had specified "--bind-to core": each rank runs as if only two of its threads can execute at any given time. When I specify 3 or more MPI ranks, every rank behaves as if I had specified "--bind-to none": each rank can spawn as many threads as it wants, and they run simultaneously up to the hardware limit of the chip (128 hardware threads).
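
To make the symptom concrete, here is a minimal diagnostic sketch that each rank can run (this is just for illustration, not part of my application; the file name is hypothetical and `sched_getaffinity` is Linux-specific):

```cpp
// affinity_check.cpp -- hypothetical diagnostic, not my actual application.
// Each rank reports how many CPUs are in its affinity mask: a count of ~2
// per rank suggests an effective "--bind-to core" (one core plus its SMT
// sibling), while 128 suggests "--bind-to none" on this 64-core/128-thread
// machine.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  // needed for sched_getaffinity/CPU_* macros on glibc
#endif
#include <mpi.h>
#include <sched.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    cpu_set_t mask;
    CPU_ZERO(&mask);
    if (sched_getaffinity(0, sizeof(mask), &mask) == 0) {
        std::printf("rank %d: %d CPUs in affinity mask\n",
                    rank, CPU_COUNT(&mask));
    }

    MPI_Finalize();
    return 0;
}
```

Building with `mpicxx affinity_check.cpp -o affinity_check` and comparing `mpirun -np 2 ./affinity_check` against `mpirun -np 3 ./affinity_check` shows the switch in binding behavior described above.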
I can obtain better performance, and the expected behavior for 1 or 2 ranks, if I manually specify "--bind-to none", but these seem like strange defaults. Is there some legitimate reason for the behavior I have described?
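
For the record, the workaround invocation is just `mpirun --bind-to none -np 2 ./my_app` (where `my_app` stands in for my actual binary), and adding mpirun's `--report-bindings` flag confirms from the launcher's side how each process ends up bound.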