Description
Background information
What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)
5.0.8
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
micromamba create -n ompi-env openmpi
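For reference, the version reported above can be confirmed from outside the environment (assuming micromamba's run subcommand behaves as usual):
micromamba run -n ompi-env mpirun --version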
Please describe the system on which you are running
- Operating system/version: Windows 11 Pro, but the example was run using a freshly installed Ubuntu 22.04 through WSL
- Computer hardware: ThinkPad P1 Gen 6; 13th Gen Intel(R) Core(TM) i7-13800H; 32 GB RAM
- Network type:
Details of the problem
Note that while I have reduced this to a minimal example on a laptop running Ubuntu through WSL, I originally encountered the exact same problem running a serious simulation on an HPC system (running LAMMPS on a Dell Precision 7920 Rack server with two Intel Xeon Platinum 8168 (“Skylake”) CPUs, 2.7 GHz, 24 cores each, for a total of 48 cores). The simulation slowed down in proportion to the reduced CPU utilization.
When calling mpirun as a subprocess from Python's concurrent.futures.ProcessPoolExecutor, Open MPI severely underutilizes the CPUs, while MPICH does not.
To set up the minimal example, I:
- Created a brand new installation of Ubuntu 22.04 using WSL and installed stress:
  sudo apt update
  sudo apt install stress
- Installed micromamba and created two environments:
  micromamba create -n ompi-env openmpi
  micromamba create -n mpich-env mpich
- Wrote the following Python script:
from concurrent.futures import ProcessPoolExecutor
import subprocess
max_workers = 3
with ProcessPoolExecutor(max_workers=max_workers) as executor:
    for _ in range(max_workers):
        executor.submit(subprocess.run, "mpirun -np 2 stress --cpu 1 --timeout 60", shell=True)
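The script is identical for both environments; I simply switch environments before launching it (a minimal sketch, assuming the script above is saved as stress.py and micromamba's shell hook is initialized):
micromamba activate ompi-env   # or mpich-env
python3 stress.py &
top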
If I run python3 stress.py & in ompi-env and watch top, the CPUs are seriously underutilized:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3785 ilia 20 0 3712 320 320 R 71.7 0.0 0:07.73 stress
3790 ilia 20 0 3712 320 320 R 71.0 0.0 0:07.94 stress
3788 ilia 20 0 3712 320 320 R 69.3 0.0 0:07.71 stress
3786 ilia 20 0 3712 160 160 R 64.3 0.0 0:07.37 stress
3789 ilia 20 0 3712 320 320 R 64.0 0.0 0:07.94 stress
3787 ilia 20 0 3712 320 320 R 60.0 0.0 0:07.39 stress
In mpich-env, the CPUs are fully utilized:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3830 ilia 20 0 3712 320 320 R 99.7 0.0 0:04.99 stress
3831 ilia 20 0 3712 160 160 R 99.7 0.0 0:04.99 stress
3832 ilia 20 0 3712 320 320 R 99.7 0.0 0:04.99 stress
3833 ilia 20 0 3712 160 160 R 99.7 0.0 0:04.99 stress
3834 ilia 20 0 3712 320 320 R 99.7 0.0 0:04.99 stress
3835 ilia 20 0 3712 320 320 R 99.7 0.0 0:04.99 stress
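The only difference between the two runs is the MPI implementation. One thing I have not ruled out is how each concurrent mpirun instance places its ranks; a way to inspect that (assuming the --report-bindings option is supported by this build) would be:
mpirun --report-bindings -np 2 stress --cpu 1 --timeout 60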
If I go back to ompi-env and set max_workers = 1 in my script, the issue disappears, whether I leave it at -np 2 or go up to -np 6:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3868 ilia 20 0 3712 320 320 R 100.0 0.0 0:06.46 stress
3869 ilia 20 0 3712 320 320 R 100.0 0.0 0:06.46 stress
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3883 ilia 20 0 3712 320 320 R 100.0 0.0 0:04.49 stress
3885 ilia 20 0 3712 320 320 R 100.0 0.0 0:04.49 stress
3887 ilia 20 0 3712 320 320 R 100.0 0.0 0:04.49 stress
3889 ilia 20 0 3712 320 320 R 99.7 0.0 0:04.48 stress
3890 ilia 20 0 3712 320 320 R 99.7 0.0 0:04.48 stress
3891 ilia 20 0 3712 320 320 R 99.7 0.0 0:04.48 stress
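If rank placement turns out to be the culprit, a quick test would be to swap the command string in the script for one that disables binding explicitly (assuming --bind-to none is accepted by this Open MPI build):
mpirun --bind-to none -np 2 stress --cpu 1 --timeout 60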