
OpenMPI 5.0.8 plays poorly with Python calling concurrent.futures.ProcessPoolExecutor calling subprocess("mpirun ...") compared to MPICH 4.3.1 #13390

Description

@ilia-nikiforov-umn

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

5.0.8

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

micromamba create -n ompi-env openmpi

Please describe the system on which you are running

  • Operating system/version:
    Windows 11 Pro; the example was run on a freshly installed Ubuntu 22.04 under WSL

  • Computer hardware:
    ThinkPad P1 Gen 6
    Processor: 13th Gen Intel(R) Core(TM) i7-13800H
    Installed RAM: 32 GB

  • Network type:


Details of the problem

Note that while I have reduced this to a minimal example on a laptop running Ubuntu through WSL, I originally encountered the exact same problem running a serious simulation on an HPC system (running LAMMPS on a Dell Precision 7920 Rack server with two Intel Xeon Platinum 8168 ("Skylake") CPUs, 2.7 GHz, 24 cores each, for a total of 48 cores). The simulation slowdown was consistent with the reduced CPU utilization.

When calling mpirun as a Python subprocess using Python concurrent.futures.ProcessPoolExecutor, OpenMPI severely underutilizes CPUs, while MPICH does not.

To set up the minimal example, I:

  1. Created a brand new installation of Ubuntu 22.04 using WSL
  2. sudo apt update && sudo apt install stress
  3. Installed micromamba and created two environments:
    micromamba create -n ompi-env openmpi
    micromamba create -n mpich-env mpich
  4. Wrote the following Python script (stress.py):
from concurrent.futures import ProcessPoolExecutor
import subprocess

# Launch three concurrent mpirun jobs, each running 2 stress ranks,
# for 6 CPU-bound processes in total.
max_workers = 3
with ProcessPoolExecutor(max_workers=max_workers) as executor:
    for _ in range(max_workers):
        executor.submit(subprocess.run, "mpirun -np 2 stress --cpu 1 --timeout 60", shell=True)

If I run python3 stress.py & and look at top in ompi-env, I get a serious underutilization of the CPUs (the six stress processes together add up to only about four cores' worth of CPU):

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 3785 ilia      20   0    3712    320    320 R  71.7   0.0   0:07.73 stress
 3790 ilia      20   0    3712    320    320 R  71.0   0.0   0:07.94 stress
 3788 ilia      20   0    3712    320    320 R  69.3   0.0   0:07.71 stress
 3786 ilia      20   0    3712    160    160 R  64.3   0.0   0:07.37 stress
 3789 ilia      20   0    3712    320    320 R  64.0   0.0   0:07.94 stress
 3787 ilia      20   0    3712    320    320 R  60.0   0.0   0:07.39 stress

In mpich-env, the CPUs are fully utilized:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 3830 ilia      20   0    3712    320    320 R  99.7   0.0   0:04.99 stress
 3831 ilia      20   0    3712    160    160 R  99.7   0.0   0:04.99 stress
 3832 ilia      20   0    3712    320    320 R  99.7   0.0   0:04.99 stress
 3833 ilia      20   0    3712    160    160 R  99.7   0.0   0:04.99 stress
 3834 ilia      20   0    3712    320    320 R  99.7   0.0   0:04.99 stress
 3835 ilia      20   0    3712    320    320 R  99.7   0.0   0:04.99 stress
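
I suspect this is a process-binding collision: each mpirun presumably binds its two ranks to cores without knowing about the other concurrent mpirun instances, so the three jobs end up pinned to overlapping cores. One way to check this (a sketch, assuming Open MPI's --report-bindings option behaves the same under WSL) would be to add it to the command the script submits:

mpirun --report-bindings -np 2 stress --cpu 1 --timeout 60

If the three concurrent jobs report overlapping core assignments, that would explain the underutilization above.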

If I go back to ompi-env and set max_workers = 1 in my script, the issue disappears, whether I leave -np 2 or go up to -np 6:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 3868 ilia      20   0    3712    320    320 R 100.0   0.0   0:06.46 stress
 3869 ilia      20   0    3712    320    320 R 100.0   0.0   0:06.46 stress
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 3883 ilia      20   0    3712    320    320 R 100.0   0.0   0:04.49 stress
 3885 ilia      20   0    3712    320    320 R 100.0   0.0   0:04.49 stress
 3887 ilia      20   0    3712    320    320 R 100.0   0.0   0:04.49 stress
 3889 ilia      20   0    3712    320    320 R  99.7   0.0   0:04.48 stress
 3890 ilia      20   0    3712    320    320 R  99.7   0.0   0:04.48 stress
 3891 ilia      20   0    3712    320    320 R  99.7   0.0   0:04.48 stress
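
If overlapping bindings are indeed the cause, telling each mpirun not to bind at all should avoid the collision. A minimal sketch of that workaround, assuming Open MPI's --bind-to none option (the script is otherwise unchanged):

from concurrent.futures import ProcessPoolExecutor
import subprocess

max_workers = 3
with ProcessPoolExecutor(max_workers=max_workers) as executor:
    for _ in range(max_workers):
        # --bind-to none leaves scheduling of the stress processes to the
        # kernel instead of pinning each job to the cores Open MPI picks
        executor.submit(
            subprocess.run,
            "mpirun --bind-to none -np 2 stress --cpu 1 --timeout 60",
            shell=True,
        )

As far as I know, MPICH's mpiexec does not bind processes by default, which would explain why mpich-env is unaffected.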
