
prterun hangs, Open MPI v5.0.6 #12939

@axiom-ctrl

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

  • Open MPI v5.0.6

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

  • from a source/distribution tarball

Please describe the system on which you are running

  • Operating system/version: Red Hat Enterprise Linux 8.8
  • Computer hardware: HPE Cray EX
  • Network type: HPE Slingshot

Details of the problem

Please describe, in detail, the problem that you are having, including the behavior you expect to see, the actual behavior that you are seeing, steps to reproduce the problem, etc. It is most helpful if you can attach a small program that a developer can use to reproduce your problem.

Intermittently, and with no discernible pattern, prterun hangs after I launch a job with mpirun: the application executable never starts. The hang becomes more likely as the number of MPI processes, and the number of compute nodes they are spread across, increases.
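To put a number on how the hang rate scales with process count, one option is to launch a trivial non-MPI command such as hostname repeatedly under a timeout. Using hostname instead of the real application also rules the application out. The sketch below assumes GNU coreutils `timeout` is available on the launch node; `count_hangs` is a hypothetical helper, and the 10-second SIGKILL grace period is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Open MPI or PRRTE): estimate how often a launch hangs.
# Usage: count_hangs RUNS TIMEOUT_SECS CMD [ARGS...]
# Runs CMD RUNS times, stopping any run that exceeds TIMEOUT_SECS,
# and prints how many runs had to be stopped (i.e. counted as hangs).
count_hangs() {
  local runs=$1 limit=$2 hangs=0 i=0 rc
  shift 2
  while [ "$i" -lt "$runs" ]; do
    i=$((i + 1))
    # GNU timeout sends SIGTERM after $limit seconds and SIGKILL 10 s later if needed.
    # It exits with 124 when the command timed out, or 137 if the SIGKILL was required.
    if timeout -k 10 "$limit" "$@" >/dev/null 2>&1; then
      rc=0
    else
      rc=$?
    fi
    if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
      hangs=$((hangs + 1))
    fi
  done
  echo "$hangs"
}
```

For example, inside a PBS allocation, running `count_hangs 20 120 mpirun -np 32 hostname` for several values of -np would show whether the hang rate grows with the process count. Any run that exceeds the limit is counted as a hang, so the timeout should be set well above the normal startup time.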

Open MPI was configured with PBS support as follows:

./configure \
    CC="gcc" \
    CXX="g++" \
    FC="gfortran" \
    --prefix=${install_dir} \
    --enable-shared \
    --enable-static \
    --with-pbs \
    --with-libfabric="/opt/cray" \
    --with-libfabric-libdir="/opt/cray/lib64" \
    --with-tm="/opt/pbs" \
    --with-tm-libdir="/opt/pbs/lib"

I run the application as follows:

mpirun -np 32 pw.x -i scf.input

I have observed the same behavior with earlier Open MPI 5 releases as well. The issue is not specific to the application I used (Quantum ESPRESSO), and it is not tied to any particular compute node; in practice, simply restarting the job often resolves the problem.
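Since restarting the job often clears the hang, logging the allocated nodes each time a launch times out is one way to check that the hang does not follow particular nodes. This sketch assumes a PBS job, where `$PBS_NODEFILE` lists the allocated hosts (one line per slot), and GNU coreutils `timeout`; `run_and_log_nodes`, `unique_nodes`, and the log file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of Open MPI or PBS): record which nodes were
# allocated whenever a launch hangs, so node lists can be compared across hangs.

# Print each distinct hostname from a node file once, in first-seen order
# (blank lines are skipped).
unique_nodes() {
  awk 'NF && !seen[$1]++ { print $1 }' "$1"
}

# Usage: run_and_log_nodes LOGFILE TIMEOUT_SECS NODEFILE CMD [ARGS...]
# Runs CMD under GNU timeout. If it had to be stopped (exit 124, or 137 after
# SIGKILL), appends a timestamp and the node list to LOGFILE.
# Returns the exit status reported by timeout.
run_and_log_nodes() {
  local log=$1 limit=$2 nodefile=$3 rc
  shift 3
  if timeout -k 10 "$limit" "$@"; then
    return 0
  else
    rc=$?
  fi
  if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
    {
      echo "# hang at $(date '+%F %T')"
      unique_nodes "$nodefile"
    } >> "$log"
  fi
  return "$rc"
}
```

For example, `run_and_log_nodes hangs.log 120 "$PBS_NODEFILE" mpirun -np 32 pw.x -i scf.input` would run the job as before while recording the nodes of any hung launch. Comparing the lists in hangs.log across several hung jobs would show whether any node recurs.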
