
Can't run MPI application with UCX #13495

@TroyMitchell911

Background information

What version of Open MPI are you using? (e.g., v4.1.6, v5.0.1, git branch name and hash, etc.)

I'm using v5.0.8.

Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)

From a git clone of the GitHub repository.

If you are building/installing from a git clone, please copy-n-paste the output from git submodule status.

❯ git submodule status
 907b1ccaeec61a1197f0ee5264d4fef20b257b84 3rd-party/openpmix (v5.0.8)
 222f03fbb98b71abd293aa205b38fa9a38e57965 3rd-party/prrte (v3.0.10-3-g222f03fbb9)
 dfff67569fb72dbf8d73a1dcf74d091dad93f71b config/oac (dfff675)

Please describe the system on which you are running

  • Operating system/version: Arch Linux
  • Computer hardware: 12th Gen Intel(R) Core(TM) i5-12500H
  • Network type: WLAN

Details of the problem

I built UCX with this command:

❯ ./autogen.sh && ./configure --prefix=/opt/ucx --enable-mt && make -j12 && make install
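Since the only network on this machine is WLAN, it may be worth checking which transports this UCX build can actually use. The sketch below is a hypothetical helper (ucx_transports is not part of UCX) that extracts transport names from ucx_info -d output; it assumes the "#      Transport: <name>" line format that ucx_info -d prints. On a laptop without InfiniBand or RoCE hardware, one would expect only shared-memory and TCP-style transports to appear.

```shell
# Hypothetical helper: print the unique transport names listed by `ucx_info -d`.
# Assumes ucx_info prints lines like "#      Transport: tcp".
ucx_transports() {
    # Strip everything up to "Transport:" and print only the transport name.
    sed -n 's/^#[[:space:]]*Transport:[[:space:]]*//p' | sort -u
}

# Example: /opt/ucx/bin/ucx_info -d | ucx_transports
```

If the list contains only entries such as posix, sysv, self, cma or tcp, it is possible that the Open MPI UCX PML declines to initialize on this machine; that is an assumption to verify, not a confirmed diagnosis.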

Then I built Open MPI with this command:

./autogen.pl && ./configure --prefix=/opt/ompi --with-ucx=/opt/ucx && make -j14 && make install

ompi_info then shows that the UCX components were built:

❯ ompi_info | grep ucx
  Configure command line: '--prefix=/opt/ompi' '--with-ucx=/opt/ucx'
                 MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v5.0.8)
                 MCA pml: ucx (MCA v2.1.0, API v2.1.0, Component v5.0.8)
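This listing confirms that pml/ucx was built, but appearing in ompi_info only means the component exists; a component can still fail to load or decline to run at launch time. A minimal sketch of the same check as a reusable shell function, assuming the "MCA <framework>: <component> (" line format shown above (has_mca_component is a hypothetical name):

```shell
# Hypothetical helper: succeed if ompi_info output on stdin lists the given
# component for the given framework, e.g. "MCA pml: ucx (MCA v2.1.0, ...)".
has_mca_component() {
    framework=$1
    component=$2
    # Basic regex: "(" is a literal character here.
    grep -q "MCA ${framework}: ${component} ("
}

# Example: ompi_info | has_mca_component pml ucx && echo "pml/ucx is built"
```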

But when I run an MPI application with the UCX PML, I get:

❯ mpirun -np 2 -mca pml ucx ./helloworld
--------------------------------------------------------------------------
No components were able to be opened in the pml framework.

This typically means that either no components of this type were
installed, or none of the installed components can be loaded.
Sometimes this means that shared libraries required by these
components are unable to be found/loaded.

  Host:      Troy-arch
  Framework: pml
--------------------------------------------------------------------------
--------------------------------------------------------------------------
prterun has exited due to process rank 0 with PID 0 on node Troy-arch calling
"abort". This may have caused other processes in the application to be
terminated by signals sent by prterun (as reported here).
--------------------------------------------------------------------------

Without UCX (using only the self and sm BTLs), it works:

❯ mpirun -np 2 -mca btl self,sm ./helloworld
Hello from rank 0 of 2 on Troy-arch
Hello from rank 1 of 2 on Troy-arch
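To narrow down why pml/ucx could not be opened, the failing command can be re-run with more logging. The sketch below wraps that in a hypothetical ucx_debug_run function. It raises the PML framework verbosity (pml_base_verbose) and the UCX component verbosity (pml_ucx_verbose, assumed to be registered by this build; ompi_info --param pml ucx --level 9 can confirm), and exports UCX's own UCX_LOG_LEVEL to the ranks with -x. The MPIRUN override exists only so the command line can be inspected without launching anything.

```shell
# Hypothetical wrapper: re-run an MPI program with the UCX PML forced and
# extra logging enabled. MPIRUN defaults to mpirun but can be overridden
# (e.g. with a stub function) to inspect the command line.
ucx_debug_run() {
    ${MPIRUN:-mpirun} -np 2 \
        --mca pml ucx \
        --mca pml_base_verbose 100 \
        --mca pml_ucx_verbose 100 \
        -x UCX_LOG_LEVEL=debug \
        "$@"
}

# Example: ucx_debug_run ./helloworld
```

The extra output may show whether the component failed to load a shared library, as the error message suggests, or whether it was opened but declined to run.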
