openmpi-2.0.0rc2: mtl/ofi/verbs being chosen over pml/ob1/btl/openib #1676

@larrystevenwise

I haven't figured out how the systems get into this state, but the cm pml is being chosen over ob1 on one of the nodes: in the run below, stevo4 selects cm while its peer on stevo3 selects ob1. This results in the following error:

[root@stevo3 ~]# /opt/ompi-2.0.0rc2/bin/mpirun --map-by node -np 2 --host stevo3,stevo4 --allow-run-as-root --mca mpi_add_procs_cutoff 1024 --mca btl openib,sm,self /usr/local/src/osu-micro-benchmarks-5.0/mpi/pt2pt/osu_bw
[stevo4.asicdesigners.com:19152] [[32156,1],1] selected pml cm, but peer [[32156,1],0] on stevo3 selected pml ob1
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
***    and potentially your MPI job)
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another.  This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used.  Your MPI job will now abort.

You may wish to try to narrow down the problem;

 * Check the output of ompi_info to see which BTL/MTL plugins are
   available.
 * Run your application with MPI_THREAD_SINGLE.
 * Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
   if using MTL-based communications) to see exactly which
   communication plugins were considered and/or discarded.
--------------------------------------------------------------------------

Bumping up the ob1 priority (`--mca pml_ob1_priority 90`) works around the problem:

[root@stevo3 ~]# /opt/ompi-2.0.0rc2/bin/mpirun --map-by node -np 2 --host stevo3,stevo4 --allow-run-as-root --mca mpi_add_procs_cutoff 1024 --mca pml_ob1_priority 90 --mca btl openib,sm,self /usr/local/src/osu-micro-benchmarks-5.0/mpi/pt2pt/osu_bw
# OSU MPI Bandwidth Test v5.0
# Size      Bandwidth (MB/s)
1                       1.92
2                       3.96
4                       1.92
8                      12.41
16                      7.27
32                     49.66
64                     28.93
128                   192.53
256                   358.52
512                   779.84
1024                  389.50
2048                  579.23
4096                  492.21
8192                  550.24
16384                1077.49
32768                1093.05
65536                1105.54
131072               1200.82
262144               1277.78
524288               1342.21
1048576              1559.07
2097152              1837.14
4194304              2254.85
[root@stevo3 ~]#
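
Another possible workaround, not run on these systems, is to pin the PML outright instead of raising its priority. A minimal sketch, assuming the standard `OMPI_MCA_<param>` environment-variable form of MCA parameters (equivalent to passing `--mca pml ob1` on the command line); the verbosity level of 100 is an arbitrary choice to log which PML components were considered:

```shell
# Assumption: selecting the PML explicitly rather than relying on priorities.
# Each OMPI_MCA_<name> variable is the environment form of "--mca <name> <value>".
export OMPI_MCA_pml=ob1                 # force the ob1 PML on every rank
export OMPI_MCA_btl=openib,sm,self      # same BTL list as in the runs above
export OMPI_MCA_pml_base_verbose=100    # log PML component selection

# Then launch as before (left commented out; paths are the ones from this report):
# /opt/ompi-2.0.0rc2/bin/mpirun --map-by node -np 2 --host stevo3,stevo4 \
#     --allow-run-as-root --mca mpi_add_procs_cutoff 1024 \
#     /usr/local/src/osu-micro-benchmarks-5.0/mpi/pt2pt/osu_bw
```

If this runs cleanly, the verbose output from stevo4 should still show why cm was being preferred there in the first place.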