Description
I haven't figured out how the systems get into this state, but somehow the cm PML is being selected instead of ob1 on one node (stevo4 below), while its peer on stevo3 selects ob1. This results in the following error:
[root@stevo3 ~]# /opt/ompi-2.0.0rc2/bin/mpirun --map-by node -np 2 --host stevo3,stevo4 --allow-run-as-root --mca mpi_add_procs_cutoff 1024 --mca btl openib,sm,self /usr/local/src/osu-micro-benchmarks-5.0/mpi/pt2pt/osu_bw
[stevo4.asicdesigners.com:19152] [[32156,1],1] selected pml cm, but peer [[32156,1],0] on stevo3 selected pml ob1
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
--------------------------------------------------------------------------
MPI_INIT has failed because at least one MPI process is unreachable
from another. This *usually* means that an underlying communication
plugin -- such as a BTL or an MTL -- has either not loaded or not
allowed itself to be used. Your MPI job will now abort.
You may wish to try to narrow down the problem;
* Check the output of ompi_info to see which BTL/MTL plugins are
available.
* Run your application with MPI_THREAD_SINGLE.
* Set the MCA parameter btl_base_verbose to 100 (or mtl_base_verbose,
if using MTL-based communications) to see exactly which
communication plugins were considered and/or discarded.
--------------------------------------------------------------------------
Raising the ob1 priority (--mca pml_ob1_priority 90) works around the problem:
[root@stevo3 ~]# /opt/ompi-2.0.0rc2/bin/mpirun --map-by node -np 2 --host stevo3,stevo4 --allow-run-as-root --mca mpi_add_procs_cutoff 1024 --mca pml_ob1_priority 90 --mca btl openib,sm,self /usr/local/src/osu-micro-benchmarks-5.0/mpi/pt2pt/osu_bw
# OSU MPI Bandwidth Test v5.0
# Size Bandwidth (MB/s)
1 1.92
2 3.96
4 1.92
8 12.41
16 7.27
32 49.66
64 28.93
128 192.53
256 358.52
512 779.84
1024 389.50
2048 579.23
4096 492.21
8192 550.24
16384 1077.49
32768 1093.05
65536 1105.54
131072 1200.82
262144 1277.78
524288 1342.21
1048576 1559.07
2097152 1837.14
4194304 2254.85
[root@stevo3 ~]#
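To illustrate why raising the ob1 priority helps, here is a simplified, hypothetical model of the failure: each process independently picks the highest-priority PML that initialized on its own node, and MPI_Init aborts if peers disagree. This is not Open MPI's actual selection code. The priorities (ob1=20, cm=30) and the assumption that cm initializes only on stevo4 are illustrative guesses; the helper names select_pml and check_peers are made up for this sketch.

```shell
#!/bin/sh
# Simplified, hypothetical model of per-process PML selection: every process
# independently picks the highest-priority PML that initialized on its node.
# NOT Open MPI's code; priorities and availability below are assumptions.

# select_pml NAME:PRIORITY ... -> prints the name with the highest priority
select_pml() {
  sel_best=""
  sel_prio=-1
  for sel_entry in "$@"; do
    sel_name=${sel_entry%%:*}   # text before the first ':'
    sel_p=${sel_entry##*:}      # text after the last ':'
    if [ "$sel_p" -gt "$sel_prio" ]; then
      sel_best=$sel_name
      sel_prio=$sel_p
    fi
  done
  echo "$sel_best"
}

# check_peers A B -> reports whether two peers chose the same PML
check_peers() {
  if [ "$1" = "$2" ]; then
    echo "match: both selected pml $1"
  else
    echo "mismatch: $1 vs $2 (MPI_Init would abort)"
  fi
}

# Assumption: cm initializes only on stevo4; ob1=20 and cm=30 are assumed priorities.
rank0=$(select_pml "ob1:20")            # stevo3: only ob1 available
rank1=$(select_pml "ob1:20" "cm:30")    # stevo4: cm outranks ob1
check_peers "$rank1" "$rank0"

# Workaround from the report: pml_ob1_priority 90 outranks cm on every node.
rank0=$(select_pml "ob1:90")
rank1=$(select_pml "ob1:90" "cm:30")
check_peers "$rank1" "$rank0"
```

Under these assumptions the first check prints "mismatch: cm vs ob1 (MPI_Init would abort)", mirroring the stevo4 error above, and the second prints "match: both selected pml ob1". The model also suggests why the priority bump is only a workaround: it hides the asymmetry in which components initialize on each node rather than explaining it.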