Closed
Labels
Description
Thank you for taking the time to submit an issue!
Background information
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
Open MPI 3.1.3 has this issue. Open MPI 3.1.1 does not.
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Open MPI was built from source with: ./configure --prefix=/opt/rdma/mpi/openmpi --enable-mpirun-prefix-by-default --with-cuda --disable-io-romio --enable-picky
Please describe the system on which you are running
- Operating system/version: CentOS 7.4, x86_64.
- Computer hardware: PC
- Network type: same node shared memory
Details of the problem
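Running the Intel MPI Benchmarks (IMB-MPI1) alltoall test on a single node over shared memory hangs with Open MPI 3.1.3. Open MPI 3.1.1 does not have this issue. With Open MPI 3.1.3, replacing the vader BTL with 'smcuda' makes the hang go away and MPI works normally. The command below reproduces the hang, and the backtrace was taken from a hung process: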
shell$ mpirun -n 2 --mca btl vader,self IMB-MPI1 alltoall 
(gdb) bt
#0  0x00007f4c86bbd16e in mca_btl_vader_component_progress () from /opt/rdma/mpi/openmpi/lib/openmpi/mca_btl_vader.so
#1  0x00007f4c96f3a4ec in opal_progress () from /opt/rdma/mpi/openmpi/lib/libopen-pal.so.40
#2  0x00007f4c97afb885 in ompi_request_default_wait () from /opt/rdma/mpi/openmpi/lib/libmpi.so.40
#3  0x00007f4c97b4e5aa in ompi_coll_base_barrier_intra_two_procs () from /opt/rdma/mpi/openmpi/lib/libmpi.so.40
#4  0x00007f4c97b109d7 in PMPI_Barrier () from /opt/rdma/mpi/openmpi/lib/libmpi.so.40
#5  0x000000000040b4e3 in IMB_alltoall ()
#6  0x0000000000405bad in IMB_init_buffers_iter ()
#7  0x0000000000402105 in main ()
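
For comparison, the smcuda workaround described above would presumably be invoked as follows. This is a sketch that assumes the same reproducer command with only the BTL list changed; the exact command line used for the smcuda run is not recorded in this report.

```shell
# Hanging case from the report (Open MPI 3.1.3, vader BTL):
#   mpirun -n 2 --mca btl vader,self IMB-MPI1 alltoall
# Assumed form of the reported workaround: select smcuda instead of vader.
mpirun -n 2 --mca btl smcuda,self IMB-MPI1 alltoall
```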