Closed
Description
Thank you for taking the time to submit an issue!
Background information
What version of Open MPI are you using? (e.g., v1.10.3, v2.1.0, git branch name and hash, etc.)
v3.1.4
Describe how Open MPI was installed (e.g., from a source/distribution tarball, from a git clone, from an operating system distribution package, etc.)
Built from a git clone of v3.1.4.
Please describe the system on which you are running
- Operating system/version: RHEL 7.5/7.6
- Computer hardware: Xeon E6-2699
- Network type: OPA 100
Details of the problem
Our internal acceptance tests for changing the version of Open MPI we ship include the test programs shipped with the MPICH package (https://www.mpich.org/2019/06/07/mpich-3-3-1-released/).
Nearly all of the ~400 test programs pass, but 15 crash in the same way. All 15 are PT2PT tests that use MPI_Bsend.
Using a debug build of ompi 3.1.4, I get the following stack trace:
/usr/mpi/gcc/openmpi-3.1.4rc2-hfi/bin/mpirun -H hdsmpriv01,hdsmpriv02 --allow-run-as-root --mca mtl ofi --mca mtl_ofi_provider_include sockets --mca osc_rdma_mtls ofi -map-by ppr:1:node test/mpi/pt2pt/bsend1
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `test/mpi/pt2pt/bsend1'.
Program terminated with signal 11, Segmentation fault.
#0 0x00007f3467be24ca in opal_thread_add_32 (delta=-1, addr=0x1)
at ../../../../opal/threads/thread_usage.h:144
144 OPAL_THREAD_DEFINE_ATOMIC_ADD(int32_t, 32)
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 infinipath-psm-3.3-26_g604758e_open.2.el7.x86_64 libatomic-4.8.5-28.el7.x86_64 libgcc-4.8.5-28.el7.x86_64 libibverbs-15-6.el7.x86_64 libnl3-3.2.28-4.el7.x86_64 librdmacm-15-6.el7.x86_64 libuuid-2.23.2-52.el7.x86_64 numactl-libs-2.0.9-7.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) where
#0 0x00007f3467be24ca in opal_thread_add_32 (delta=-1, addr=0x1)
at ../../../../opal/threads/thread_usage.h:144
#1 wait_sync_update (updates=1, status=0, sync=0x1) at ../../../../opal/threads/wait_sync.h:112
#2 ompi_request_complete (with_signal=true, request=0x24a2580)
at ../../../../ompi/request/request.h:447
#3 mca_pml_cm_send (buf=<optimized out>, count=7, datatype=<optimized out>, dst=<optimized out>,
tag=<optimized out>, sendmode=<optimized out>, comm=0x607520 <ompi_mpi_comm_world>) at pml_cm.h:361
#4 0x00007f3478d68f62 in PMPI_Bsend (buf=0x7fff40f01230, count=7, type=0x606220 <ompi_mpi_char>,
dest=<optimized out>, tag=<optimized out>, comm=0x607520 <ompi_mpi_comm_world>) at pbsend.c:80
#5 0x0000000000401d15 in main (argc=1, argv=0x7fff40f01368) at bsend1.c:52