Closed
Description
The PSM2 MTL has problems with initialization, in particular in the call to psm2_ep_open when running on a single node.
The failure signature is:
srun -N 1 -n 1 a.out
opal13.0Assertion failure at psm_ep.c:832: ep->epid != 0
miniFE.x:52223 terminated with signal 6 at PC=2aaaabd835f7 SP=7fffffffb7d8. Backtrace:
/lib64/libc.so.6(gsignal+0x37)[0x2aaaabd835f7]
/lib64/libc.so.6(abort+0x148)[0x2aaaabd84ce8]
/lib64/libpsm2.so.2(+0x11dea)[0x2aaab97c7dea]
/lib64/libpsm2.so.2(+0x10782)[0x2aaab97c6782]
/lib64/libpsm2.so.2(psm2_ep_open+0x3c5)[0x2aaab97c5165]
/opt/openmpi/1.10/intel/lib/openmpi/mca_mtl_psm2.so(ompi_mtl_psm2_module_init+0x196)[0x2aaab95b25e6]
/opt/openmpi/1.10/intel/lib/openmpi/mca_mtl_psm2.so(+0x2a00)[0x2aaab95b2a00]
/opt/openmpi/1.10/intel/lib/libmpi.so.12(ompi_mtl_base_select+0x9d)[0x2aaaaad6148d]
/opt/openmpi/1.10/intel/lib/openmpi/mca_pml_cm.so(+0x3d9b)[0x2aaab8916d9b]
/opt/openmpi/1.10/intel/lib/libmpi.so.12(mca_pml_base_select+0x433)[0x2aaaaad68643]
/opt/openmpi/1.10/intel/lib/libmpi.so.12(ompi_mpi_init+0x6ac)[0x2aaaaad1a06c]
/opt/openmpi/1.10/intel/lib/libmpi.so.12(MPI_Init+0xe4)[0x2aaaaad3b134]
miniFE.x[0x432da2]
miniFE.x[0x404e1a]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x2aaaabd6fb15]
miniFE.x[0x404c69]
This occurs with v1.10.1 and also shows up in v2.x.
The issue was raised on the devel mailing list:
https://www.open-mpi.org/community/lists/devel/2016/04/18762.php