Closed
The MPICH test test/mpi/rma/test4 triggers an assertion failure in osc_pt2pt_passive_target.c:
test4: ../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c:862: ompi_osc_pt2pt_process_flush_ack: Assertion `((void *)0) != lock' failed.
Program received signal SIGABRT, Aborted.
0x00007ffff74b6cc9 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 0x00007ffff74b6cc9 in __GI_raise (sig=sig@entry=6)
at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1 0x00007ffff74ba0d8 in __GI_abort () at abort.c:89
#2 0x00007ffff74afb86 in __assert_fail_base (
fmt=0x7ffff76013d0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n",
assertion=assertion@entry=0x7fffef7d1eea "((void *)0) != lock",
file=file@entry=0x7fffef7d1a20 "../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c", line=line@entry=862,
function=function@entry=0x7fffef7d21e0 <__PRETTY_FUNCTION__.19860> "ompi_osc_pt2pt_process_flush_ack") at assert.c:92
#3 0x00007ffff74afc32 in __GI___assert_fail (
assertion=0x7fffef7d1eea "((void *)0) != lock",
file=0x7fffef7d1a20 "../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c", line=862,
function=0x7fffef7d21e0 <__PRETTY_FUNCTION__.19860> "ompi_osc_pt2pt_process_flush_ack") at assert.c:101
#4 0x00007fffef7cd2a7 in ompi_osc_pt2pt_process_flush_ack (module=0x916b60,
source=1, flush_ack_header=0x9184a0)
at ../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c:862
#5 0x00007fffef7c81fb in ompi_osc_pt2pt_callback (request=0x755080)
at ../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_data_move.c:1665
#6 0x00007ffff0863a15 in ompi_request_complete (request=0x755080,
with_signal=true) at ../../../../../ompi/request/request.h:404
#7 0x00007ffff086401d in recv_request_pml_complete (recvreq=0x755080)
at ../../../../../ompi/mca/pml/ob1/pml_ob1_recvreq.h:199
#8 0x00007ffff0864dcd in mca_pml_ob1_recv_frag_callback_match (btl=0x71e320,
tag=65 'A', des=0x91ab58, cbdata=0x0)
at ../../../../../ompi/mca/pml/ob1/pml_ob1_recvfrag.c:245
#9 0x00007ffff155daaa in btl_openib_handle_incoming (openib_btl=0x71e320,
ep=0x8dd6b0, frag=0x91ab58, byte_len=34)
at ../../../../../opal/mca/btl/openib/btl_openib_component.c:3111
#10 0x00007ffff155f0ea in progress_one_device (device=0x715150)
at ../../../../../opal/mca/btl/openib/btl_openib_component.c:3766
#11 0x00007ffff155f229 in btl_openib_component_progress ()
at ../../../../../opal/mca/btl/openib/btl_openib_component.c:3804
#12 0x00007ffff6e80261 in opal_progress ()
at ../../opal/runtime/opal_progress.c:189
#13 0x00007fffef7cb7c7 in opal_condition_wait (c=0x916ce8, m=0x916c80)
at ../../../../../opal/threads/condition.h:76
#14 0x00007fffef7cc89f in ompi_osc_pt2pt_flush_lock (module=0x916b60,
lock=0x91a9d0, target=1)
at ../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c:550
#15 0x00007fffef7cc404 in ompi_osc_pt2pt_unlock_internal (target=1,
win=0x916a50)
at ../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c:412
#16 0x00007fffef7cc6ec in ompi_osc_pt2pt_unlock (target=1, win=0x916a50)
at ../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c:482
#17 0x00007ffff7b1424c in PMPI_Win_unlock (rank=1, win=0x916a50)
at pwin_unlock.c:61
#18 0x000000000040278e in main (argc=1, argv=0x7fffffffd578) at test4.c:46
This also happens in 2.0 and 1.10, but it did not happen in 1.8.8. I see it with both the openib and vader BTLs.
The same assertion also fails in these MPICH RMA tests: linked_list_fop, linked_list_bench_lock_shr_nocheck, rma_contig, and acc_lock.
This fails on older Opteron processors, as well as AMD APUs.