Mpich rma/test4 fails assert in osc_pt2pt_passive_target.c #840

@bbenton

The MPICH test test/mpi/rma/test4 triggers an assertion failure in osc_pt2pt_passive_target.c:

test4: ../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c:862: ompi_osc_pt2pt_process_flush_ack: Assertion `((void *)0) != lock' failed.

Program received signal SIGABRT, Aborted.
0x00007ffff74b6cc9 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
56      ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  0x00007ffff74b6cc9 in __GI_raise (sig=sig@entry=6)
    at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007ffff74ba0d8 in __GI_abort () at abort.c:89
#2  0x00007ffff74afb86 in __assert_fail_base (
    fmt=0x7ffff76013d0 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
    assertion=assertion@entry=0x7fffef7d1eea "((void *)0) != lock", 
    file=file@entry=0x7fffef7d1a20 "../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c", line=line@entry=862, 
    function=function@entry=0x7fffef7d21e0 <__PRETTY_FUNCTION__.19860> "ompi_osc_pt2pt_process_flush_ack") at assert.c:92
#3  0x00007ffff74afc32 in __GI___assert_fail (
    assertion=0x7fffef7d1eea "((void *)0) != lock", 
    file=0x7fffef7d1a20 "../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c", line=862, 
    function=0x7fffef7d21e0 <__PRETTY_FUNCTION__.19860> "ompi_osc_pt2pt_process_flush_ack") at assert.c:101
#4  0x00007fffef7cd2a7 in ompi_osc_pt2pt_process_flush_ack (module=0x916b60, 
    source=1, flush_ack_header=0x9184a0)
    at ../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c:862
#5  0x00007fffef7c81fb in ompi_osc_pt2pt_callback (request=0x755080)
    at ../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_data_move.c:1665
#6  0x00007ffff0863a15 in ompi_request_complete (request=0x755080, 
    with_signal=true) at ../../../../../ompi/request/request.h:404
#7  0x00007ffff086401d in recv_request_pml_complete (recvreq=0x755080)
    at ../../../../../ompi/mca/pml/ob1/pml_ob1_recvreq.h:199
#8  0x00007ffff0864dcd in mca_pml_ob1_recv_frag_callback_match (btl=0x71e320, 
    tag=65 'A', des=0x91ab58, cbdata=0x0)
    at ../../../../../ompi/mca/pml/ob1/pml_ob1_recvfrag.c:245
#9  0x00007ffff155daaa in btl_openib_handle_incoming (openib_btl=0x71e320, 
    ep=0x8dd6b0, frag=0x91ab58, byte_len=34)
    at ../../../../../opal/mca/btl/openib/btl_openib_component.c:3111
#10 0x00007ffff155f0ea in progress_one_device (device=0x715150)
    at ../../../../../opal/mca/btl/openib/btl_openib_component.c:3766
#11 0x00007ffff155f229 in btl_openib_component_progress ()
    at ../../../../../opal/mca/btl/openib/btl_openib_component.c:3804
#12 0x00007ffff6e80261 in opal_progress ()
    at ../../opal/runtime/opal_progress.c:189
#13 0x00007fffef7cb7c7 in opal_condition_wait (c=0x916ce8, m=0x916c80)
    at ../../../../../opal/threads/condition.h:76
#14 0x00007fffef7cc89f in ompi_osc_pt2pt_flush_lock (module=0x916b60, 
    lock=0x91a9d0, target=1)
    at ../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c:550
#15 0x00007fffef7cc404 in ompi_osc_pt2pt_unlock_internal (target=1, 
    win=0x916a50)
    at ../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c:412
#16 0x00007fffef7cc6ec in ompi_osc_pt2pt_unlock (target=1, win=0x916a50)
    at ../../../../../ompi/mca/osc/pt2pt/osc_pt2pt_passive_target.c:482
#17 0x00007ffff7b1424c in PMPI_Win_unlock (rank=1, win=0x916a50)
    at pwin_unlock.c:61
#18 0x000000000040278e in main (argc=1, argv=0x7fffffffd578) at test4.c:46

This also happens in 2.0 and 1.10, but it did not happen in 1.8.8. I see it with both the openib and vader BTLs.

The same assertion also fails with these MPICH RMA tests: linked_list_fop, linked_list_bench_lock_shr_nocheck, rma_contig, and acc_lock.

This fails on older Opteron processors, as well as AMD APUs.
