Closed
From https://www.mail-archive.com/[email protected]/msg20895.html and https://www.mail-archive.com/[email protected]/msg20904.html:
The issue below is caused by btl_uct: it mistakenly calls ucm_set_external_event() without first checking opal_mem_hooks_support_level().
This leads UCX to believe that OMPI will provide the memory hooks when in fact it does not, so pinned physical pages go out of sync with the process's virtual address space.
btl_uct wrong call: https://github.com/open-mpi/ompi/blob/master/opal/mca/btl/uct/btl_uct_component.c#L132
Correct way: https://github.com/open-mpi/ompi/blob/master/opal/mca/common/ucx/common_ucx.c#L104
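For illustration, here is a rough sketch of the guard that appears to be missing in btl_uct: only tell UCX that external memory events will be delivered if OPAL's memory hooks actually support both free and munmap interception. Everything prefixed with `sketch_` or `SKETCH_` is a hypothetical stand-in so the snippet compiles on its own; the flag values are placeholders, and the stubs merely take the place of opal_mem_hooks_support_level() and ucm_set_external_event(). This is not the actual OMPI code; see the common_ucx.c link above for that.

```c
#include <stdbool.h>

/* Placeholder flag values; the real constants live in the OPAL and UCM headers. */
enum {
    SKETCH_MEMORY_FREE_SUPPORT   = 0x1,
    SKETCH_MEMORY_MUNMAP_SUPPORT = 0x2,
    SKETCH_EVENT_VM_UNMAPPED     = 0x4
};

/* Test knobs: what the stubbed support query reports, and how often
 * the stubbed "external event" call was made. */
static int sketch_support_level = 0;
static int sketch_external_event_calls = 0;

/* Hypothetical stand-in for opal_mem_hooks_support_level(). */
static int sketch_mem_hooks_support_level(void)
{
    return sketch_support_level;
}

/* Hypothetical stand-in for ucm_set_external_event(). */
static void sketch_set_external_event(int events)
{
    (void)events;
    sketch_external_event_calls++;
}

/* Promise external events to UCX only when both free and munmap hooks
 * are available. Returns true if the promise was made. */
static bool sketch_setup_memory_hooks(void)
{
    const int required = SKETCH_MEMORY_FREE_SUPPORT | SKETCH_MEMORY_MUNMAP_SUPPORT;

    if ((sketch_mem_hooks_support_level() & required) == required) {
        sketch_set_external_event(SKETCH_EVENT_VM_UNMAPPED);
        /* The real code would also register a release callback here. */
        return true;
    }

    /* Hooks unavailable: leave UCX to install its own memory hooks. */
    return false;
}
```

With a guard like this, the unconditional call is skipped whenever OMPI's hooks are unavailable, so UCX is never told to rely on events that will not arrive.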
FYI @yosefe
v4.0.x RMs: this feels like a blocker for v4.0.x. Thoughts?