BTL UCT causing problems with UCX memory hooks #6666

@jsquyres

From https://www.mail-archive.com/[email protected]/msg20895.html and https://www.mail-archive.com/[email protected]/msg20904.html:

The issue reported in those threads is caused by btl_uct: it calls ucm_set_external_event() without first checking opal_mem_hooks_support_level().
As a result, UCX believes that OMPI will provide the memory hooks when in fact it does not, so pinned physical pages fall out of sync with the process's virtual address space.

btl_uct wrong call: https://github.com/open-mpi/ompi/blob/master/opal/mca/btl/uct/btl_uct_component.c#L132
Correct way: https://github.com/open-mpi/ompi/blob/master/opal/mca/common/ucx/common_ucx.c#L104
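
For illustration only, here is a minimal, self-contained sketch of the guard described above. It does not include the real Open MPI or UCX headers: every `demo_`/`DEMO_` name is a hypothetical stand-in (for `opal_mem_hooks_support_level()` and `ucm_set_external_event()` respectively), the flag values are placeholders, and the assumption that both free() and munmap() interception are required is mine, not something stated in this issue. See the common_ucx.c link above for the actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag bits; the real values live in Open MPI's headers. */
enum {
    DEMO_MEMORY_FREE_SUPPORT   = 0x1,
    DEMO_MEMORY_MUNMAP_SUPPORT = 0x2
};

/* Stand-in for opal_mem_hooks_support_level(): what OMPI can intercept. */
static int demo_support_level = 0;
static int demo_mem_hooks_support_level(void) { return demo_support_level; }

/* Stand-in for ucm_set_external_event(): records that UCX was told the
 * unmap events will be delivered by an external party (OMPI). */
static bool demo_external_event_set = false;
static void demo_set_external_event(void) { demo_external_event_set = true; }

/* Guarded setup: only claim external delivery when the assumed required
 * hooks are actually available. Returns true if the claim was made. */
static bool demo_setup_memory_hooks(void)
{
    const int required = DEMO_MEMORY_FREE_SUPPORT | DEMO_MEMORY_MUNMAP_SUPPORT;

    if ((demo_mem_hooks_support_level() & required) != required) {
        /* Hooks are unavailable: say nothing, so UCX installs its own. */
        return false;
    }

    demo_set_external_event();
    /* A real component would also register its release callback here. */
    return true;
}
```

The point of the sketch is the ordering: the support level is queried first, and the "events are external" claim is made only if the hooks really exist. Skipping the check, as btl_uct reportedly does, makes the claim unconditionally.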

FYI @yosefe

v4.0.x RMs: this feels like a blocker for v4.0.x. Thoughts?
