@hppritcha

This commit adds two MCA parameters:

mtl_ofi_disable_hmem
btl_ofi_disable_hmem

to allow disabling the use of FI_HMEM in cases where the provider advertises HMEM support but does not actually provide it, and does not honor the OFI libfabric FI_HMEM_DISABLE_P2P environment variable.

This is the situation on certain systems as of the writing of this commit, owing to limitations in kernel support for registering accelerator memory. The OFI provider on such systems unfortunately still advertises support for FI_HMEM with ZE but fails when trying to register memory. These MCA parameters allow turning off the use of FI_HMEM in such cases.
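
As a rough sketch of how the two parameters might be set, the snippet below uses Open MPI's generic `OMPI_MCA_<name>` environment-variable convention. The value `1` assumes both parameters are boolean flags, which this description does not state, and `./my_app` is a hypothetical application name.

```shell
# Disable FI_HMEM use in the OFI MTL and BTL components.
# Assumption: both parameters are booleans and "1" means "disable".
export OMPI_MCA_mtl_ofi_disable_hmem=1
export OMPI_MCA_btl_ofi_disable_hmem=1

# The same settings could instead be passed on the command line
# (./my_app is a placeholder, not part of this PR):
#   mpirun --mca mtl_ofi_disable_hmem 1 --mca btl_ofi_disable_hmem 1 ./my_app

# To check which parameters and defaults a given build exposes:
#   ompi_info --param mtl ofi --level 9
```

Exporting the variables affects every later `mpirun` launched from the same shell, whereas the `--mca` form applies to a single run.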

Related to ofiwg/libfabric#9315

Signed-off-by: Howard Pritchard [email protected]
(cherry picked from commit baf882a)

@github-actions github-actions bot added this to the v5.0.0 milestone Oct 11, 2023
@janjust janjust merged commit 40e3fdb into open-mpi:v5.0.x Oct 12, 2023