ucp: allow eager inline sends for host memory when CUDA MDs are present #11306
Open
ndg8743 wants to merge 1 commit into openucx:master
c914d16 to ed6648b
CI failures are all infrastructure flakes on RoCE workers, unrelated to the inline send change:
All are pre-existing IB/RoCE timeout patterns on swx-rain03. Retriggering CI.
9a39731 to ed6648b
When CUDA memory domains are loaded, the memory type cache becomes non-empty after any GPU allocation. Previously, ucp_proto_is_inline() would conservatively disable inline (am_short) sends for all buffers when the cache was non-empty, unless the user explicitly set the memory type to HOST. This caused a performance regression for host memory buffers on systems with CUDA/ROCm installed.

Fix by performing a memtype cache lookup when the cache is non-empty to positively identify whether the buffer is host memory. If the address is not found in the cache, it is host memory and inline send is safe to use.

Fixes openucx#4275
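The decision described in the commit message can be sketched roughly as follows. This is a self-contained toy model, not the UCX implementation: the types `toy_region_t` and `toy_memtype_cache_t`, the functions `toy_cache_lookup()` and `toy_is_inline()`, and the `max_inline` parameter are all hypothetical names introduced for illustration, and the real ucp_proto_is_inline() may take different arguments and consult other state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the memory type cache: a flat list of
 * address ranges that are known to be non-host (e.g. GPU) memory. */
typedef struct {
    uintptr_t start;
    size_t    length;
} toy_region_t;

typedef struct {
    const toy_region_t *regions;
    size_t              count;
} toy_memtype_cache_t;

/* Return true if [buffer, buffer + length) overlaps any cached region. */
bool toy_cache_lookup(const toy_memtype_cache_t *cache,
                      const void *buffer, size_t length)
{
    uintptr_t begin = (uintptr_t)buffer;
    uintptr_t end   = begin + length;
    size_t i;

    for (i = 0; i < cache->count; ++i) {
        uintptr_t r_begin = cache->regions[i].start;
        uintptr_t r_end   = r_begin + cache->regions[i].length;

        if ((begin < r_end) && (r_begin < end)) {
            return true;
        }
    }
    return false;
}

/* Decide whether an inline send is allowed for this buffer.
 * Assumption: max_inline models a transport inline size limit. */
bool toy_is_inline(const toy_memtype_cache_t *cache, const void *buffer,
                   size_t length, size_t max_inline)
{
    assert(cache != NULL);

    if (length > max_inline) {
        return false; /* too large for an inline send */
    }

    if (cache->count == 0) {
        return true; /* fast path: nothing non-host has been cached */
    }

    /* Cache is non-empty: look the buffer up instead of giving up.
     * A miss means the buffer is treated as host memory. */
    return !toy_cache_lookup(cache, buffer, length);
}
```

In this model, the old behavior corresponds to returning false as soon as `cache->count` is non-zero. The sketch keeps the empty-cache fast path and pays for a lookup only when some non-host region has been cached.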
ed6648b to 39939fb
Summary
ucp_proto_is_inline() conservatively disabled inline (am_short) sends for all buffers once the memory type cache became non-empty, even for host memory buffers.
Test plan