UCT/IB/MLX5: Fix DEVX for not WC-supported UAR #11310
Merged
gleon99 merged 2 commits into openucx:master on Apr 9, 2026
Fix two separate bugs caused by UCX assuming that the HCA's UAR pages always support WC (write combining) and not checking the UCT_IB_MLX5_MD_FLAG_UAR_USE_WC flag:
- The doorbell path hardcoded bf_size=256, forcing BF mode even if the UAR is NC-mapped.
- Device memory was allocated, and data was copied to it via MMIO writes, even when this was not supported (because it requires WC).
roiedanino reviewed Mar 31, 2026
6ad240d to ce774ef
gleon99 reviewed Mar 31, 2026
gleon99 approved these changes Mar 31, 2026
Contributor
@Artemy-Mellanox can you pls review?
Artemy-Mellanox approved these changes Apr 4, 2026
roiedanino approved these changes Apr 5, 2026
Collaborator
Author
@yosefe we cannot simulate a non-WC environment in a gtest running on WC-capable hardware. We could use a software config parameter to override the UCT_IB_MLX5_MD_FLAG_UAR_USE_WC value, but it would not matter since the hardware behavior can't be simulated. To catch any future WC-support-related bugs, we would need an actual non-WC environment in CI.
What?
Fix two separate bugs caused by UCX assuming that the HCA's UAR pages always support WC (write combining) and not checking the UCT_IB_MLX5_MD_FLAG_UAR_USE_WC flag.
Why?
https://redmine.mellanox.com/projects/8507/issues/4908319
A gtest failed on the hana partition because the HCA's PCI BAR size is configured too small, and as a result WC is not supported.
How?
In both the doorbell path and the DM init path, check the UCT_IB_MLX5_MD_FLAG_UAR_USE_WC flag to ensure WC is supported.
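
For illustration only, the check described above might look roughly like the following. This is a minimal standalone sketch, not the code from this PR: every `sketch_`/`SKETCH_` name is hypothetical, the flag's bit value is a stand-in for the real UCT_IB_MLX5_MD_FLAG_UAR_USE_WC definition, and treating a BF size of 0 as "plain doorbell, no BlueFlame" is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for UCT_IB_MLX5_MD_FLAG_UAR_USE_WC; the real bit
 * value is defined in the UCX sources and is not reproduced here. */
#define SKETCH_MD_FLAG_UAR_USE_WC (1u << 0)

/* BF size that was previously hardcoded, per the PR description. */
#define SKETCH_BF_SIZE 256u

/* Hypothetical, reduced view of a memory domain: capability bits only. */
typedef struct {
    uint32_t flags;
} sketch_md_t;

/* Returns nonzero when the UAR is mapped write-combining (WC). */
static int sketch_md_uar_is_wc(const sketch_md_t *md)
{
    assert(md != NULL);
    return (md->flags & SKETCH_MD_FLAG_UAR_USE_WC) != 0;
}

/* Doorbell path: use the BF size only on a WC-mapped UAR. On an NC-mapped
 * UAR return 0, which this sketch assumes means "no BF mode". */
static uint32_t sketch_doorbell_bf_size(const sketch_md_t *md)
{
    return sketch_md_uar_is_wc(md) ? SKETCH_BF_SIZE : 0u;
}

/* DM init path: allow device memory only when a size was requested and the
 * MMIO copies it relies on are backed by WC. */
static int sketch_dm_init_allowed(const sketch_md_t *md, size_t dm_size)
{
    return (dm_size > 0) && sketch_md_uar_is_wc(md);
}
```

Both decisions go through the single `sketch_md_uar_is_wc()` helper, so the doorbell path and the DM init path cannot disagree about whether WC is available.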