hip : fix warp mask width for rocWMMA compatibility #15239
Closed
I ran into issues compiling with -DGGML_HIP_ROCWMMA_FATTN=ON against the latest TheRock/ROCm (7.0) nightly releases. This appears to be due to a warp mask width incompatibility recently introduced between ROCm's rocWMMA library and the CUDA-style sync code. rocWMMA recently added its own __shfl_sync and __shfl_xor_sync functions, and these require 64-bit masks, while the existing code uses hard-coded 32-bit masks (0xFFFFFFFF). The mismatch causes type conflicts and compilation failures when building with rocWMMA support enabled.
I tried fixing this with changes confined to hip.h, leaving the CUDA files alone, but I don't think there is a clean alternative to replacing the hard-coded masks.
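To illustrate the general idea of replacing a hard-coded mask with one whose width follows the platform, here is a minimal host-side C++ sketch. It is not the code from this PR: every name in it (sketch_full_mask, sketch_mask_t, SKETCH_FULL_MASK, SKETCH_USE_64BIT_MASK) is hypothetical, and a real build would key the switch off whatever the HIP/rocWMMA build actually defines.

```cpp
#include <cstdint>

// Hypothetical helper: builds an "all lanes active" mask for a given mask type,
// so the literal 0xFFFFFFFF never has to appear at a call site.
template <typename MaskT>
constexpr MaskT sketch_full_mask() {
    return static_cast<MaskT>(~static_cast<MaskT>(0));
}

// Hypothetical build switch: defined when the sync intrinsics expect 64-bit
// masks, left undefined when they expect CUDA-style 32-bit masks.
#if defined(SKETCH_USE_64BIT_MASK)
using sketch_mask_t = std::uint64_t;
#else
using sketch_mask_t = std::uint32_t;
#endif

// Call sites would pass this constant instead of a hard-coded 0xFFFFFFFF.
constexpr sketch_mask_t SKETCH_FULL_MASK = sketch_full_mask<sketch_mask_t>();

// Compile-time checks that both widths produce an all-ones mask.
static_assert(sketch_full_mask<std::uint32_t>() == 0xFFFFFFFFu,
              "32-bit mask should have all 32 bits set");
static_assert(sketch_full_mask<std::uint64_t>() == 0xFFFFFFFFFFFFFFFFull,
              "64-bit mask should have all 64 bits set");
```

With a single named constant, the mask type can change per backend without touching each shuffle call, which is the reason the hard-coded literals have to go.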