hip : fix warp mask width for rocWMMA compatibility #15239


Closed
wants to merge 1 commit into from

Conversation


@lhl lhl commented Aug 11, 2025

I ran into issues compiling with -DGGML_HIP_ROCWMMA_FATTN=ON on the latest TheRock/ROCm (7.0) nightly releases. This appears to be due to a warp mask width incompatibility recently introduced between ROCm's rocWMMA library and the CUDA-style sync code.

ROCm's rocWMMA library recently added its own __shfl_sync and __shfl_xor_sync functions, which require 64-bit masks, while the existing code uses hardcoded 32-bit masks (0xFFFFFFFF). This causes type conflicts and compilation failures when building with rocWMMA support enabled.

  • Added #ifndef guards around the sync functions in hip.h
  • Added a GGML_CUDA_WARP_MASK macro in ggml-cuda/common.cuh and ggml-cuda/vendors/hip.h
  • Replaced all hardcoded warp masks with the new macro across the CUDA files
  • Tested builds on both CUDA and HIP

I tried fixing this with changes confined to hip.h, leaving the CUDA files alone, but I don't think there is a clean alternative to replacing the hard-coded masks.

@lhl lhl requested a review from JohannesGaessler as a code owner August 11, 2025 12:56
@github-actions github-actions bot added the Nvidia GPU (Issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels Aug 11, 2025
@JohannesGaessler
Collaborator

When I grep rocWMMA for "shfl" I am not finding anything.

Collaborator

IMbackK commented Aug 11, 2025

Duplicate of #15241; please see the discussion in that PR.
