
@lhl lhl commented Aug 11, 2025

I ran into issues compiling w/ -DGGML_HIP_ROCWMMA_FATTN=ON with the latest TheRock/ROCm (7.0) nightly releases. This appears to be due to a warp mask width incompatibility recently introduced between ROCm's rocWMMA library and CUDA-style sync code.

ROCm's rocWMMA library recently added its own __shfl_sync and __shfl_xor_sync functions, which require 64-bit masks, while the existing code uses hardcoded 32-bit masks (0xFFFFFFFF). This causes type conflicts and compilation failures when building with rocWMMA support enabled.

  • Added an #ifndef guard to hip.h for the sync functions
  • Added GGML_CUDA_WARP_MASK macro in ggml-cuda/common.cuh and ggml-cuda/vendors/hip.h
  • Replaced all hardcoded warp masks with the new macro across CUDA files
  • Tested builds on both CUDA and HIP

I tried confining the fix to hip.h and leaving the CUDA files alone, but I don't think there is a clean alternative to replacing the hard-coded masks.

@lhl lhl requested a review from JohannesGaessler as a code owner August 11, 2025 12:56
@github-actions github-actions bot added the labels "Nvidia GPU" (issues specific to Nvidia GPUs) and "ggml" (changes relating to the ggml tensor library for machine learning) Aug 11, 2025
@JohannesGaessler
Collaborator

When I grep rocWMMA for "shfl" I am not finding anything.

@IMbackK
Collaborator

IMbackK commented Aug 11, 2025

Duplicate of #15241; please see the discussion in that PR.
