-
Notifications
You must be signed in to change notification settings - Fork 12.7k
Fix HIP warp synchronization function conflicts for ROCm 7.0+ #15241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix HIP warp synchronization function conflicts for ROCm 7.0+ #15241
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes no sense.
The rocm native __shfl_*_sync
functions from amd_warp_sync_functions.h have always used 64bit masks and have always very rightly failed you for passing in a 32bit type.
But we dont use those, we instead #define
the __shfl_*_sync
functions to the __shfl
equivalents, which have no mask.
It makes zero sense why rocwmma would make a difference here other than that the inclusion of its header is somehow causing the shfl functions from amd_warp_sync_functions.h to be used instead.
While switching over to using the __shfl_*_sync
from amd_warp_sync_functions.h might look attractive from the surface, it makes little sense for us right now, as they also just call __shfl
see https://github.com/ROCm/clr/blob/7f77ceeaf9317261039d8f88b5748cb514a88df9/hipamd/include/hip/amd_detail/amd_warp_sync_functions.h#L298 and where added only in rocm 6.2, while we still need to support at least rocm 6.1 as this is what is in Debian stable.
This pr is going to be a nak from me, please identify how the wrong versions of the shfl functions are being used when built with rocwmma instead.
…C_BUILTINS has been replaced with HIP_DISABLE_WARP_SYNC_BUILTINS (https://github.com/ROCm/clr/blob/rocm-6.4.x-with-7.0-preview/hipamd/include/hip/amd_detail/amd_warp_sync_functions.h#L30)
Thanks @IMbackK After deeper investigation, it doesn't have anything related to "new" rocWMMA or some "ROCm 7.0 warp mask breaking changes". The real issue here is macro change in https://github.com/ROCm/clr/blob/rocm-6.4.x-with-7.0-preview/hipamd/include/hip/amd_detail/amd_warp_sync_functions.h#L30 ![]() |
Yes the change in the behavior in amd_warp_sync_functions.h indeed is the cause of this issue, but this solution is still overly complicated, we dont need to handle separate cases here for different rocm versions. i would suggest this: #15273 as we dont make use of any of the functions this header defines anywhere in ggml |
@IMbackK I wanted to have "more obvious" backward compatible changes since HIP_ENABLE_WARP_SYNC_BUILTINS macro was introduced in hip.h file earlier and this macro is relevant for ROCm 6.4 and older but not present in ROCm 7.0. Since you deleted define/undef HIP_ENABLE_WARP_SYNC_BUILTINS and just define HIP_DISABLE_WARP_SYNC_BUILTINS which is present only in ROCm 6.5+ your solution is much cleaner and backward compatible 👍 |
…atibility for ROCm 7.0+ due to requests and comments from ggml-org#15241
Summary
This PR should fix warp synchronization function conflicts that occur when rocWMMA headers are included
Problem
ROCm includes native warp synchronization builtin functions (
__shfl_sync, __shfl_xor_sync
, etc.) that require 64-bit masks and have been available since ROCm 6.2. However, when rocWMMA headers are included, they pull in these native functions which conflict with the existing compatibility shims that map to the maskless __shfl equivalents.The conflict occurs because:
hip_fp16.h
) includeamd_warp_sync_functions.h
__shfl_*_sync
functions expecting 64-bit masks0xffffffff
) causing compilation failures