Skip to content

Conversation

@jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented May 6, 2025

Summary:
We use the ballot to get the proper lane mask after we've masked off the
threads already done. This isn't an issue on AMDGPU but could cause
problems for post-Volta since it's saying that threads are active when
they aren't.

Summary:
We use the ballot to get the proper lane mask after we've masked off the
threads already done. This isn't an issue on AMDGPU but could cause
problems for post-Volta since it's saying that threads are active when
they aren't.
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:X86 clang:headers Headers provided by Clang, e.g. for intrinsics labels May 6, 2025
@llvmbot
Copy link
Member

llvmbot commented May 6, 2025

@llvm/pr-subscribers-backend-x86

@llvm/pr-subscribers-clang

Author: Joseph Huber (jhuber6)

Changes

Summary:
We use the ballot to get the proper lane mask after we've masked off the
threads already done. This isn't an issue on AMDGPU but could cause
problems for post-Volta since it's saying that threads are active when
they aren't.


Full diff: https://github.com/llvm/llvm-project/pull/138693.diff

1 Files Affected:

  • (modified) clang/lib/Headers/gpuintrin.h (+6-4)
diff --git a/clang/lib/Headers/gpuintrin.h b/clang/lib/Headers/gpuintrin.h
index d308cc959be84..7afc82413996b 100644
--- a/clang/lib/Headers/gpuintrin.h
+++ b/clang/lib/Headers/gpuintrin.h
@@ -264,9 +264,10 @@ __gpu_match_any_u32_impl(uint64_t __lane_mask, uint32_t __x) {
   uint64_t __match_mask = 0;
 
   bool __done = 0;
-  while (__gpu_ballot(__lane_mask, !__done)) {
+  for (uint64_t __active_mask = __lane_mask; __active_mask;
+       __active_mask = __gpu_ballot(__lane_mask, !__done)) {
     if (!__done) {
-      uint32_t __first = __gpu_read_first_lane_u32(__lane_mask, __x);
+      uint32_t __first = __gpu_read_first_lane_u32(__active_mask, __x);
       if (__first == __x) {
         __match_mask = __gpu_lane_mask();
         __done = 1;
@@ -283,9 +284,10 @@ __gpu_match_any_u64_impl(uint64_t __lane_mask, uint64_t __x) {
   uint64_t __match_mask = 0;
 
   bool __done = 0;
-  while (__gpu_ballot(__lane_mask, !__done)) {
+  for (uint64_t __active_mask = __lane_mask; __active_mask;
+       __active_mask = __gpu_ballot(__lane_mask, !__done)) {
     if (!__done) {
-      uint64_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
+      uint64_t __first = __gpu_read_first_lane_u64(__active_mask, __x);
       if (__first == __x) {
         __match_mask = __gpu_lane_mask();
         __done = 1;

@jhuber6 jhuber6 merged commit 05d6734 into llvm:main May 7, 2025
15 checks passed
@jhuber6 jhuber6 deleted the Mask branch May 7, 2025 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:X86 clang:headers Headers provided by Clang, e.g. for intrinsics clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants