[Clang] Fix GPU intrinsic helpers incorrectly sign extending #129560

jhuber6 · 2025-03-03T17:23:18Z

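Summary:
The builtins these helpers call return signed 32-bit values, so casting a
result straight to uint64_t sign-extends it. When the top bit of the low half
is set, the upper 32 bits all become one and clobber the high half once the
two halves are ORed together.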
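
The sign extension can be reproduced on the host with plain C. The following is a minimal sketch, not the header code: `widen_buggy`, `widen_fixed`, and `check_widening` are hypothetical names that only mimic widening a signed 32-bit builtin result to 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not part of the Clang headers: they mimic widening a
   signed 32-bit builtin result to 64 bits on the host. */

/* Direct cast: a negative int32_t is sign-extended, so bits 32..63 become 1. */
uint64_t widen_buggy(int32_t v) { return (uint64_t)v; }

/* Cast, then mask: only the low 32 bits survive. */
uint64_t widen_fixed(int32_t v) { return (uint64_t)v & 0xFFFFFFFF; }

/* Small self-check illustrating both behaviours. */
void check_widening(void) {
  int32_t low = INT32_MIN; /* bit pattern 0x80000000 */
  assert(widen_buggy(low) == 0xFFFFFFFF80000000ull);
  assert(widen_fixed(low) == 0x0000000080000000ull);
  assert(widen_buggy(1) == widen_fixed(1)); /* non-negative values agree */
}
```

Only values whose bit 31 is set are affected, which is probably why the problem is easy to miss with small test inputs.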


llvmbot · 2025-03-03T17:23:50Z

@llvm/pr-subscribers-backend-x86

Author: Joseph Huber (jhuber6)

Changes

Summary:
These return values are actually signed, meaning that casting will
extend it and then all the bits will be one.

Full diff: https://github.com/llvm/llvm-project/pull/129560.diff

2 Files Affected:

(modified) clang/lib/Headers/amdgpuintrin.h (+1-1)
(modified) clang/lib/Headers/nvptxintrin.h (+2-1)

diff --git a/clang/lib/Headers/amdgpuintrin.h b/clang/lib/Headers/amdgpuintrin.h
index 355e75d0b2d42..6ad8e54f4aadd 100644
--- a/clang/lib/Headers/amdgpuintrin.h
+++ b/clang/lib/Headers/amdgpuintrin.h
@@ -121,7 +121,7 @@ __gpu_read_first_lane_u64(uint64_t __lane_mask, uint64_t __x) {
   uint32_t __hi = (uint32_t)(__x >> 32ull);
   uint32_t __lo = (uint32_t)(__x & 0xFFFFFFFF);
   return ((uint64_t)__builtin_amdgcn_readfirstlane(__hi) << 32ull) |
-         ((uint64_t)__builtin_amdgcn_readfirstlane(__lo));
+         ((uint64_t)__builtin_amdgcn_readfirstlane(__lo) & 0xFFFFFFFF);
 }
 
 // Returns a bitmask of threads in the current lane for which \p x is true.
diff --git a/clang/lib/Headers/nvptxintrin.h b/clang/lib/Headers/nvptxintrin.h
index 29d0adcabc82f..03594dd9bd6cb 100644
--- a/clang/lib/Headers/nvptxintrin.h
+++ b/clang/lib/Headers/nvptxintrin.h
@@ -131,7 +131,8 @@ __gpu_read_first_lane_u64(uint64_t __lane_mask, uint64_t __x) {
                                              __gpu_num_lanes() - 1)
           << 32ull) |
          ((uint64_t)__nvvm_shfl_sync_idx_i32(__mask, __lo, __id,
-                                             __gpu_num_lanes() - 1));
+                                             __gpu_num_lanes() - 1) &
+          0xFFFFFFFF);
 }
 
 // Returns a bitmask of threads in the current lane for which \p x is true.
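
The effect of the one-line change in both headers can be sketched with a host-side emulation of the split-and-recombine pattern. This is an illustration under stated assumptions: `fake_broadcast_i32` is a hypothetical stand-in for a builtin that takes and returns a signed 32-bit value, and `combine_buggy` / `combine_fixed` are hypothetical names for the pre-fix and post-fix shapes of the return expression.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a broadcast builtin with a signed 32-bit
   signature; on the host it simply returns its argument. */
static int32_t fake_broadcast_i32(int32_t v) { return v; }

/* Pre-fix shape: the low half is widened without masking.
   Note: converting an out-of-range uint32_t to int32_t is
   implementation-defined before C23; common compilers wrap. */
uint64_t combine_buggy(uint64_t x) {
  uint32_t hi = (uint32_t)(x >> 32);
  uint32_t lo = (uint32_t)x;
  return ((uint64_t)fake_broadcast_i32((int32_t)hi) << 32) |
         ((uint64_t)fake_broadcast_i32((int32_t)lo));
}

/* Post-fix shape: the low half is masked after widening, so its sign
   bits cannot leak into the high half. */
uint64_t combine_fixed(uint64_t x) {
  uint32_t hi = (uint32_t)(x >> 32);
  uint32_t lo = (uint32_t)x;
  return ((uint64_t)fake_broadcast_i32((int32_t)hi) << 32) |
         ((uint64_t)fake_broadcast_i32((int32_t)lo) & 0xFFFFFFFF);
}
```

The high half needs no mask in either version because the left shift by 32 discards any sign-extended upper bits.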

llvmbot · 2025-03-03T17:23:51Z

@llvm/pr-subscribers-backend-amdgpu


llvmbot · 2025-03-03T17:23:51Z

@llvm/pr-subscribers-clang


arsenm · 2025-03-03T17:24:27Z

clang/lib/Headers/amdgpuintrin.h

  uint32_t __lo = (uint32_t)(__x & 0xFFFFFFFF);
  return ((uint64_t)__builtin_amdgcn_readfirstlane(__hi) << 32ull) |
-         ((uint64_t)__builtin_amdgcn_readfirstlane(__lo));
+         ((uint64_t)__builtin_amdgcn_readfirstlane(__lo) & 0xFFFFFFFF);


Why not just cast to uint32_t?

arsenm

Also, the builtin really should have been unsigned to begin with.

jhuber6

These are being combined, so they both should be 64-bit. I could cast to unsigned, then cast to unsigned 64-bit, but I think explicitly masking is clearer here.

arsenm

This is also clamping the thing you just clamped 2 lines above.

jhuber6

The clamp above likely isn't necessary, since a cast to unsigned will probably truncate; it's probably a no-op after optimizations. But these are different: the input is unsigned but the output is signed, so we need to clamp it again.
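
The two options discussed in this thread can be compared on the host. The sketch below assumes nothing about the real builtins; `widen_via_u32`, `widen_via_mask`, `input_mask_is_redundant`, and `check_equivalence` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Option raised in review: narrow to uint32_t first, then widen. */
uint64_t widen_via_u32(int32_t v) { return (uint64_t)(uint32_t)v; }

/* Option taken in the patch: widen, then mask explicitly. */
uint64_t widen_via_mask(int32_t v) { return (uint64_t)v & 0xFFFFFFFF; }

/* Input-side clamp: masking before a cast to uint32_t changes nothing,
   because the cast already keeps only the low 32 bits. */
int input_mask_is_redundant(uint64_t x) {
  return (uint32_t)(x & 0xFFFFFFFF) == (uint32_t)x;
}

/* Both widening strategies agree for negative and non-negative inputs. */
void check_equivalence(void) {
  assert(widen_via_u32(-1) == widen_via_mask(-1));
  assert(widen_via_u32(INT32_MIN) == widen_via_mask(INT32_MIN));
  assert(widen_via_u32(42) == widen_via_mask(42));
}
```

Both forms produce the same value, so the choice between them is a readability call, as the replies above suggest.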

arsenm

We should really fix the builtin. It could accept any legal type, but failing that it should just be uint32_t

jhuber6 · 2025-03-03T17:49:25Z

We should really fix the builtin. It could accept any legal type, but failing that it should just be uint32_t

Yeah, it would simplify the code if we could just pass a pointer or whatever. We could probably also do some recognition in the compiler to stitch these kinds of manual splits back together into a single type.

jhuber6 · 2025-03-03T20:26:55Z

/cherry-pick 4ca8ea8

llvmbot · 2025-03-03T20:32:35Z

/pull-request #129587

…9560) Summary: These return values are actually signed, meaning that casting will extend it and then all the bits will be one. (cherry picked from commit 4ca8ea8)

[Clang] Fix GPU intrinsic helpers incorrectly sign extending

d7df493

Summary: These return values are actually signed, meaning that casting will extend it and then all the bits will be one.

jhuber6 requested review from JonChesterfield, arsenm, jplehr and shiltian March 3, 2025 17:23

llvmbot added the clang, backend:AMDGPU, backend:X86, and clang:headers labels Mar 3, 2025

arsenm reviewed Mar 3, 2025

jhuber6 added this to the LLVM 20.X Release milestone Mar 3, 2025

github-project-automation bot added this to LLVM Release Status Mar 3, 2025

github-project-automation bot moved this to Needs Triage in LLVM Release Status Mar 3, 2025

arsenm approved these changes Mar 3, 2025

github-project-automation bot moved this from Needs Triage to Needs Merge in LLVM Release Status Mar 3, 2025

jhuber6 merged commit 4ca8ea8 into llvm:main Mar 3, 2025
16 checks passed

github-project-automation bot moved this from Needs Merge to Done in LLVM Release Status Mar 3, 2025

jhuber6 deleted the FixSign branch March 3, 2025 20:26
