@jhuber6
Contributor

@jhuber6 jhuber6 commented Mar 3, 2025

Summary:
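The builtins used here return a signed 32-bit value, so casting the result
directly to uint64_t sign-extends it. When the top bit of the low half is set,
the upper 32 bits all become one and overwrite the high half in the OR.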
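
A minimal host-side sketch of the problem described above. The real builtins only exist when compiling for a GPU target, so `fake_read_lane` is a hypothetical stand-in that simply returns its argument as a signed 32-bit value; the `combine_*` names are also made up for illustration.

```c
#include <stdint.h>

/* Hypothetical stand-in for a lane-read builtin: takes and returns a
 * signed 32-bit value, which is the property that matters here. */
int32_t fake_read_lane(int32_t v) { return v; }

/* Pre-patch shape: the signed result is cast straight to uint64_t,
 * so a low half with its top bit set is sign-extended. */
uint64_t combine_buggy(uint64_t x) {
  uint32_t hi = (uint32_t)(x >> 32);
  uint32_t lo = (uint32_t)x;
  /* The unsigned-to-signed conversions wrap on mainstream compilers. */
  return ((uint64_t)fake_read_lane((int32_t)hi) << 32) |
         ((uint64_t)fake_read_lane((int32_t)lo));
}

/* Post-patch shape: mask the low half after widening so the
 * sign-extension bits cannot reach the upper 32 bits. */
uint64_t combine_fixed(uint64_t x) {
  uint32_t hi = (uint32_t)(x >> 32);
  uint32_t lo = (uint32_t)x;
  return ((uint64_t)fake_read_lane((int32_t)hi) << 32) |
         ((uint64_t)fake_read_lane((int32_t)lo) & 0xFFFFFFFF);
}
```

With the input 0x0000000180000000, `combine_buggy` yields 0xFFFFFFFF80000000 because the low half 0x80000000 is negative as a signed value, while `combine_fixed` returns the input unchanged.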

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU backend:X86 clang:headers Headers provided by Clang, e.g. for intrinsics labels Mar 3, 2025
@llvmbot
Member

llvmbot commented Mar 3, 2025

@llvm/pr-subscribers-backend-x86

Author: Joseph Huber (jhuber6)

Changes

Summary:
These return values are actually signed, meaning that casting will
extend it and then all the bits will be one.


Full diff: https://github.com/llvm/llvm-project/pull/129560.diff

2 Files Affected:

  • (modified) clang/lib/Headers/amdgpuintrin.h (+1-1)
  • (modified) clang/lib/Headers/nvptxintrin.h (+2-1)
diff --git a/clang/lib/Headers/amdgpuintrin.h b/clang/lib/Headers/amdgpuintrin.h
index 355e75d0b2d42..6ad8e54f4aadd 100644
--- a/clang/lib/Headers/amdgpuintrin.h
+++ b/clang/lib/Headers/amdgpuintrin.h
@@ -121,7 +121,7 @@ __gpu_read_first_lane_u64(uint64_t __lane_mask, uint64_t __x) {
   uint32_t __hi = (uint32_t)(__x >> 32ull);
   uint32_t __lo = (uint32_t)(__x & 0xFFFFFFFF);
   return ((uint64_t)__builtin_amdgcn_readfirstlane(__hi) << 32ull) |
-         ((uint64_t)__builtin_amdgcn_readfirstlane(__lo));
+         ((uint64_t)__builtin_amdgcn_readfirstlane(__lo) & 0xFFFFFFFF);
 }
 
 // Returns a bitmask of threads in the current lane for which \p x is true.
diff --git a/clang/lib/Headers/nvptxintrin.h b/clang/lib/Headers/nvptxintrin.h
index 29d0adcabc82f..03594dd9bd6cb 100644
--- a/clang/lib/Headers/nvptxintrin.h
+++ b/clang/lib/Headers/nvptxintrin.h
@@ -131,7 +131,8 @@ __gpu_read_first_lane_u64(uint64_t __lane_mask, uint64_t __x) {
                                              __gpu_num_lanes() - 1)
           << 32ull) |
          ((uint64_t)__nvvm_shfl_sync_idx_i32(__mask, __lo, __id,
-                                             __gpu_num_lanes() - 1));
+                                             __gpu_num_lanes() - 1) &
+          0xFFFFFFFF);
 }
 
 // Returns a bitmask of threads in the current lane for which \p x is true.

   uint32_t __lo = (uint32_t)(__x & 0xFFFFFFFF);
   return ((uint64_t)__builtin_amdgcn_readfirstlane(__hi) << 32ull) |
-         ((uint64_t)__builtin_amdgcn_readfirstlane(__lo));
+         ((uint64_t)__builtin_amdgcn_readfirstlane(__lo) & 0xFFFFFFFF);

Contributor

why not just cast to uint32_t?

Contributor

also the builtin really should have been unsigned to begin with

Contributor Author

These are being combined, so they both should be 64-bit. I could cast to unsigned, then cast to unsigned 64-bit, but I think explicitly masking is clearer here.
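
As a rough illustration of the two options weighed in this reply, the sketch below widens a signed 32-bit value to an unsigned 64-bit one both ways. The function names are hypothetical and not part of the headers under review; for every input the two spellings produce the same value, so the choice is about readability.

```c
#include <stdint.h>

/* Mask after the 64-bit cast (the spelling used in the patch). */
uint64_t widen_by_mask(int32_t v) { return (uint64_t)v & 0xFFFFFFFF; }

/* Cast to unsigned 32-bit first, then zero-extend to 64-bit
 * (the alternative suggested in review). */
uint64_t widen_by_cast(int32_t v) { return (uint64_t)(uint32_t)v; }
```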

Contributor

this is also clamping the thing you just clamped 2 lines above

Contributor Author

The clamp above likely isn't necessary: a cast to unsigned truncates anyway, so the mask is probably a no-op after optimizations. But these are different: the input is unsigned while the output is signed, so we need to clamp it again.
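
The point about the input-side clamp can be checked on the host with a small sketch. The function names are hypothetical; the first mirrors how `__lo` is computed in the header, the second drops the mask and relies on the truncating cast alone.

```c
#include <stdint.h>

/* Low half computed the way the header does it: mask, then cast. */
uint32_t low_with_mask(uint64_t x) { return (uint32_t)(x & 0xFFFFFFFF); }

/* Low half computed with the cast alone. Converting to uint32_t already
 * reduces the value modulo 2^32, so the mask adds nothing. */
uint32_t low_cast_only(uint64_t x) { return (uint32_t)x; }
```

Both functions agree on every input. The output side differs because the value being widened there is signed, which is why the mask on the result is not redundant.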

Contributor

@arsenm arsenm left a comment

We should really fix the builtin. It could accept any legal type, but failing that it should just be uint32_t

@github-project-automation github-project-automation bot moved this from Needs Triage to Needs Merge in LLVM Release Status Mar 3, 2025
@jhuber6
Contributor Author

jhuber6 commented Mar 3, 2025

> We should really fix the builtin. It could accept any legal type, but failing that it should just be uint32_t

Yeah it would simplify the code if we could just pass a pointer or whatever. We could probably also do some recognition in the compiler to stitch these kinds of manual splits back together into a single type.

@jhuber6 jhuber6 merged commit 4ca8ea8 into llvm:main Mar 3, 2025
16 checks passed
@github-project-automation github-project-automation bot moved this from Needs Merge to Done in LLVM Release Status Mar 3, 2025
@jhuber6 jhuber6 deleted the FixSign branch March 3, 2025 20:26
@jhuber6
Contributor Author

jhuber6 commented Mar 3, 2025

/cherry-pick 4ca8ea8

@llvmbot
Member

llvmbot commented Mar 3, 2025

/pull-request #129587

swift-ci pushed a commit to swiftlang/llvm-project that referenced this pull request Mar 12, 2025
…9560)

Summary:
These return values are actually signed, meaning that casting will
extend it and then all the bits will be one.

(cherry picked from commit 4ca8ea8)