[Clang] Fix 'gpuintrin.h' match when included with no arch set #129927

jhuber6 · 2025-03-05T20:06:57Z

Summary:
The __nvvm_match_* builtins require +ptx features to be set even though
they're guarded by __nvvm_reflect. Rather than figure out how to hack
around that with the target attribute, I'm just going to disable them for
'generic' builds and use the slow version for now.
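
A minimal host-side sketch of the approach, for illustration only. `DEMO_ARCH`, `DEMO_WARP_SIZE`, and every `demo_*` function are hypothetical stand-ins and do not appear in nvptxintrin.h. The sketch relies on one standard C rule: an identifier that is not defined as a macro evaluates to 0 inside `#if`, so a build with no architecture set selects the fallback and never references the fast-path function.

```c
#include <stdint.h>

#define DEMO_WARP_SIZE 32 // hypothetical: lanes are simulated on the host

// Hypothetical stand-in for the portable fallback: bit i of the result is
// set when lane i is active and holds the same value as `lane`.
static uint64_t demo_slow_match_any(uint64_t lane_mask, const uint32_t *vals,
                                    uint32_t lane) {
  uint64_t match = 0;
  for (uint32_t i = 0; i < DEMO_WARP_SIZE; ++i)
    if (((lane_mask >> i) & 1) && vals[i] == vals[lane])
      match |= 1ull << i;
  return match;
}

#if DEMO_ARCH >= 700
// Placeholder for the dedicated instruction. It is only compiled when the
// architecture is known up front; it reuses the fallback so the sketch
// still runs on a host.
static uint64_t demo_fast_match_any(uint64_t lane_mask, const uint32_t *vals,
                                    uint32_t lane) {
  return demo_slow_match_any(lane_mask, vals, lane);
}
#endif

static uint64_t demo_match_any(uint64_t lane_mask, const uint32_t *vals,
                               uint32_t lane) {
#if DEMO_ARCH >= 700
  return demo_fast_match_any(lane_mask, vals, lane);
#else
  // DEMO_ARCH undefined (treated as 0) or too old: use the slow version.
  return demo_slow_match_any(lane_mask, vals, lane);
#endif
}

// Reports which branch the preprocessor selected.
static int demo_uses_fast_path(void) {
#if DEMO_ARCH >= 700
  return 1;
#else
  return 0;
#endif
}
```

Compiled without `-DDEMO_ARCH=...`, `demo_uses_fast_path()` returns 0 and `demo_match_any` goes through the loop; compiling with `-DDEMO_ARCH=700` switches both to the guarded branch. The sketch uses `#else` where the patch uses a bare `#if`/`#endif` with fall-through, only to avoid unreachable code in the illustration.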

llvmbot · 2025-03-05T20:07:29Z

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-x86

Author: Joseph Huber (jhuber6)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/129927.diff

1 Files Affected:

(modified) clang/lib/Headers/nvptxintrin.h (+9-1)

diff --git a/clang/lib/Headers/nvptxintrin.h b/clang/lib/Headers/nvptxintrin.h
index 29d0adcabc82f..b2c3097a464fe 100644
--- a/clang/lib/Headers/nvptxintrin.h
+++ b/clang/lib/Headers/nvptxintrin.h
@@ -179,8 +179,10 @@ __gpu_shuffle_idx_u64(uint64_t __lane_mask, uint32_t __idx, uint64_t __x,
 _DEFAULT_FN_ATTRS static __inline__ uint64_t
 __gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
   // Newer targets can use the dedicated CUDA support.
-  if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
+#if __CUDA_ARCH__ >= 700
+  if (__nvvm_reflect("__CUDA_ARCH") >= 700)
     return __nvvm_match_any_sync_i32(__lane_mask, __x);
+#endif
 
   uint32_t __match_mask = 0;
   bool __done = 0;
@@ -200,8 +202,10 @@ __gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
 _DEFAULT_FN_ATTRS static __inline__ uint64_t
 __gpu_match_any_u64(uint64_t __lane_mask, uint64_t __x) {
   // Newer targets can use the dedicated CUDA support.
+#if __CUDA_ARCH__ >= 700
   if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
     return __nvvm_match_any_sync_i64(__lane_mask, __x);
+#endif
 
   uint64_t __match_mask = 0;
 
@@ -223,9 +227,11 @@ __gpu_match_any_u64(uint64_t __lane_mask, uint64_t __x) {
 _DEFAULT_FN_ATTRS static __inline__ uint64_t
 __gpu_match_all_u32(uint64_t __lane_mask, uint32_t __x) {
   // Newer targets can use the dedicated CUDA support.
+#if __CUDA_ARCH__ >= 700
   int predicate;
   if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
     return __nvvm_match_all_sync_i32p(__lane_mask, __x, &predicate);
+#endif
 
   uint32_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
   uint64_t __ballot = __gpu_ballot(__lane_mask, __x == __first);
@@ -236,9 +242,11 @@ __gpu_match_all_u32(uint64_t __lane_mask, uint32_t __x) {
 _DEFAULT_FN_ATTRS static __inline__ uint64_t
 __gpu_match_all_u64(uint64_t __lane_mask, uint64_t __x) {
   // Newer targets can use the dedicated CUDA support.
+#if __CUDA_ARCH__ >= 700
   int predicate;
   if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
     return __nvvm_match_all_sync_i64p(__lane_mask, __x, &predicate);
+#endif
 
   uint64_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
   uint64_t __ballot = __gpu_ballot(__lane_mask, __x == __first);
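
For readers unfamiliar with the slow path that 'generic' builds now take in `__gpu_match_all_*`, the following host-side sketch mirrors the two steps visible in the context lines above: read the value held by the first active lane, then ballot on equality with it. The remainder of the real function is not shown in the diff, so the return rule here (the lane mask if every active lane agrees, otherwise 0) is an assumption, as are `DEMO_WARP_SIZE`, `demo_match_all`, and the array that stands in for per-lane registers.

```c
#include <stdint.h>

#define DEMO_WARP_SIZE 32 // hypothetical: lanes are simulated on the host

// Hypothetical host simulation of a match-all fallback. vals[i] plays the
// role of the value held by lane i.
static uint64_t demo_match_all(uint64_t lane_mask, const uint32_t *vals) {
  // Only the simulated lanes can participate.
  lane_mask &= (1ull << DEMO_WARP_SIZE) - 1;
  if (lane_mask == 0)
    return 0;

  // Step 1: read the value from the lowest-numbered active lane.
  uint32_t first_lane = 0;
  while (!((lane_mask >> first_lane) & 1))
    ++first_lane;
  uint32_t first = vals[first_lane];

  // Step 2: ballot over the active lanes on "my value equals the first one".
  uint64_t ballot = 0;
  for (uint32_t i = 0; i < DEMO_WARP_SIZE; ++i)
    if (((lane_mask >> i) & 1) && vals[i] == first)
      ballot |= 1ull << i;

  // Assumed return rule: report the mask only when every active lane agreed.
  return ballot == lane_mask ? lane_mask : 0;
}
```

With every lane holding the same value the sketch returns the full mask; changing one active lane's value makes it return 0, and masking that lane out restores a non-zero result.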

Artem-B · 2025-03-06T00:31:09Z

clang/lib/Headers/nvptxintrin.h

  // Newer targets can use the dedicated CUDA support.
-  if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
+#if __CUDA_ARCH__ >= 700
+  if (__nvvm_reflect("__CUDA_ARCH") >= 700)


Is there a particular reason that __CUDA_ARCH__ got removed from the if here but not in the other places?

Also, if we're already checking __CUDA_ARCH__ at the preprocessor level, is there any point in using __nvvm_reflect() at all?

IIUC, the original code was intended to be compiled into "generic" IR, and it had to rely on __nvvm_reflect() to do things differently if/when it eventually ended up targeting a newer GPU. Now that the check is made up front, I'm not sure we need it any more.

jhuber6 · Mar 6, 2025

Probably just because I forgot to, but it was a no-op so I didn't notice.

And yeah, I wish there was a way to let __nvvm_reflect call these without incurring the wrath of the compiler. You can spoof some target attributes, but that's always broken when inlined.

arsenm

Should drop the __nvvm_reflect here. There really shouldn't be any subtarget-dependent code here. Injecting implementation details into the source program is part of the fundamental issue with device lib linking.

jhuber6 requested review from Artem-B, arsenm and shiltian March 5, 2025 20:06

llvmbot added the clang, backend:X86, and clang:headers labels Mar 5, 2025

Artem-B reviewed Mar 6, 2025

arsenm reviewed Mar 6, 2025

jhuber6 force-pushed the AsmMatch branch from ceba17a to 5c65aaa Compare March 6, 2025 15:38

arsenm approved these changes Mar 7, 2025

jhuber6 merged commit ca39a02 into llvm:main Mar 7, 2025
11 checks passed

jhuber6 deleted the AsmMatch branch March 7, 2025 01:37
