-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[Clang] Fix 'gpuintrin.h' match when included with no arch set #129927
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-clang @llvm/pr-subscribers-backend-x86 Author: Joseph Huber (jhuber6) ChangesSummary: Full diff: https://github.com/llvm/llvm-project/pull/129927.diff 1 Files Affected:
diff --git a/clang/lib/Headers/nvptxintrin.h b/clang/lib/Headers/nvptxintrin.h
index 29d0adcabc82f..b2c3097a464fe 100644
--- a/clang/lib/Headers/nvptxintrin.h
+++ b/clang/lib/Headers/nvptxintrin.h
@@ -179,8 +179,10 @@ __gpu_shuffle_idx_u64(uint64_t __lane_mask, uint32_t __idx, uint64_t __x,
_DEFAULT_FN_ATTRS static __inline__ uint64_t
__gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
// Newer targets can use the dedicated CUDA support.
- if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
+#if __CUDA_ARCH__ >= 700
+ if (__nvvm_reflect("__CUDA_ARCH") >= 700)
return __nvvm_match_any_sync_i32(__lane_mask, __x);
+#endif
uint32_t __match_mask = 0;
bool __done = 0;
@@ -200,8 +202,10 @@ __gpu_match_any_u32(uint64_t __lane_mask, uint32_t __x) {
_DEFAULT_FN_ATTRS static __inline__ uint64_t
__gpu_match_any_u64(uint64_t __lane_mask, uint64_t __x) {
// Newer targets can use the dedicated CUDA support.
+#if __CUDA_ARCH__ >= 700
if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
return __nvvm_match_any_sync_i64(__lane_mask, __x);
+#endif
uint64_t __match_mask = 0;
@@ -223,9 +227,11 @@ __gpu_match_any_u64(uint64_t __lane_mask, uint64_t __x) {
_DEFAULT_FN_ATTRS static __inline__ uint64_t
__gpu_match_all_u32(uint64_t __lane_mask, uint32_t __x) {
// Newer targets can use the dedicated CUDA support.
+#if __CUDA_ARCH__ >= 700
int predicate;
if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
return __nvvm_match_all_sync_i32p(__lane_mask, __x, &predicate);
+#endif
uint32_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
uint64_t __ballot = __gpu_ballot(__lane_mask, __x == __first);
@@ -236,9 +242,11 @@ __gpu_match_all_u32(uint64_t __lane_mask, uint32_t __x) {
_DEFAULT_FN_ATTRS static __inline__ uint64_t
__gpu_match_all_u64(uint64_t __lane_mask, uint64_t __x) {
// Newer targets can use the dedicated CUDA support.
+#if __CUDA_ARCH__ >= 700
int predicate;
if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700)
return __nvvm_match_all_sync_i64p(__lane_mask, __x, &predicate);
+#endif
uint64_t __first = __gpu_read_first_lane_u64(__lane_mask, __x);
uint64_t __ballot = __gpu_ballot(__lane_mask, __x == __first);
|
clang/lib/Headers/nvptxintrin.h
Outdated
| // Newer targets can use the dedicated CUDA support. | ||
| if (__CUDA_ARCH__ >= 700 || __nvvm_reflect("__CUDA_ARCH") >= 700) | ||
| #if __CUDA_ARCH__ >= 700 | ||
| if (__nvvm_reflect("__CUDA_ARCH") >= 700) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a particular reason that __CUDA_ARCH__ got removed from if here but not in other places?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, if we're already checking for __CUDA_ARCH__ on preprocessor level, is there any point to use __nvvm_reflect() at all?
IIUIC, the original code was intended to be compiled into "generic" IR, and it had to rely on __nvvm_reflect() to do things differently if/when it eventually ends up targeting a newer GPU, but now that the check is made up front, I'm not quite sure that we need it any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably just because I forgot to but it was a no-op so I didn't notice.
And yeah, I wish there was a way to let nvvm_reflect call these without the incurring the wrath of the compiler. You can spoof some target attributes but that's always broken when inlined.
arsenm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should drop the nvvm reflect here. Really shouldn't have any subtarget dependent code here. Injecting implementation details into the source program is part of the fundamental issue with device lib linking
Summary: These require `+ptx` features to be set even though they're guarded by the `__nvvm_reflect`. Rather than figure out how to hack around that with the `target` attribute I'm just going to disable it for 'generic' builds and use the slow version for now.
Summary:
These require
+ptxfeatures to be set even though they're guarded bythe
__nvvm_reflect. Rather than figure out how to hack around thatwith the
targetattribute I'm just going to disable it for 'generic'builds and use the slow version for now.