Member

@sarnex sarnex commented Feb 7, 2025

Reland #121839 based on the results of the Discourse discussion here.

github-actions bot commented Feb 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@sarnex sarnex marked this pull request as ready for review February 10, 2025 15:19
@llvmbot llvmbot added the clang (Clang issues not falling into any other category) and clang:frontend (Language frontend issues, e.g. anything involving "Sema") labels Feb 10, 2025
Member

llvmbot commented Feb 10, 2025

@llvm/pr-subscribers-clang

Author: Nick Sarnie (sarnex)

Changes

This is a follow-up to #121839, where we wanted to make __has_builtin return false for aux builtins, but that change broke existing code.

Instead, introduce a new macro __has_target_builtin (name open to suggestions) that only considers builtins for the current target.


Full diff: https://github.com/llvm/llvm-project/pull/126324.diff

4 Files Affected:

  • (modified) clang/docs/LanguageExtensions.rst (+33)
  • (modified) clang/include/clang/Lex/Preprocessor.h (+1)
  • (modified) clang/lib/Lex/PPMacroExpansion.cpp (+58-52)
  • (added) clang/test/Preprocessor/has_target_builtin.cpp (+18)
diff --git a/clang/docs/LanguageExtensions.rst b/clang/docs/LanguageExtensions.rst
index 973cf8f9d091c30..057ad564f970bb4 100644
--- a/clang/docs/LanguageExtensions.rst
+++ b/clang/docs/LanguageExtensions.rst
@@ -67,6 +67,10 @@ It can be used like this:
   ``__has_builtin`` should not be used to detect support for a builtin macro;
   use ``#ifdef`` instead.
 
+  When using device offloading, a builtin is considered available if it is
+  available on either the host or the device targets.
+  Use ``__has_target_builtin`` to consider only the current target.
+
 ``__has_constexpr_builtin``
 ---------------------------
 
@@ -96,6 +100,35 @@ the ``<cmath>`` header file to conditionally make a function constexpr whenever
 the constant evaluation of the corresponding builtin (for example,
 ``std::fmax`` calls ``__builtin_fmax``) is supported in Clang.
 
+``__has_target_builtin``
+------------------------
+
+This function-like macro takes a single identifier argument that is the name of
+a builtin function, a builtin pseudo-function (taking one or more type
+arguments), or a builtin template.
+It evaluates to 1 if the builtin is supported on the current target or 0 if not.
+The behavior is different than ``__has_builtin`` when there is an auxiliary target,
+such when offloading to a target device.
+It can be used like this:
+
+.. code-block:: c++
+
+  #ifndef __has_target_builtin         // Optional of course.
+    #define __has_target_builtin(x) 0  // Compatibility with non-clang compilers.
+  #endif
+
+  ...
+  #if __has_target_builtin(__builtin_trap)
+    __builtin_trap();
+  #else
+    abort();
+  #endif
+  ...
+
+.. note::
+  ``__has_target_builtin`` should not be used to detect support for a builtin macro;
+  use ``#ifdef`` instead.
+
 .. _langext-__has_feature-__has_extension:
 
 ``__has_feature`` and ``__has_extension``
diff --git a/clang/include/clang/Lex/Preprocessor.h b/clang/include/clang/Lex/Preprocessor.h
index 2bf4d1a16699430..240fe28aba93e33 100644
--- a/clang/include/clang/Lex/Preprocessor.h
+++ b/clang/include/clang/Lex/Preprocessor.h
@@ -174,6 +174,7 @@ class Preprocessor {
   IdentifierInfo *Ident__has_extension;            // __has_extension
   IdentifierInfo *Ident__has_builtin;              // __has_builtin
   IdentifierInfo *Ident__has_constexpr_builtin;    // __has_constexpr_builtin
+  IdentifierInfo *Ident__has_target_builtin;       // __has_target_builtin
   IdentifierInfo *Ident__has_attribute;            // __has_attribute
   IdentifierInfo *Ident__has_embed;                // __has_embed
   IdentifierInfo *Ident__has_include;              // __has_include
diff --git a/clang/lib/Lex/PPMacroExpansion.cpp b/clang/lib/Lex/PPMacroExpansion.cpp
index 347c13da0ad215a..23a693b105fca3a 100644
--- a/clang/lib/Lex/PPMacroExpansion.cpp
+++ b/clang/lib/Lex/PPMacroExpansion.cpp
@@ -357,6 +357,7 @@ void Preprocessor::RegisterBuiltinMacros() {
   Ident__has_builtin = RegisterBuiltinMacro("__has_builtin");
   Ident__has_constexpr_builtin =
       RegisterBuiltinMacro("__has_constexpr_builtin");
+  Ident__has_target_builtin = RegisterBuiltinMacro("__has_target_builtin");
   Ident__has_attribute = RegisterBuiltinMacro("__has_attribute");
   if (!getLangOpts().CPlusPlus)
     Ident__has_c_attribute = RegisterBuiltinMacro("__has_c_attribute");
@@ -1797,55 +1798,62 @@ void Preprocessor::ExpandBuiltinMacro(Token &Tok) {
                                            diag::err_feature_check_malformed);
         return II && HasExtension(*this, II->getName());
       });
-  } else if (II == Ident__has_builtin) {
-    EvaluateFeatureLikeBuiltinMacro(OS, Tok, II, *this, false,
-      [this](Token &Tok, bool &HasLexedNextToken) -> int {
-        IdentifierInfo *II = ExpectFeatureIdentifierInfo(Tok, *this,
-                                           diag::err_feature_check_malformed);
-        if (!II)
-          return false;
-        else if (II->getBuiltinID() != 0) {
-          switch (II->getBuiltinID()) {
-          case Builtin::BI__builtin_cpu_is:
-            return getTargetInfo().supportsCpuIs();
-          case Builtin::BI__builtin_cpu_init:
-            return getTargetInfo().supportsCpuInit();
-          case Builtin::BI__builtin_cpu_supports:
-            return getTargetInfo().supportsCpuSupports();
-          case Builtin::BI__builtin_operator_new:
-          case Builtin::BI__builtin_operator_delete:
-            // denotes date of behavior change to support calling arbitrary
-            // usual allocation and deallocation functions. Required by libc++
-            return 201802;
-          default:
-            return Builtin::evaluateRequiredTargetFeatures(
-                getBuiltinInfo().getRequiredFeatures(II->getBuiltinID()),
-                getTargetInfo().getTargetOpts().FeatureMap);
+  } else if (II == Ident__has_builtin || II == Ident__has_target_builtin) {
+    bool IsHasTargetBuiltin = II == Ident__has_target_builtin;
+    EvaluateFeatureLikeBuiltinMacro(
+        OS, Tok, II, *this, false,
+        [this, IsHasTargetBuiltin](Token &Tok, bool &HasLexedNextToken) -> int {
+          IdentifierInfo *II = ExpectFeatureIdentifierInfo(
+              Tok, *this, diag::err_feature_check_malformed);
+          if (!II)
+            return false;
+          auto BuiltinID = II->getBuiltinID();
+          if (BuiltinID != 0) {
+            switch (BuiltinID) {
+            case Builtin::BI__builtin_cpu_is:
+              return getTargetInfo().supportsCpuIs();
+            case Builtin::BI__builtin_cpu_init:
+              return getTargetInfo().supportsCpuInit();
+            case Builtin::BI__builtin_cpu_supports:
+              return getTargetInfo().supportsCpuSupports();
+            case Builtin::BI__builtin_operator_new:
+            case Builtin::BI__builtin_operator_delete:
+              // denotes date of behavior change to support calling arbitrary
+              // usual allocation and deallocation functions. Required by libc++
+              return 201802;
+            default:
+              // __has_target_builtin should return false for aux builtins.
+              if (IsHasTargetBuiltin &&
+                  getBuiltinInfo().isAuxBuiltinID(BuiltinID))
+                return false;
+              return Builtin::evaluateRequiredTargetFeatures(
+                  getBuiltinInfo().getRequiredFeatures(BuiltinID),
+                  getTargetInfo().getTargetOpts().FeatureMap);
+            }
+            return true;
+          } else if (IsBuiltinTrait(Tok)) {
+            return true;
+          } else if (II->getTokenID() != tok::identifier &&
+                     II->getName().starts_with("__builtin_")) {
+            return true;
+          } else {
+            return llvm::StringSwitch<bool>(II->getName())
+                // Report builtin templates as being builtins.
+                .Case("__make_integer_seq", getLangOpts().CPlusPlus)
+                .Case("__type_pack_element", getLangOpts().CPlusPlus)
+                .Case("__builtin_common_type", getLangOpts().CPlusPlus)
+                // Likewise for some builtin preprocessor macros.
+                // FIXME: This is inconsistent; we usually suggest detecting
+                // builtin macros via #ifdef. Don't add more cases here.
+                .Case("__is_target_arch", true)
+                .Case("__is_target_vendor", true)
+                .Case("__is_target_os", true)
+                .Case("__is_target_environment", true)
+                .Case("__is_target_variant_os", true)
+                .Case("__is_target_variant_environment", true)
+                .Default(false);
           }
-          return true;
-        } else if (IsBuiltinTrait(Tok)) {
-          return true;
-        } else if (II->getTokenID() != tok::identifier &&
-                   II->getName().starts_with("__builtin_")) {
-          return true;
-        } else {
-          return llvm::StringSwitch<bool>(II->getName())
-              // Report builtin templates as being builtins.
-              .Case("__make_integer_seq", getLangOpts().CPlusPlus)
-              .Case("__type_pack_element", getLangOpts().CPlusPlus)
-              .Case("__builtin_common_type", getLangOpts().CPlusPlus)
-              // Likewise for some builtin preprocessor macros.
-              // FIXME: This is inconsistent; we usually suggest detecting
-              // builtin macros via #ifdef. Don't add more cases here.
-              .Case("__is_target_arch", true)
-              .Case("__is_target_vendor", true)
-              .Case("__is_target_os", true)
-              .Case("__is_target_environment", true)
-              .Case("__is_target_variant_os", true)
-              .Case("__is_target_variant_environment", true)
-              .Default(false);
-        }
-      });
+        });
   } else if (II == Ident__has_constexpr_builtin) {
     EvaluateFeatureLikeBuiltinMacro(
         OS, Tok, II, *this, false,
@@ -1886,8 +1894,7 @@ void Preprocessor::ExpandBuiltinMacro(Token &Tok) {
 
         return false;
       });
-  } else if (II == Ident__has_cpp_attribute ||
-             II == Ident__has_c_attribute) {
+  } else if (II == Ident__has_cpp_attribute || II == Ident__has_c_attribute) {
     bool IsCXX = II == Ident__has_cpp_attribute;
     EvaluateFeatureLikeBuiltinMacro(OS, Tok, II, *this, true,
         [&](Token &Tok, bool &HasLexedNextToken) -> int {
@@ -1917,8 +1924,7 @@ void Preprocessor::ExpandBuiltinMacro(Token &Tok) {
                                    getLangOpts())
                     : 0;
         });
-  } else if (II == Ident__has_include ||
-             II == Ident__has_include_next) {
+  } else if (II == Ident__has_include || II == Ident__has_include_next) {
     // The argument to these two builtins should be a parenthesized
     // file name string literal using angle brackets (<>) or
     // double-quotes ("").
diff --git a/clang/test/Preprocessor/has_target_builtin.cpp b/clang/test/Preprocessor/has_target_builtin.cpp
new file mode 100644
index 000000000000000..64b2d7e1b35d9ef
--- /dev/null
+++ b/clang/test/Preprocessor/has_target_builtin.cpp
@@ -0,0 +1,18 @@
+// RUN: %clang_cc1 -fopenmp -triple=spirv64 -fopenmp-is-target-device \
+// RUN: -aux-triple x86_64-linux-unknown -E %s | FileCheck -implicit-check-not=BAD %s
+
+// RUN: %clang_cc1 -fopenmp -triple=nvptx64 -fopenmp-is-target-device \
+// RUN: -aux-triple x86_64-linux-unknown -E %s | FileCheck -implicit-check-not=BAD %s
+
+// RUN: %clang_cc1 -fopenmp -triple=amdgcn-amd-amdhsa -fopenmp-is-target-device \
+// RUN: -aux-triple x86_64-linux-unknown -E %s | FileCheck -implicit-check-not=BAD %s
+
+// RUN: %clang_cc1 -fopenmp -triple=aarch64 -fopenmp-is-target-device \
+// RUN: -aux-triple x86_64-linux-unknown -E %s | FileCheck -implicit-check-not=BAD %s
+
+// CHECK: GOOD
+#if __has_target_builtin(__builtin_ia32_pause)
+  BAD
+#else
+  GOOD
+#endif

@sarnex sarnex requested review from AlexVlx and Artem-B February 10, 2025 15:20
Collaborator

I think we may want to define this macro for offloading languages only. The reason is that non-offloading languages do not need this macro but if they start to use this macro then it will break again in offloading languages like __has_builtin did.

Contributor

I guess the usage would commonly be

#if defined(__has_target_builtin) && __has_target_builtin(foo)
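
One caveat with the single-line form: on a compiler that does not provide __has_target_builtin, the preprocessor still has to parse the right-hand side of `&&`, and `__has_target_builtin(foo)` is then not a valid expression. Nesting the checks avoids this. The following is a minimal sketch, not code from this PR; the wrapper name `MY_HAS_TARGET_BUILTIN` and the helpers `fatal` and `checked_div` are hypothetical, and falling back to `__has_builtin` is an assumption about what a header author would want.

```cpp
#include <cstdlib>

// Hypothetical wrapper; the name MY_HAS_TARGET_BUILTIN is not part of Clang.
// The checks are nested (#if / #elif on separate lines) so that a compiler
// lacking the macro never has to parse `__has_target_builtin(x)`.
#if defined(__has_target_builtin)
#  define MY_HAS_TARGET_BUILTIN(x) __has_target_builtin(x)
#elif defined(__has_builtin)
#  define MY_HAS_TARGET_BUILTIN(x) __has_builtin(x)  // Closest available check.
#else
#  define MY_HAS_TARGET_BUILTIN(x) 0                 // Assume nothing is available.
#endif

// Stops the program, preferring the builtin when the current target has it.
[[noreturn]] inline void fatal() {
#if MY_HAS_TARGET_BUILTIN(__builtin_trap)
  __builtin_trap();
#else
  std::abort();
#endif
}

// Hypothetical caller used only to exercise the guard.
inline int checked_div(int a, int b) {
  if (b == 0)
    fatal();
  return a / b;
}
```

Keeping the fallback chain in one wrapper means shared host/device headers only spell the compatibility logic once.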

Collaborator

My fear is that some C++ library headers start to use this macro __has_target_builtin in place of __has_builtin, and we cannot modify such headers.

Member Author

Thanks, will do this. I can't find a good way to detect offloading languages in general here, so I'm just going to check for CUDA/HIP/SYCLDevice/OpenMPDevice, let me know if there's some common logic I can rely on that I missed.

Member Author

done in latest commit

Member

My fear is that some C++ library headers start to use this macro __has_target_builtin in place of __has_builtin, and we cannot modify such headers.

IMO, now that we do document semantics of __has_target_builtin(), its misuse on the library side will be their problem to fix. The problem with __has_builtin() was that it was never intended to handle heterogeneous compilation, and that's what created the issue when CUDA/HIP made builtins from both host and device visible to the compiler, but not all of them codegen-able. __has_target_builtin() clearly states what to expect. Sure, it's possible to misuse it, but having it available unconditionally will make it much less cumbersome to use in the headers shared between CUDA and C++, and that's a fairly common use case.

I'd prefer to have __has_target_builtin() generally available.

Contributor

Yeah, __has_target_builtin() is probably identical to __has_builtin on non-offloading related things. It's up to them if they keep it portable.

Member Author

made it unconditional again in the latest commit

i kept the code example in langref because probably we still want to recommend only using this for offloading targets, even though it will work on non-offloading targets. let me know if you disagree.

@yxsamliu
Collaborator

need release note

sarnex added a commit that referenced this pull request Feb 10, 2025
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Feb 10, 2025
``__has_target_builtin`` should not be used to detect support for a builtin macro;
use ``#ifdef`` instead.

``__has_target_built`` is only defined for offloading targets.
Contributor

Suggested change
``__has_target_built`` is only defined for offloading targets.
``__has_target_builtin`` is only defined for offloading targets.

Member Author

wow thats embarrassing, thanks

arguments), or a builtin template.
It evaluates to 1 if the builtin is supported on the current target or 0 if not.
The behavior is different than ``__has_builtin`` when there is an auxiliary target,
such when offloading to a target device.
Contributor

Suggested change
such when offloading to a target device.
such as when offloading to a target device.

Member Author

cant english today, thanks

Comment on lines 117 to 119
#ifndef __has_target_builtin // Optional of course.
#define __has_target_builtin(x) 0 // Compatibility with non-clang compilers.
#endif
Contributor

Might be more helpful to do something like ifdef CUDA ... else __has_builtin.

Member Author

hopefully the latest commit has the use case youre looking for

Collaborator

if we make it available to C++, we'd better document the following invalid usage, which originally led to this extension:

 #if !__has_target_builtin(__wfi) 
 static __inline__ void __attribute__((__always_inline__, __nodebug__)) __wfi(void) { 
   __builtin_arm_wfi(); 
 } 
 #endif 

we should emphasize that a C++ header may be used by offloading languages, and in an offloading language the same source is compiled for the host and device targets separately. A builtin being unavailable for the current target does not justify defining the builtin for both host and device targets. In this case, it is better to use __has_builtin(__wfi), since it makes sure the condition is true for both host and device targets so that the code won't break when used in offloading languages.

Member

We have somewhat conflicting requirements:

  • On C++ side, writers do not care about offloading (and we can't force them to). They only have __has_builtin() and it does what they need -- if the given builtin exists it will be compilable.
  • On offloading side, we want C++ headers to work out of the box for the host side. Ideally with the host and device compilations seeing the same code after preprocessing, and that's where we get into this problem. We can't tell whether the original C++ code needs __has_builtin() (works well enough for most uses inside of host function bodies) or if it needs __has_target_builtin() (e.g. when it's used inside a lambda or constexpr function which is implicitly HD, and we do need to generate code for it).

I'm not sure we can find a universal solution. That said, __has_target_builtin() gives us some flexibility on the offloading side. C++ side should stick with __has_builtin(). __has_target_builtin() should only be used when offloading comes into the picture, but it includes the possibility that it will be used in the headers shared with C++ and therefore the builtin itself should be available there.

Comment on lines 107 to 112
This function-like macro takes a single identifier argument that is the name of
a builtin function, a builtin pseudo-function (taking one or more type
arguments), or a builtin template.
It evaluates to 1 if the builtin is supported on the current target or 0 if not.
The behavior is different than ``__has_builtin`` when there is an auxiliary target,
such when offloading to a target device.
Member

I'd rephrase it to be more specific in terms of what the difference is rather than when it occurs.

__has_builtin() and __has_target_builtin() behave identically for normal C++ compilations.
For heterogeneous compilations that see source code intended for more than one target

  • __has_builtin() returns true if the builtin is known to the compiler (i.e. it's available via one of the targets), but makes no promises whether it's available on the current target. We can parse it, but not necessarily codegen it.
  • __has_target_builtin() returns true if the builtin can actually be codegen'ed for the current target.

__has_target_builtin() is, effectively, a functional superset of CUDA's __CUDA_ARCH__ -- it allows distinguishing both host and target architectures. It has to be treated with similar caution so it does not break consistency of the TU source code seen by the compiler across sub-compilations.
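
The distinction can be made concrete with a small probe. The sketch below is illustrative only and picks the x86 builtin `__builtin_ia32_pause` as its subject; the `PROBE_*` macros, the two constants, and `report` are hypothetical names. On compilers without `__has_target_builtin` it falls back to `__has_builtin`, so the two answers are equal by construction there.

```cpp
#include <cstdio>

// Hypothetical probe macros; neither PROBE_* name is part of Clang.
#if defined(__has_builtin)
#  define PROBE_HAS_BUILTIN(x) __has_builtin(x)
#else
#  define PROBE_HAS_BUILTIN(x) 0
#endif

#if defined(__has_target_builtin)
#  define PROBE_HAS_TARGET_BUILTIN(x) __has_target_builtin(x)
#else
#  define PROBE_HAS_TARGET_BUILTIN(x) PROBE_HAS_BUILTIN(x)  // Assumed fallback.
#endif

// "Known to the compiler": may be true because of the host or the aux target.
#if PROBE_HAS_BUILTIN(__builtin_ia32_pause)
constexpr bool kKnownToCompiler = true;
#else
constexpr bool kKnownToCompiler = false;
#endif

// "Can be code-generated for the target being compiled right now".
#if PROBE_HAS_TARGET_BUILTIN(__builtin_ia32_pause)
constexpr bool kUsableOnCurrentTarget = true;
#else
constexpr bool kUsableOnCurrentTarget = false;
#endif

// Prints both answers for the current compilation.
inline void report() {
  std::printf("known=%d usable=%d\n", kKnownToCompiler, kUsableOnCurrentTarget);
}
```

In an ordinary single-target build the two constants agree. Going by the semantics described above, a device pass with an x86 auxiliary target is where they would be expected to differ.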

Member Author

@sarnex sarnex Feb 10, 2025

thanks, i like the way you worded it so i'll use most of this verbatim

Collaborator

@AaronBallman AaronBallman left a comment

Only some tiny nits from me

return false;
else if (II->getBuiltinID() != 0) {
switch (II->getBuiltinID()) {
auto BuiltinID = II->getBuiltinID();
Collaborator

Please spell out the type explicitly.

Member Author

done in latest commit, thx

Collaborator

This seems to have been undone.

// CHECK-NOTOFFLOAD: DOESNT
#ifdef __has_target_builtin
HAS
#if __has_target_builtin(__builtin_ia32_pause)
Collaborator

Can you also add test coverage for when the target does have the builtin?

Member Author

done in latest commit, thx


// CHECK-NOTOFFLOAD: DOESNT
#ifdef __has_target_builtin
HAS
Collaborator

You should probably check for HAS explicitly.

Member Author

@sarnex sarnex Feb 11, 2025

the macro is unconditionally available in the latest commit so i removed the checking for the macro being defined

Contributor

@jhuber6 jhuber6 left a comment

LGTM with a nit.

Comment on lines 131 to 136
#else // !CUDA
#if __has_builtin(__builtin_trap)
__builtin_trap();
#else
abort();
#endif
Contributor

Suggested change
#else // !CUDA
#if __has_builtin(__builtin_trap)
__builtin_trap();
#else
abort();
#endif
#else // !CUDA
#if __has_builtin(__builtin_trap)
__builtin_trap();
#else
abort();
#endif

Is this still necessary now that we allow it on all targets?

Member Author

@sarnex sarnex Feb 11, 2025

thanks, it's not necessary, but i was thinking we leave it in to suggest that users only use this for offloading, even if it does work for normal compilation modes. if you think the explanation of the differences between has_builtin and has_target_builtin is enough for users, i'll remove the offload/nooffload check from the example

arguments), or a builtin template.
It evaluates to 1 if the builtin is supported on the current target or 0 if not.

``__has_builtin`` and ``__has_target_builtin`` behave identically for normal C++ compilations.
Collaborator

So this means users should use __has_target_builtin in place of __has_builtin so that their code will correctly check for the builtin on host or device?

Contributor

Probably better to be portable with GCC for average users, but it's the recommended solution for code that's intended to be run on the GPU I'd say.

Member Author

@sarnex sarnex Feb 11, 2025

if the user's goal is to check that the builtin can be codegen'd on the current target being compiled, then yes they should use __has_target_builtin. if they want to confirm it can be parsed but not necessarily codegen'd, then they should use __has_builtin. if that's unclear let me know and i can try to improve the doc

Collaborator

if the user's goal is to check that the builtin can be codegen'd on the current target being compiled, then yes they should use __has_target_builtin. if they want to confirm it can be parsed but not necessarily codegen'd, then they should use __has_builtin. if that's unclear let me know and i can try to improve the doc

Users don't typically think in terms of "parsed" and "codegenned", but more "works" and "doesn't work", which I think means "codegenned" in general. The line we're drawing here is pretty subtle, so perhaps more real world examples would help; it's hard to understand why a builtin can be parsed but cannot be used.

Member Author

@sarnex sarnex Feb 12, 2025

yeah that's a fair point, so it sounds like you'd prefer an example of when codegen doesn't work even though the user is checking with has_builtin, and for that i could show the motivating example for this change which is something like

void foo() {
#if __has_builtin(__builtin_ia32_pause)
  __builtin_ia32_pause();
#else
  abort();
#endif
}

and if the current target is an offloading target (amdgpu/nvptx/spirv) and the aux target is x86, we will get an error saying it can't be codegen'd

would that kind of example address your concern? thx
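
For comparison, here is a sketch of how such a function might be guarded so that a device pass takes the fallback path instead of failing in codegen. It is an illustration under assumptions rather than code from this PR: `HAS_USABLE_BUILTIN`, `cpu_relax`, and `spin_until` are hypothetical names, the fallback is a no-op rather than `abort()`, and mapping to `__has_builtin` only approximates the intended check on compilers without `__has_target_builtin`.

```cpp
#include <atomic>

// Hypothetical wrapper; prefers the target-only check when the compiler has it.
#if defined(__has_target_builtin)
#  define HAS_USABLE_BUILTIN(x) __has_target_builtin(x)
#elif defined(__has_builtin)
#  define HAS_USABLE_BUILTIN(x) __has_builtin(x)
#else
#  define HAS_USABLE_BUILTIN(x) 0
#endif

// Emits the x86 pause hint only when the guard reports the builtin as usable
// for the target currently being compiled; otherwise does nothing.
inline void cpu_relax() {
#if HAS_USABLE_BUILTIN(__builtin_ia32_pause)
  __builtin_ia32_pause();
#endif
}

// Hypothetical caller: polls until `flag` becomes true and returns the number
// of polls that found it still false.
inline int spin_until(const std::atomic<bool> &flag) {
  int polls = 0;
  while (!flag.load(std::memory_order_acquire)) {
    cpu_relax();
    ++polls;
  }
  return polls;
}
```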

Collaborator

Yeah, I think it would, along with the prose explaining why that's not a bug but is actually by design for __has_builtin. I think users would reasonably look at that and say it should not be an error, so it'd be nice to make sure they understand the intent.

Member Author

sure, sounds good.

@yxsamliu @Artem-B would implementing Aaron's suggestion address your concerns as well?

Member

+1 to including a concrete example with an explanation. The fact that a builtin may be both visible but not usable will be a surprise for a C++ user.

Member Author

thanks, ill add this

Member Author

added in the latest commit, please take a look and let me know if it's clear

Icohedron pushed a commit to Icohedron/llvm-project that referenced this pull request Feb 11, 2025
Comment on lines 132 to 134
Compilation of this code results in a compiler error because ``__builtin_ia32_pause`` is known to the compiler because
it is a builtin supported by the host x86-64 compilation so ``__has_builtin`` returns true. However, code cannot
be generated for ``__builtin_ia32_pause`` during the offload AMDGPU compilation as it is not supported on that target.
Collaborator

Thank you, this is helping somewhat. But I think a user is still going to ask themselves "why does __has_builtin return true in this case?". The host and the offload are separate compilations, so logically, it stands to reason that __has_builtin would return true for the host and false for the offload. And because of:

__has_builtin and __has_target_builtin behave identically for normal C++ compilations.

They're going to wonder why they wouldn't just replace all uses of __has_builtin with __has_target_builtin, which begs the question of why __has_builtin isn't just being "fixed".

(I'll be honest, I still don't fully understand the answer to that myself.)

Collaborator

@AaronBallman AaronBallman left a comment

Marking as requesting changes so we don't accidentally land, because I've got questions about whether we want to go down this route.

  1. GCC has __has_builtin, so how do they handle offloading targets? Do they have the same odd behavior where __has_builtin returns true for builtins it cannot actually emit code for?
  2. Given that __has_target_builtin seems to have the semantics everyone would expect from __has_builtin, do we want to consider deprecating __has_builtin so that downstreams have time to adjust but we eventually end up with a less confusing builtin?

It really seems to me that __has_builtin has a broken design because of compilations where we try to hide a two-pass compilation as though it were one-pass and it seems like we're going to confuse folks with that behavior. If I could wave a magic wand, I would say __has_builtin should behave exactly how __has_target_builtin is behaving in this PR and that a reliance on __has_builtin behaving how it does today is relying on a bug.

Signed-off-by: Sarnie, Nick <[email protected]>
Member Author

sarnex commented Jul 28, 2025

This seems to have been undone.

Yeah just noticed the same thing, looks like that feedback was only in this PR and not the original merged one, my bad. Fixed now.

Contributor

@jhuber6 jhuber6 left a comment

Please document this and add it to the clang release notes.

Member Author

sarnex commented Jul 28, 2025

ah right, sorry. been away from this pr for too long :P

Signed-off-by: Sarnie, Nick <[email protected]>
Contributor

@jhuber6 jhuber6 left a comment

LG, thanks!

Member Author

sarnex commented Jul 28, 2025

I'll merge once @AaronBallman gets a chance to take a final look

@jhuber6
Contributor

jhuber6 commented Jul 28, 2025

> I'll merge once @AaronBallman gets a chance to take a final look

He showed up in the original patch where I revived this and seemed to agree with the consensus we reached.

@sarnex
Member Author

sarnex commented Jul 28, 2025

Sounds like you need this patch quickly, so I'll merge in about an hour unless I get new feedback to address.

@jhuber6
Contributor

jhuber6 commented Jul 28, 2025

> Sounds like you need this patch quickly, so I'll merge in about an hour unless I get new feedback to address.

It's not an urgent need; I was just experimenting with my RPC interface through HIP and was getting annoyed with keeping a workaround in tree. Thanks for picking this up again.

@sarnex
Member Author

sarnex commented Jul 28, 2025

Sure, thanks for the reminder/reviews

Collaborator

@AaronBallman AaronBallman left a comment

LGTM aside from a nit with the release notes.

This feature is enabled by default but can be disabled by compiling with
``-fno-sanitize-annotate-debug-info-traps``.

- The ``__has_builtin`` function now only considers the currently active target when being used with target offloading.
Collaborator

This should be moved to the potentially breaking changes section, because the behavioral change could catch folks off-guard.

Member Author

Will do, then merge. Thanks!

Signed-off-by: Sarnie, Nick <[email protected]>
@sarnex sarnex merged commit 0efcb83 into llvm:main Jul 28, 2025
10 checks passed
@llvm-ci
Collaborator

llvm-ci commented Jul 28, 2025

LLVM Buildbot has detected a new failure on builder lldb-arm-ubuntu running on linaro-lldb-arm-ubuntu while building clang at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/18/builds/19671

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/4/12 (3336 of 3345)
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/5/12 (3337 of 3345)
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/6/12 (3338 of 3345)
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/7/12 (3339 of 3345)
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/8/12 (3340 of 3345)
PASS: lldb-unit :: ValueObject/./LLDBValueObjectTests/9/12 (3341 of 3345)
PASS: lldb-unit :: tools/lldb-server/tests/./LLDBServerTests/0/2 (3342 of 3345)
PASS: lldb-unit :: tools/lldb-server/tests/./LLDBServerTests/1/2 (3343 of 3345)
PASS: lldb-unit :: Process/gdb-remote/./ProcessGdbRemoteTests/8/35 (3344 of 3345)
TIMEOUT: lldb-api :: tools/lldb-dap/module/TestDAP_module.py (3345 of 3345)
******************** TEST 'lldb-api :: tools/lldb-dap/module/TestDAP_module.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --arch armv8l --build-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/./lib --cmake-build-type Release /home/tcwg-buildbot/worker/lldb-arm-ubuntu/llvm-project/lldb/test/API/tools/lldb-dap/module -p TestDAP_module.py
--
Exit Code: -9
Timeout: Reached timeout of 600 seconds

Command Output (stdout):
--
lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision 0efcb83626362213fb6cc99c4af42a93e74e6afe)
  clang revision 0efcb83626362213fb6cc99c4af42a93e74e6afe
  llvm revision 0efcb83626362213fb6cc99c4af42a93e74e6afe

--
Command Output (stderr):
--
========= DEBUG ADAPTER PROTOCOL LOGS =========
1753726496.145977974 (stdio) --> {"command":"initialize","type":"request","arguments":{"adapterID":"lldb-native","clientID":"vscode","columnsStartAt1":true,"linesStartAt1":true,"locale":"en-us","pathFormat":"path","supportsRunInTerminalRequest":true,"supportsVariablePaging":true,"supportsVariableType":true,"supportsStartDebuggingRequest":true,"supportsProgressReporting":true,"$__lldb_sourceInitFile":false},"seq":1}
1753726496.150264263 (stdio) <-- {"body":{"$__lldb_version":"lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision 0efcb83626362213fb6cc99c4af42a93e74e6afe)\n  clang revision 0efcb83626362213fb6cc99c4af42a93e74e6afe\n  llvm revision 0efcb83626362213fb6cc99c4af42a93e74e6afe","completionTriggerCharacters":["."," ","\t"],"exceptionBreakpointFilters":[{"description":"C++ Catch","filter":"cpp_catch","label":"C++ Catch","supportsCondition":true},{"description":"C++ Throw","filter":"cpp_throw","label":"C++ Throw","supportsCondition":true},{"description":"Objective-C Catch","filter":"objc_catch","label":"Objective-C Catch","supportsCondition":true},{"description":"Objective-C Throw","filter":"objc_throw","label":"Objective-C Throw","supportsCondition":true}],"supportTerminateDebuggee":true,"supportsBreakpointLocationsRequest":true,"supportsCancelRequest":true,"supportsCompletionsRequest":true,"supportsConditionalBreakpoints":true,"supportsConfigurationDoneRequest":true,"supportsDataBreakpoints":true,"supportsDelayedStackTraceLoading":true,"supportsDisassembleRequest":true,"supportsEvaluateForHovers":true,"supportsExceptionFilterOptions":true,"supportsExceptionInfoRequest":true,"supportsFunctionBreakpoints":true,"supportsHitConditionalBreakpoints":true,"supportsInstructionBreakpoints":true,"supportsLogPoints":true,"supportsModulesRequest":true,"supportsReadMemoryRequest":true,"supportsSetVariable":true,"supportsSteppingGranularity":true,"supportsValueFormattingOptions":true,"supportsWriteMemoryRequest":true},"command":"initialize","request_seq":1,"seq":0,"success":true,"type":"response"}
1753726496.150687933 (stdio) --> {"command":"launch","type":"request","arguments":{"program":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/module/TestDAP_module.test_compile_units/a.out","initCommands":["settings clear --all","settings set symbols.enable-external-lookup false","settings set target.inherit-tcc true","settings set target.disable-aslr false","settings set target.detach-on-error false","settings set target.auto-apply-fixits false","settings set plugin.process.gdb-remote.packet-timeout 60","settings set symbols.clang-modules-cache-path \"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api\"","settings set use-color false","settings set show-statusline false"],"disableASLR":false,"enableAutoVariableSummaries":false,"enableSyntheticChildDebugging":false,"displayExtendedBacktrace":false},"seq":2}
1753726496.151207209 (stdio) <-- {"body":{"category":"console","output":"Running initCommands:\n"},"event":"output","seq":0,"type":"event"}
1753726496.151261330 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings clear --all\n"},"event":"output","seq":0,"type":"event"}
1753726496.151278496 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set symbols.enable-external-lookup false\n"},"event":"output","seq":0,"type":"event"}
1753726496.151290894 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set target.inherit-tcc true\n"},"event":"output","seq":0,"type":"event"}
1753726496.151302338 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set target.disable-aslr false\n"},"event":"output","seq":0,"type":"event"}
1753726496.151315212 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set target.detach-on-error false\n"},"event":"output","seq":0,"type":"event"}
1753726496.151326656 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set target.auto-apply-fixits false\n"},"event":"output","seq":0,"type":"event"}
1753726496.151338577 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set plugin.process.gdb-remote.packet-timeout 60\n"},"event":"output","seq":0,"type":"event"}
1753726496.151388884 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set symbols.clang-modules-cache-path \"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api\"\n"},"event":"output","seq":0,"type":"event"}
1753726496.151403666 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set use-color false\n"},"event":"output","seq":0,"type":"event"}
1753726496.151416063 (stdio) <-- {"body":{"category":"console","output":"(lldb) settings set show-statusline false\n"},"event":"output","seq":0,"type":"event"}
1753726496.291954994 (stdio) <-- {"command":"launch","request_seq":2,"seq":0,"success":true,"type":"response"}
1753726496.292039871 (stdio) <-- {"event":"initialized","seq":0,"type":"event"}
1753726496.292100191 (stdio) <-- {"body":{"module":{"addressRange":"0xf7b12000","debugInfoSize":"983.3KB","id":"253BA35E-436C-EC85-2949-CBD09E38AFEE-11B460BF","name":"ld-linux-armhf.so.3","path":"/usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3","symbolFilePath":"/usr/lib/arm-linux-gnueabihf/ld-linux-armhf.so.3","symbolStatus":"Symbols loaded."},"reason":"new"},"event":"module","seq":0,"type":"event"}
1753726496.292393446 (stdio) <-- {"body":{"module":{"addressRange":"0x880000","debugInfoSize":"1.1KB","id":"FB8E86FA","name":"a.out","path":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/module/TestDAP_module.test_compile_units/a.out","symbolFilePath":"/home/tcwg-buildbot/worker/lldb-arm-ubuntu/build/lldb-test-build.noindex/tools/lldb-dap/module/TestDAP_module.test_compile_units/a.out","symbolStatus":"Symbols loaded."},"reason":"new"},"event":"module","seq":0,"type":"event"}
1753726496.292542934 (stdio) --> {"command":"setBreakpoints","type":"request","arguments":{"source":{"name":"main.cpp","path":"main.cpp"},"sourceModified":false,"lines":[5],"breakpoints":[{"line":5}]},"seq":3}
1753726496.304580927 (stdio) <-- {"body":{"breakpoints":[{"column":3,"id":1,"instructionReference":"0x89073C","line":5,"source":{"name":"main.cpp","path":"main.cpp"},"verified":true}]},"command":"setBreakpoints","request_seq":3,"seq":0,"success":true,"type":"response"}
1753726496.304912806 (stdio) --> {"command":"configurationDone","type":"request","arguments":{},"seq":4}

@sarnex
Member Author

sarnex commented Jul 29, 2025

@boomanaiden154 Just FYI: I relanded this patch, which disables a test. You agreed to take a look at it in the previous PR here. Thank you!

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Jul 29, 2025
boomanaiden154 added a commit to boomanaiden154/llvm-project that referenced this pull request Jul 29, 2025
It does not look like __cpuidex builtins are being incorrectly included
in compilations for offload targets anymore, so change up the test to
assume that we are defining __cpuidex as static in cpuid.h now that
llvm#126324 updates the behavior of __has_builtin on offload compilations.

This ensures we are still testing that we are avoiding the conflicts
around offloading that were first pointed out in
https://reviews.llvm.org/D150646.
boomanaiden154 added a commit that referenced this pull request Jul 30, 2025
It does not look like __cpuidex builtins are being incorrectly included
in compilations for offload targets anymore, so change up the test to
assume that we are defining __cpuidex as static in cpuid.h now that
#126324 updates the behavior of __has_builtin on offload compilations.

This ensures we are still testing that we are avoiding the conflicts
around offloading that were first pointed out in
https://reviews.llvm.org/D150646.
@alexfh
Contributor

alexfh commented Aug 7, 2025

Some code from NCCL (https://github.com/NVIDIA/nccl/blob/master/src/graph/xml.cc#L16) started failing to compile after this patch. A standalone test (https://gcc.godbolt.org/z/n1dbEGr1M):

#if defined(__x86_64__)
#include <cpuid.h>
#endif

This started producing the following error:

/opt/compiler-explorer/clang-trunk-20250807/lib/clang/22/include/cpuid.h:348:22: error: static declaration of '__cpuidex' follows non-static declaration
  348 | static __inline void __cpuidex(int __cpu_info[4], int __leaf, int __subleaf) {
      |                      ^
/opt/compiler-explorer/clang-trunk-20250807/lib/clang/22/include/cpuid.h:348:22: note: '__cpuidex' is a builtin with type 'void (int *, int, int) noexcept'

@sarnex
Member Author

sarnex commented Aug 7, 2025

That looks like exactly the issue @boomanaiden154 had the test for in #151220. @boomanaiden154 can you take a look? If it's really an issue with this patch and not something exposed by it, let me know. Thanks!

@boomanaiden154
Contributor

This is the cc1 invocation that is causing things to fail (on the example provided in the comment above):

/home/gha/llvm-project/build/bin/clang-22 -cc1 -triple nvptx64-nvidia-cuda -aux-triple x86_64-unknown-linux-gnu -fsyntax-only -disable-free -clear-ast-before-backend -main-file-name test.c -mrelocation-model static -mframe-pointer=all -fno-rounding-math -no-integrated-as -aux-target-cpu x86-64 -fcuda-is-device -mllvm -enable-memcpyopt-without-libcalls -fno-threadsafe-statics -target-cpu sm_52 -target-feature +ptx42 -debugger-tuning=gdb -fno-dwarf-directory-asm -fdebug-compilation-dir=/home/gha/llvm-project/build -resource-dir /home/gha/llvm-project/build/lib/clang/22 -internal-isystem /home/gha/llvm-project/build/lib/clang/22/include/cuda_wrappers -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/x86_64-linux-gnu/c++/13 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/backward -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/x86_64-linux-gnu/c++/13 -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../include/c++/13/backward -internal-isystem /home/gha/llvm-project/build/lib/clang/22/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../x86_64-linux-gnu/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -internal-isystem /home/gha/llvm-project/build/lib/clang/22/include -internal-isystem /usr/local/include -internal-isystem /usr/lib/gcc/x86_64-linux-gnu/13/../../../../x86_64-linux-gnu/include -internal-externc-isystem /usr/include/x86_64-linux-gnu -internal-externc-isystem /include -internal-externc-isystem /usr/include -fdeprecated-macro -fno-autolink -ferror-limit 19 -fmessage-length=451 --offload-new-driver --no-offloadlib -fgnuc-version=4.2.1 -fskip-odr-check-in-gmf -fcxx-exceptions 
-fexceptions -fcolor-diagnostics -cuid=bc129a722f93e87b -D__GCC_HAVE_DWARF2_CFI_ASM=1 -x cuda /tmp/test.c

It looks like something in the CUDA flags enables the __cpuidex builtin, and then we aren't detecting it with __has_builtin because the host is the aux triple. It's the same flow as the offloading test in __cpuidex_conflict.c, but it seems like CUDA is handling things differently. I need to spend a bit more time investigating.

boomanaiden154 added a commit to boomanaiden154/llvm-project that referenced this pull request Aug 7, 2025
The landing of llvm#126324 made it so that __has_builtin returns false for
aux triple builtins. CUDA offloading can sometimes compile where the
host is in the aux triple (i.e., x86_64). This patch explicitly carves out
NVPTX so that we do not run into redefinition errors.
boomanaiden154 added a commit that referenced this pull request Aug 7, 2025
The landing of #126324 made it so that __has_builtin returns false for
aux triple builtins. CUDA offloading can sometimes compile where the
host is in the aux triple (i.e., x86_64). This patch explicitly carves out
NVPTX so that we do not run into redefinition errors.
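
The failure and the carve-out described in this commit message can be reasoned about with a small model. This is only a sketch under stated assumptions: `PassInfo`, its fields, and every function below are hypothetical names invented for illustration, and the actual guard in cpuid.h may be written differently. It assumes, per the diagnosis above, that in the CUDA device pass `__cpuidex` is still declared as a builtin even though the target-only `__has_builtin` reports it as unavailable.

```cpp
// Simplified model of one compilation pass. All names are hypothetical.
struct PassInfo {
  bool declared_as_builtin;  // the compiler already declares the builtin in this pass
  bool builtin_on_current;   // the builtin belongs to the current target, not the aux one
  bool is_nvptx;             // the current target is NVPTX (CUDA device pass)
};

// Target-only __has_builtin as described for #126324: aux-only builtins report false.
constexpr bool has_builtin_target_only(const PassInfo &p) {
  return p.declared_as_builtin && p.builtin_on_current;
}

// Header guard without a carve-out: define the static fallback whenever the query says no.
constexpr bool defines_fallback_plain(const PassInfo &p) {
  return !has_builtin_target_only(p);
}

// Header guard with an NVPTX carve-out: never define the fallback in an NVPTX pass.
constexpr bool defines_fallback_carved(const PassInfo &p) {
  return !p.is_nvptx && !has_builtin_target_only(p);
}

// A "static declaration follows non-static declaration" style conflict arises when
// the header defines a function the compiler already declares as a builtin.
constexpr bool conflicts(const PassInfo &p, bool header_defines) {
  return header_defines && p.declared_as_builtin;
}

// CUDA device pass with an x86_64 host as the aux triple (the failing case).
constexpr PassInfo kCudaDevice{true, false, true};
// Plain x86_64 host compile where the builtin is not enabled.
constexpr PassInfo kPlainHost{false, false, false};
// x86_64 host compile where the builtin is enabled for the current target.
constexpr PassInfo kBuiltinHost{true, true, false};
```

In this model the plain guard conflicts for `kCudaDevice`, while the carved-out guard does not, and the two host cases behave the same under either guard.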
@sarnex
Member Author

sarnex commented Aug 8, 2025

@alexfh Aiden landed a patch that should fix this; if you're still seeing it, please let us know!
