[Clang][HIP] Target-dependent overload resolution in declarators and specifiers #103031

ritter-x2a · 2024-08-13T09:41:15Z

So far, the resolution of host/device overloads for functions in HIP/CUDA operates as if in a host-device context for code outside of function bodies, e.g., in expressions that are part of template arguments in top-level declarations. This means that, if separate host and device overloads are declared, the device overload is used in the device compilation phase and the host overload is used in the host compilation phase.

This patch changes overload resolution in such cases to prefer overloads that match the target of the declaration in which they occur. For example:

__device__ constexpr int get_n() { return 64; }
__host__ constexpr int get_n() { return -1; }
__device__ std::enable_if<(get_n() > 32)>::type foo() { }

Before, this code would not compile, because get_n resolved to the host overload during host compilation, causing an error. With this patch, the call to get_n in the declaration of the device function foo resolves to the device overload in host and device compilation.

If attributes that affect the declaration's target occur after a call with target-dependent overload resolution, a warning is issued. This is realized by registering the Kinds of relevant attributes in the CUDATargetContext when they are parsed.

This is an alternative to PR #93546, which is required for PR #91478.

…specifiers So far, the resolution of host/device overloads for functions in HIP/CUDA operates as if in a host-device context for code outside of function bodies, e.g., in expressions that are part of template arguments in top-level declarations. This means that, if separate host and device overloads are declared, the device overload is used in the device compilation phase and the host overload is used in the host compilation phase. This patch changes overload resolution in such cases to prefer overloads that match the target of the declaration in which they occur. For example: __device__ constexpr int get_n() { return 64; } __host__ constexpr int get_n() { return -1; } __device__ std::enable_if<(get_n() > 32)>::type foo() { } Before, this code would not compile, because get_n resolved to the host overload during host compilation, causing an error. With this patch, the call to get_n in the declaration of the device function foo resolves to the device overload in host and device compilation. If attributes that affect the declaration's target occur after a call with target-dependent overload resolution, a warning is issued. This is realized by registering the Kinds of relevant attributes in the CUDATargetContext when they are parsed. This is an alternative to PR llvm#93546, which is required for PR llvm#91478.

llvmbot · 2024-08-13T09:41:51Z

@llvm/pr-subscribers-clang

Author: Fabian Ritter (ritter-x2a)

Changes

So far, the resolution of host/device overloads for functions in HIP/CUDA operates as if in a host-device context for code outside of function bodies, e.g., in expressions that are part of template arguments in top-level declarations. This means that, if separate host and device overloads are declared, the device overload is used in the device compilation phase and the host overload is used in the host compilation phase.

This patch changes overload resolution in such cases to prefer overloads that match the target of the declaration in which they occur. For example:

__device__ constexpr int get_n() { return 64; }
__host__ constexpr int get_n() { return -1; }
__device__ std::enable_if&lt;(get_n() &gt; 32)&gt;::type foo() { }

Before, this code would not compile, because get_n resolved to the host overload during host compilation, causing an error. With this patch, the call to get_n in the declaration of the device function foo resolves to the device overload in host and device compilation.

If attributes that affect the declaration's target occur after a call with target-dependent overload resolution, a warning is issued. This is realized by registering the Kinds of relevant attributes in the CUDATargetContext when they are parsed.

This is an alternative to PR #93546, which is required for PR #91478.

Patch is 47.59 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/103031.diff

9 Files Affected:

(modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+4)
(modified) clang/include/clang/Sema/SemaCUDA.h (+28-4)
(modified) clang/lib/Parse/ParseDecl.cpp (+6)
(modified) clang/lib/Parse/ParseDeclCXX.cpp (+6)
(modified) clang/lib/Parse/Parser.cpp (+8)
(modified) clang/lib/Sema/SemaCUDA.cpp (+97-16)
(modified) clang/lib/Sema/SemaOverload.cpp (+1-1)
(added) clang/test/SemaCUDA/target-overloads-availability-warnings.cu (+148)
(added) clang/test/SemaCUDA/target-overloads-in-function-prototypes.cu (+690)

diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index 554dbaff2ce0d8..8709f60678b466 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -9017,6 +9017,10 @@ def err_global_call_not_config : Error<
 def err_ref_bad_target : Error<
   "reference to %select{__device__|__global__|__host__|__host__ __device__}0 "
   "%select{function|variable}1 %2 in %select{__device__|__global__|__host__|__host__ __device__}3 function">;
+def warn_target_specfier_ignored : Warning<
+  "target specifier has been ignored for overload resolution; "
+  "move the target specifier to the beginning of the declaration to use it for overload resolution">,
+  InGroup<IgnoredAttributes>;
 def note_cuda_const_var_unpromoted : Note<
   "const variable cannot be emitted on device side due to dynamic initialization">;
 def note_cuda_host_var : Note<
diff --git a/clang/include/clang/Sema/SemaCUDA.h b/clang/include/clang/Sema/SemaCUDA.h
index 63dc3f4da240b3..83083ada889a16 100644
--- a/clang/include/clang/Sema/SemaCUDA.h
+++ b/clang/include/clang/Sema/SemaCUDA.h
@@ -104,6 +104,8 @@ class SemaCUDA : public SemaBase {
   CUDAFunctionTarget IdentifyTarget(const FunctionDecl *D,
                                     bool IgnoreImplicitHDAttr = false);
   CUDAFunctionTarget IdentifyTarget(const ParsedAttributesView &Attrs);
+  CUDAFunctionTarget IdentifyTarget(
+      const SmallVectorImpl<clang::AttributeCommonInfo::Kind> &AttrKinds);
 
   enum CUDAVariableTarget {
     CVT_Device,  /// Emitted on device side with a shadow variable on host side
@@ -120,21 +122,43 @@ class SemaCUDA : public SemaBase {
     CTCK_Unknown,       /// Unknown context
     CTCK_InitGlobalVar, /// Function called during global variable
                         /// initialization
+    CTCK_Declaration,   /// Function called in a declaration specifier or
+                        /// declarator outside of other contexts, usually in
+                        /// template arguments.
   };
 
   /// Define the current global CUDA host/device context where a function may be
   /// called. Only used when a function is called outside of any functions.
-  struct CUDATargetContext {
-    CUDAFunctionTarget Target = CUDAFunctionTarget::HostDevice;
+  class CUDATargetContext {
+  public:
     CUDATargetContextKind Kind = CTCK_Unknown;
-    Decl *D = nullptr;
+
+    CUDATargetContext() = default;
+
+    CUDATargetContext(SemaCUDA *S, CUDATargetContextKind Kind,
+                      CUDAFunctionTarget Target);
+
+    CUDAFunctionTarget getTarget();
+
+    /// If this is a CTCK_Declaration context, update the Target based on Attrs.
+    /// No-op otherwise.
+    /// Issues a diagnostic if the target changes after it has been queried
+    /// before.
+    void tryRegisterTargetAttrs(const ParsedAttributesView &Attrs);
+
+  private:
+    SemaCUDA *S = nullptr;
+    CUDAFunctionTarget Target = CUDAFunctionTarget::HostDevice;
+    SmallVector<clang::AttributeCommonInfo::Kind, 0> AttrKinds;
+    bool TargetQueried = false;
+
   } CurCUDATargetCtx;
 
   struct CUDATargetContextRAII {
     SemaCUDA &S;
     SemaCUDA::CUDATargetContext SavedCtx;
     CUDATargetContextRAII(SemaCUDA &S_, SemaCUDA::CUDATargetContextKind K,
-                          Decl *D);
+                          Decl *D = nullptr);
     ~CUDATargetContextRAII() { S.CurCUDATargetCtx = SavedCtx; }
   };
 
diff --git a/clang/lib/Parse/ParseDecl.cpp b/clang/lib/Parse/ParseDecl.cpp
index a8a9d3f3f5b088..615aa8e4c5df02 100644
--- a/clang/lib/Parse/ParseDecl.cpp
+++ b/clang/lib/Parse/ParseDecl.cpp
@@ -311,6 +311,9 @@ void Parser::ParseGNUAttributes(ParsedAttributes &Attrs,
   }
 
   Attrs.Range = SourceRange(StartLoc, EndLoc);
+
+  if (Actions.getLangOpts().CUDA)
+    Actions.CUDA().CurCUDATargetCtx.tryRegisterTargetAttrs(Attrs);
 }
 
 /// Determine whether the given attribute has an identifier argument.
@@ -1003,6 +1006,9 @@ void Parser::ParseMicrosoftDeclSpecs(ParsedAttributes &Attrs) {
   }
 
   Attrs.Range = SourceRange(StartLoc, EndLoc);
+
+  if (Actions.getLangOpts().CUDA)
+    Actions.CUDA().CurCUDATargetCtx.tryRegisterTargetAttrs(Attrs);
 }
 
 void Parser::ParseMicrosoftTypeAttributes(ParsedAttributes &attrs) {
diff --git a/clang/lib/Parse/ParseDeclCXX.cpp b/clang/lib/Parse/ParseDeclCXX.cpp
index aac89d910bbc83..00010731043330 100644
--- a/clang/lib/Parse/ParseDeclCXX.cpp
+++ b/clang/lib/Parse/ParseDeclCXX.cpp
@@ -27,6 +27,7 @@
 #include "clang/Sema/EnterExpressionEvaluationContext.h"
 #include "clang/Sema/ParsedTemplate.h"
 #include "clang/Sema/Scope.h"
+#include "clang/Sema/SemaCUDA.h"
 #include "clang/Sema/SemaCodeCompletion.h"
 #include "llvm/ADT/SmallString.h"
 #include "llvm/Support/TimeProfiler.h"
@@ -2852,6 +2853,11 @@ Parser::DeclGroupPtrTy Parser::ParseCXXClassMemberDeclaration(
     ParsedTemplateInfo &TemplateInfo, ParsingDeclRAIIObject *TemplateDiags) {
   assert(getLangOpts().CPlusPlus &&
          "ParseCXXClassMemberDeclaration should only be called in C++ mode");
+  SemaCUDA::CUDATargetContextRAII CTCRAII(Actions.CUDA(),
+                                          SemaCUDA::CTCK_Declaration);
+  if (Actions.getLangOpts().CUDA)
+    Actions.CUDA().CurCUDATargetCtx.tryRegisterTargetAttrs(AccessAttrs);
+
   if (Tok.is(tok::at)) {
     if (getLangOpts().ObjC && NextToken().isObjCAtKeyword(tok::objc_defs))
       Diag(Tok, diag::err_at_defs_cxx);
diff --git a/clang/lib/Parse/Parser.cpp b/clang/lib/Parse/Parser.cpp
index 04c2f1d380bc48..b7bc11964e9687 100644
--- a/clang/lib/Parse/Parser.cpp
+++ b/clang/lib/Parse/Parser.cpp
@@ -21,6 +21,7 @@
 #include "clang/Sema/DeclSpec.h"
 #include "clang/Sema/ParsedTemplate.h"
 #include "clang/Sema/Scope.h"
+#include "clang/Sema/SemaCUDA.h"
 #include "clang/Sema/SemaCodeCompletion.h"
 #include "llvm/Support/Path.h"
 #include "llvm/Support/TimeProfiler.h"
@@ -1133,6 +1134,13 @@ bool Parser::isStartOfFunctionDefinition(const ParsingDeclarator &Declarator) {
 Parser::DeclGroupPtrTy Parser::ParseDeclOrFunctionDefInternal(
     ParsedAttributes &Attrs, ParsedAttributes &DeclSpecAttrs,
     ParsingDeclSpec &DS, AccessSpecifier AS) {
+  SemaCUDA::CUDATargetContextRAII CTCRAII(Actions.CUDA(),
+                                          SemaCUDA::CTCK_Declaration);
+  if (Actions.getLangOpts().CUDA) {
+    Actions.CUDA().CurCUDATargetCtx.tryRegisterTargetAttrs(Attrs);
+    Actions.CUDA().CurCUDATargetCtx.tryRegisterTargetAttrs(DeclSpecAttrs);
+  }
+
   // Because we assume that the DeclSpec has not yet been initialised, we simply
   // overwrite the source range and attribute the provided leading declspec
   // attributes.
diff --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index ec37c0df56c671..b16c50a95ccba2 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -18,6 +18,7 @@
 #include "clang/Basic/TargetInfo.h"
 #include "clang/Lex/Preprocessor.h"
 #include "clang/Sema/Lookup.h"
+#include "clang/Sema/ParsedAttr.h"
 #include "clang/Sema/ScopeInfo.h"
 #include "clang/Sema/Sema.h"
 #include "clang/Sema/SemaDiagnostic.h"
@@ -68,13 +69,28 @@ ExprResult SemaCUDA::ActOnExecConfigExpr(Scope *S, SourceLocation LLLLoc,
                                /*IsExecConfig=*/true);
 }
 
-CUDAFunctionTarget SemaCUDA::IdentifyTarget(const ParsedAttributesView &Attrs) {
+namespace {
+
+// This iterator adaptor enables sharing a IdentifyTarget implementation for
+// ParsedAttributesView and for vectors of AttributeCommonInfo::Kind.
+struct AttrKindIterator
+    : llvm::iterator_adaptor_base<
+          AttrKindIterator, ParsedAttributesView::const_iterator,
+          std::random_access_iterator_tag, clang::AttributeCommonInfo::Kind> {
+  AttrKindIterator() : iterator_adaptor_base(nullptr) {}
+  AttrKindIterator(ParsedAttributesView::const_iterator I)
+      : iterator_adaptor_base(I) {}
+  clang::AttributeCommonInfo::Kind operator*() const { return I->getKind(); }
+};
+
+template <typename AKIterRange>
+CUDAFunctionTarget IdentifyTargetImpl(const AKIterRange &AttrKinds) {
   bool HasHostAttr = false;
   bool HasDeviceAttr = false;
   bool HasGlobalAttr = false;
   bool HasInvalidTargetAttr = false;
-  for (const ParsedAttr &AL : Attrs) {
-    switch (AL.getKind()) {
+  for (const auto &AK : AttrKinds) {
+    switch (AK) {
     case ParsedAttr::AT_CUDAGlobal:
       HasGlobalAttr = true;
       break;
@@ -107,6 +123,18 @@ CUDAFunctionTarget SemaCUDA::IdentifyTarget(const ParsedAttributesView &Attrs) {
   return CUDAFunctionTarget::Host;
 }
 
+} // namespace
+
+CUDAFunctionTarget SemaCUDA::IdentifyTarget(const ParsedAttributesView &Attrs) {
+  return IdentifyTargetImpl(make_range(AttrKindIterator(Attrs.begin()),
+                                       AttrKindIterator(Attrs.end())));
+}
+
+CUDAFunctionTarget SemaCUDA::IdentifyTarget(
+    const SmallVectorImpl<clang::AttributeCommonInfo::Kind> &AttrKinds) {
+  return IdentifyTargetImpl(AttrKinds);
+}
+
 template <typename A>
 static bool hasAttr(const Decl *D, bool IgnoreImplicitAttr) {
   return D->hasAttrs() && llvm::any_of(D->getAttrs(), [&](Attr *Attribute) {
@@ -115,20 +143,65 @@ static bool hasAttr(const Decl *D, bool IgnoreImplicitAttr) {
          });
 }
 
+SemaCUDA::CUDATargetContext::CUDATargetContext(SemaCUDA *S,
+                                               CUDATargetContextKind Kind,
+                                               CUDAFunctionTarget Target)
+    : Kind(Kind), S(S), Target(Target) {}
+
+CUDAFunctionTarget SemaCUDA::CUDATargetContext::getTarget() {
+  TargetQueried = true;
+  return Target;
+}
+
+void SemaCUDA::CUDATargetContext::tryRegisterTargetAttrs(
+    const ParsedAttributesView &Attrs) {
+  if (Kind != CTCK_Declaration)
+    return;
+  for (const auto &A : Attrs) {
+    auto AK = A.getKind();
+    switch (AK) {
+    case ParsedAttr::AT_CUDAGlobal:
+    case ParsedAttr::AT_CUDAHost:
+    case ParsedAttr::AT_CUDADevice:
+    case ParsedAttr::AT_CUDAInvalidTarget:
+      break;
+    default:
+      continue;
+    }
+    AttrKinds.push_back(AK);
+    CUDAFunctionTarget NewTarget = S->IdentifyTarget(AttrKinds);
+    if (TargetQueried && (NewTarget != Target))
+      S->Diag(A.getLoc(), diag::warn_target_specfier_ignored);
+    Target = NewTarget;
+  }
+}
+
 SemaCUDA::CUDATargetContextRAII::CUDATargetContextRAII(
     SemaCUDA &S_, SemaCUDA::CUDATargetContextKind K, Decl *D)
     : S(S_) {
   SavedCtx = S.CurCUDATargetCtx;
-  assert(K == SemaCUDA::CTCK_InitGlobalVar);
-  auto *VD = dyn_cast_or_null<VarDecl>(D);
-  if (VD && VD->hasGlobalStorage() && !VD->isStaticLocal()) {
-    auto Target = CUDAFunctionTarget::Host;
-    if ((hasAttr<CUDADeviceAttr>(VD, /*IgnoreImplicit=*/true) &&
-         !hasAttr<CUDAHostAttr>(VD, /*IgnoreImplicit=*/true)) ||
-        hasAttr<CUDASharedAttr>(VD, /*IgnoreImplicit=*/true) ||
-        hasAttr<CUDAConstantAttr>(VD, /*IgnoreImplicit=*/true))
-      Target = CUDAFunctionTarget::Device;
-    S.CurCUDATargetCtx = {Target, K, VD};
+
+  switch (K) {
+  case SemaCUDA::CTCK_InitGlobalVar: {
+    auto *VD = dyn_cast_or_null<VarDecl>(D);
+    if (VD && VD->hasGlobalStorage() && !VD->isStaticLocal()) {
+      auto Target = CUDAFunctionTarget::Host;
+      if ((hasAttr<CUDADeviceAttr>(VD, /*IgnoreImplicit=*/true) &&
+           !hasAttr<CUDAHostAttr>(VD, /*IgnoreImplicit=*/true)) ||
+          hasAttr<CUDASharedAttr>(VD, /*IgnoreImplicit=*/true) ||
+          hasAttr<CUDAConstantAttr>(VD, /*IgnoreImplicit=*/true))
+        Target = CUDAFunctionTarget::Device;
+      S.CurCUDATargetCtx = CUDATargetContext(&S, K, Target);
+    }
+    break;
+  }
+  case SemaCUDA::CTCK_Declaration:
+    // The target is updated once relevant attributes are parsed. Initialize
+    // with the target used if no attributes are present: Host.
+    S.CurCUDATargetCtx = CUDATargetContext(&S, K, CUDAFunctionTarget::Host);
+    break;
+  default:
+    llvm_unreachable("unexpected context kind");
   }
 }
 
@@ -137,7 +210,7 @@ CUDAFunctionTarget SemaCUDA::IdentifyTarget(const FunctionDecl *D,
                                             bool IgnoreImplicitHDAttr) {
   // Code that lives outside a function gets the target from CurCUDATargetCtx.
   if (D == nullptr)
-    return CurCUDATargetCtx.Target;
+    return CurCUDATargetCtx.getTarget();
 
   if (D->hasAttr<CUDAInvalidTargetAttr>())
     return CUDAFunctionTarget::InvalidTarget;
@@ -232,7 +305,7 @@ SemaCUDA::IdentifyPreference(const FunctionDecl *Caller,
   // trivial ctor/dtor without device attr to be used. Non-trivial ctor/dtor
   // will be diagnosed by checkAllowedInitializer.
   if (Caller == nullptr && CurCUDATargetCtx.Kind == CTCK_InitGlobalVar &&
-      CurCUDATargetCtx.Target == CUDAFunctionTarget::Device &&
+      CurCUDATargetCtx.getTarget() == CUDAFunctionTarget::Device &&
       (isa<CXXConstructorDecl>(Callee) || isa<CXXDestructorDecl>(Callee)))
     return CFP_HostDevice;
 
@@ -297,8 +370,16 @@ SemaCUDA::IdentifyPreference(const FunctionDecl *Caller,
       (CallerTarget == CUDAFunctionTarget::Device &&
        CalleeTarget == CUDAFunctionTarget::Host) ||
       (CallerTarget == CUDAFunctionTarget::Global &&
-       CalleeTarget == CUDAFunctionTarget::Host))
+       CalleeTarget == CUDAFunctionTarget::Host)) {
+    // In declaration contexts outside of function bodies and variable
+    // initializers, tolerate mismatched function targets as long as they are
+    // not codegened.
+    if (CurCUDATargetCtx.Kind == CTCK_Declaration &&
+        !this->SemaRef.getCurFunctionDecl(/*AllowLambda=*/true))
+      return CFP_WrongSide;
+
     return CFP_Never;
+  }
 
   llvm_unreachable("All cases should've been handled by now.");
 }
diff --git a/clang/lib/Sema/SemaOverload.cpp b/clang/lib/Sema/SemaOverload.cpp
index 52f640eb96b73b..e3703c2c735fe1 100644
--- a/clang/lib/Sema/SemaOverload.cpp
+++ b/clang/lib/Sema/SemaOverload.cpp
@@ -10747,7 +10747,7 @@ OverloadCandidateSet::BestViableFunction(Sema &S, SourceLocation Loc,
         llvm::any_of(Candidates, [&](OverloadCandidate *Cand) {
           // Check viable function only.
           return Cand->Viable && Cand->Function &&
-                 S.CUDA().IdentifyPreference(Caller, Cand->Function) ==
+                 S.CUDA().IdentifyPreference(Caller, Cand->Function) >=
                      SemaCUDA::CFP_SameSide;
         });
     if (ContainsSameSideCandidate) {
diff --git a/clang/test/SemaCUDA/target-overloads-availability-warnings.cu b/clang/test/SemaCUDA/target-overloads-availability-warnings.cu
new file mode 100644
index 00000000000000..f0fc1bea1db642
--- /dev/null
+++ b/clang/test/SemaCUDA/target-overloads-availability-warnings.cu
@@ -0,0 +1,148 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify=expected,onhost %s
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fsyntax-only -fcuda-is-device -verify=expected,ondevice %s
+
+template <bool C, class T = void> struct my_enable_if {};
+
+template <class T> struct my_enable_if<true, T> {
+  typedef T type;
+};
+
+__attribute__((host, device)) void use(int x);
+
+// For 'OverloadFunHostDepr', the host overload is deprecated, the device overload is not.
+__attribute__((device)) constexpr int OverloadFunHostDepr(void) { return 1; }
+__attribute__((host, deprecated("Host variant"))) constexpr int OverloadFunHostDepr(void) { return 1; } // expected-note 0+ {{has been explicitly marked deprecated here}}
+
+
+// For 'OverloadFunDeviceDepr', the device overload is deprecated, the host overload is not.
+__attribute__((device, deprecated("Device variant"))) constexpr int OverloadFunDeviceDepr(void) { return 1; } // expected-note 0+ {{has been explicitly marked deprecated here}}
+__attribute__((host)) constexpr int OverloadFunDeviceDepr(void) { return 1; }
+
+
+// For 'TemplateOverloadFun', the host overload is deprecated, the device overload is not.
+template<typename T>
+__attribute__((device)) constexpr T TemplateOverloadFun(void) { return 1; }
+
+template<typename T>
+__attribute__((host, deprecated("Host variant"))) constexpr T TemplateOverloadFun(void) { return 1; } // expected-note 0+ {{has been explicitly marked deprecated here}}
+
+
+// There is only a device overload, and it is deprecated.
+__attribute__((device, deprecated)) constexpr int // expected-note 0+ {{has been explicitly marked deprecated here}}
+DeviceOnlyFunDeprecated(void) { return 1; }
+
+// There is only a host overload, and it is deprecated.
+__attribute__((host, deprecated)) constexpr int // expected-note 0+ {{has been explicitly marked deprecated here}}
+HostOnlyFunDeprecated(void) { return 1; }
+
+class FunSelector {
+public:
+  // This should use the non-deprecated device overload.
+  template<int X> __attribute__((device))
+  auto devicefun(void) -> typename my_enable_if<(X == OverloadFunHostDepr()), int>::type {
+    return 1;
+  }
+
+  // This should use the non-deprecated device overload.
+  template<int X> __attribute__((device))
+  auto devicefun(void) -> typename my_enable_if<(X != OverloadFunHostDepr()), int>::type {
+      return 0;
+  }
+
+  // This should use the deprecated device overload.
+  template<int X> __attribute__((device))
+  auto devicefun_wrong(void) -> typename my_enable_if<(X == OverloadFunDeviceDepr()), int>::type { // expected-warning {{'OverloadFunDeviceDepr' is deprecated: Device variant}}
+    return 1;
+  }
+
+  // This should use the deprecated device overload.
+  template<int X> __attribute__((device))
+  auto devicefun_wrong(void) -> typename my_enable_if<(X != OverloadFunDeviceDepr()), int>::type { // expected-warning {{'OverloadFunDeviceDepr' is deprecated: Device variant}}
+      return 0;
+  }
+
+  // This should use the non-deprecated host overload.
+  template<int X> __attribute__((host))
+  auto hostfun(void) -> typename my_enable_if<(X == OverloadFunDeviceDepr()), int>::type {
+    return 1;
+  }
+
+  // This should use the non-deprecated host overload.
+  template<int X> __attribute__((host))
+  auto hostfun(void) -> typename my_enable_if<(X != OverloadFunDeviceDepr()), int>::type {
+      return 0;
+  }
+
+  // This should use the deprecated host overload.
+  template<int X> __attribute__((host))
+  auto hostfun_wrong(void) -> typename my_enable_if<(X == OverloadFunHostDepr()), int>::type { // expected-warning {{'OverloadFunHostDepr' is deprecated: Host variant}}
+    return 1;
+  }
+
+  // This should use the deprecated host overload.
+  template<int X> __attribute__((host))
+  auto hostfun_wrong(void) -> typename my_enable_if<(X != OverloadFunHostDepr()), int>::type { // expected-warning {{'OverloadFunHostDepr' is deprecated: Host variant}}
+      return 0;
+  }
+};
+
+
+// These should not be diagnosed since the device overload of
+// OverloadFunHostDepr is not deprecated:
+__attribute__((device)) my_enable_if<(OverloadFunHostDepr() > 0), int>::type
+DeviceUserOverloadFunHostDepr1(void) { return 2; }
+
+__attribute__((device)) my_enable_if<(OverloadFunHostDepr() > 0), int>::type constexpr
+DeviceUserOverloadFunHostDeprConstexpr(void) { return 2; }
+
+
+// Analogously for OverloadFunDeviceDepr:
+__attribute__((host)) my_enable_if<(OverloadFunDeviceDepr() > 0), int>::type
+DeviceUserOverloadFunDeviceDepr1(void) { return 2; }
+
+my_enable_if<(OverloadFunDeviceDepr() > 0), int>::type __attribute__((host))
+DeviceUserOverloadFunDeviceDepr2(void) { return 2; }
+
+__attribute__((host)) my_enable_if<(OverloadFunDeviceDepr() > 0), int>::type constexpr
+DeviceUserOverloadFunDeviceDeprConstexpr(void) { return 2; }
+
+
+// Actual uses of the deprecated overloads should be diagnosed:
+__attribute__((host, device)) my_enable_if<(OverloadFunHostDepr() > 0), int>::type // onhost-warning {{'OverloadFunHostDepr' is deprecated: Host variant}}
+HostDeviceUserOverloadFunHostDepr(void) { return 3; }
+
+__attribute__((host)) my_enable_if<(OverloadFunHostDepr() > 0), int>::type constexpr // expected-warning {{'OverloadFunHostDepr' is deprecated: Host variant}}
+HostUserOverloadFunHostDeprConstexpr(void) { return 3; }
+
+__attribute__((device)) my_enable_if<(OverloadFunDeviceDepr() > 0), int>::type constexpr // expected-warning {{'OverloadFunDeviceDepr' is deprecated: Device variant}}
+HostUserOverloadFunDeviceDeprConstexpr(void) { return 3; }
+
+
+// Making the offending decl a template shouldn't change anything:
+__attribute__((host)) my_enable_if<(TemplateOverloadFun<int>() > 0), int>::type // expected-warning {{'TemplateOverloadFun<int>' is deprecated: Host...
[truncated]

…rs and specifiers Handle and test template functions outside of classes.

…clarators and specifiers Add a test to document the behavior for default arguments of template parameters.

Artem-B

Considering that we're adding another interesting quirk to how we interpret target attributes & function calls, it would be useful to run this by a language lawyer to make sure we're not missing something.

@zygoloid - would you have time to take a look or suggest who else may be the right person to consult on this kind of changes?

Artem-B · 2024-08-13T21:09:05Z

clang/include/clang/Basic/DiagnosticSemaKinds.td

  "reference to %select{__device__|__global__|__host__|__host__ __device__}0 "
  "%select{function|variable}1 %2 in %select{__device__|__global__|__host__|__host__ __device__}3 function">;
+def warn_target_specfier_ignored : Warning<
+  "target specifier has been ignored for overload resolution; "


Target attribute?

Addressed in e273a99, thank you for the feedback!

Artem-B · 2024-08-19T17:52:34Z

clang/lib/Sema/SemaCUDA.cpp

         });
 }

+SemaCUDA::CUDATargetContext::CUDATargetContext(SemaCUDA *S,


This could probably be moved into the header.

Addressed in e273a99.

…n in declarators and specifiers Rename 'target specifier' -> 'target attribute', move CUDATargetContext constructor in header.

ritter-x2a · 2024-08-20T06:56:02Z

clang/test/SemaCUDA/target-overloads-in-function-prototypes.cu

+// Default arguments for template parameters occur before the target attribute,
+// so we can't identify the "right" overload for them.
+template <typename T = targetdep_t>
+__attribute__((device)) // expected-warning {{target attribute has been ignored for overload resolution}}


A point worth noting is this case, where moving the target attribute before the overloaded call (here wrapped in targetdep_t) is not possible: in default arguments for template parameters. The rocPRIM library contains this kind of code (and therefore also the introduced warning). I'd be happy about suggestions to improve this.

We could mitigate this issue at least for non-type template parameters with default arguments by supporting this syntax to control overload resolution:

template <__attribute__((device)) int V = device_specific_init()> __attribute__((device)) void foo(void) { // ... }

Somewhat surprisingly (to me), trunk already accepts device attributes in non-type template arguments without diagnostic (but seems to ignore them), so this would not introduce new syntax.
So far, I have not encountered type template parameters with target-dependent default arguments -- where this syntax is not possible -- outside of this test. The problematic cases in rocPRIM are non-type template parameters (e.g. the device function device_warp_size in the default argument here).

@yxsamliu, do you think that this is a viable way to avoid the warnings, e.g., in rocPRIM?

We could mitigate this issue at least for non-type template parameters with default arguments by supporting this syntax to control overload resolution:

template <__attribute__((device)) int V = device_specific_init()> __attribute__((device)) void foo(void) { // ... }

Somewhat surprisingly (to me), trunk already accepts device attributes in non-type template arguments without diagnostic (but seems to ignore them), so this would not introduce new syntax. So far, I have not encountered type template parameters with target-dependent default arguments -- where this syntax is not possible -- outside of this test. The problematic cases in rocPRIM are non-type template parameters (e.g. the device function device_warp_size in the default argument here).

@yxsamliu, do you think that this is a viable way to avoid the warnings, e.g., in rocPRIM?

Sorry for the delay. Do we have another choice? e.g. deferring parsing of default template arguments, or looking ahead for template function host/device attributes before parsing the default template arguments? Because we want a consistent way of resolving function calls in template arguments for template definition, specialization, and instantiation.

Also, does this change pass internal PSDB?

Thanks for the reply!

e.g. deferring parsing of default template arguments, or looking ahead for template function host/device attributes before parsing the default template arguments?

I don't think that's possible. We would need to parse what comes after the default template arguments before resolving overloads in the default template arguments, but overload resolution can affect the parse tree of the default template arguments (see, e.g., this example on godbolt.org).
Looking through the code base, I found only one instance of deferred parsing, but as far as I can tell it only defers parsing of an initializer past the handling of its declarator, and does not switch the order in which things are parsed.

Also, does this change pass internal PSDB?

The current state of the PR has passed PSDB without failures, but there are warnings that a target attribute has been ignored (the warning that this PR introduces) in rocPRIM, because of cases like this default template argument.
The result in the rocPRIM case should still be correct because the called constexpr function has only a device overload, so no host overload could be chosen instead.

As a side note: When experimenting with the above template-parameter-attribute syntax, I noticed that, in the current state of the PR, such an attribute in a template parameter (or any other sub-declaration) would affect the overload resolution of calls in the remaining declaration, outside of the template parameter where it occurs. That's probably not what we want.
I haven't pushed a fix for that yet because the fix relates to whether we decide to use the above template-parameter-attribute syntax.

@yxsamliu, I opened #109663 with an alternative solution for the root problem that I'm actually trying to solve (diagnosing wrong uses of __AMDGCN_WAVEFRONT_SIZE), which would make PRs #91478, #93546, and #103031 (this one) unnecessary.
Please let me know if you think we should rather pursue #109663.

ritter-x2a · 2024-08-26T07:14:12Z

Ping.

ritter-x2a · 2024-09-04T06:57:21Z

Ping.

…ithout relying on target-dependent overload resolution The __AMDGCN_WAVEFRONT_SIZE and __AMDGCN_WAVEFRONT_SIZE__ macros in HIP can only provide meaningful values during device compilation. They are currently usable in host code, but only contain the default value of 64, independent of the target device(s). This patch checks for numeric literals in clearly identifiable host code if they are the result of expanding the wavefront-size macros and issues a diagnostic if that's the case. A alternative PR, llvm#91478, relied on constexpr functions with host and device overloads (where the host overload is marked as deprecated) to diagnose uses of these macros in host code. A problem with this approach are uses of the macros outside of function bodies, e.g., in template arguments of return types, or default template arguments of functions. In these cases, calls to functions with target overloads are resolved to the host variant during host compilation and to the device variant during device compilation - independently of the target of the function they belong to. Therefore, using the wavefront size macros in such cases leads to diagnostics during host compilation with llvm#91478, even if they are only associated to a device function. PR llvm#93546 is a proposal to suppress these spurious diagnostics. PR llvm#103031 is a proposal to change the behavior of target-dependent overload resolution outside of function bodies to use the target attributes that occur before the overloaded call to select the overload candidate. In contrast to llvm#91478, this PR will not diagnose uses of the wavefront-size macros outside of function bodies or initializers of global host variables. Implements SWDEV-449015.

ritter-x2a · 2024-10-17T06:38:37Z

Closing this PR in favor of a more comprehensive treatment of the AMDGCN_WAVEFRONT_SIZE situation.

ritter-x2a requested review from Artem-B and yxsamliu August 13, 2024 09:41

ritter-x2a self-assigned this Aug 13, 2024

llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Aug 13, 2024

Artem-B requested a review from zygoloid August 13, 2024 19:35

ritter-x2a added 2 commits August 19, 2024 08:15

fixup! [Clang][HIP] Target-dependent overload resolution in declarato…

4c1b605

…rs and specifiers Handle and test template functions outside of classes.

fixup! fixup! [Clang][HIP] Target-dependent overload resolution in de…

91deb04

…clarators and specifiers Add a test to document the behavior for default arguments of template parameters.

Artem-B reviewed Aug 19, 2024

View reviewed changes

fixup! fixup! fixup! [Clang][HIP] Target-dependent overload resolutio…

e273a99

…n in declarators and specifiers Rename 'target specifier' -> 'target attribute', move CUDATargetContext constructor in header.

ritter-x2a commented Aug 20, 2024

View reviewed changes

ritter-x2a mentioned this pull request Sep 23, 2024

[Clang][HIP] Warn when __AMDGCN_WAVEFRONT_SIZE is used in host code without relying on target-dependent overload resolution #109663

Closed

ritter-x2a closed this Oct 17, 2024

[Clang][HIP] Target-dependent overload resolution in declarators and specifiers #103031

[Clang][HIP] Target-dependent overload resolution in declarators and specifiers #103031

Uh oh!

Conversation

ritter-x2a commented Aug 13, 2024

Uh oh!

llvmbot commented Aug 13, 2024

Uh oh!

Artem-B left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ritter-x2a commented Aug 26, 2024

Uh oh!

ritter-x2a commented Sep 4, 2024

Uh oh!

ritter-x2a commented Oct 17, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants