Skip to content

Conversation

@ritter-x2a
Copy link
Member

Outside of function bodies, the resolution of host/device overloads for functions in HIP/CUDA operates as if in a host-device context. This means that the device overload is used in the device compilation phase and the host overload is used in the host compilation phase.

Therefore, the following code would cause a deprecation warning during host compilation, even though val is only used as part of a device function:

__attribute__((host, deprecated)) constexpr int val(void) {return 1;}
__attribute__((device)) constexpr int val(void) {return 1;}
__attribute__((device)) std::enable_if<(val() > 0), int>::type fun(void) {
    return 42;
}

As only the available device overload is used during device compilation, where code for fun is actually generated, this diagnostic is spurious.

This patch suppresses availability diagnostics in such situations: When an unavailable host function is used in a device context during host compilation or when an unavailable device function is used in a host context during device compilation.

This change is necessary to avoid spurious warnings with #91478, e.g., in the rocPRIM library.

…vice overloads

Outside of function bodies, the resolution of host/device overloads for
functions in HIP/CUDA operates as if in a host-device context. This means that
the device overload is used in the device compilation phase and the host
overload is used in the host compilation phase.

Therefore, the following code would cause a deprecation warning during host
compilation, even though val is only used as part of a device function:

__attribute__((host, deprecated)) constexpr int val(void) {return 1;}
__attribute__((device)) constexpr int val(void) {return 1;}
__attribute__((device)) std::enable_if<(val() > 0), int>::type fun(void) {
    return 42;
}

As only the available device overload is used during device compilation, where
code for fun is actually generated, this diagnostic is spurious.

This patch suppresses availability diagnostics in such situations: When an
unavailable host function is used in a device context during host compilation
or when an unavailable device function is used in a host context during device
compilation.
@ritter-x2a ritter-x2a requested review from AlexVlx and yxsamliu May 28, 2024 13:27
@ritter-x2a ritter-x2a self-assigned this May 28, 2024
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:frontend Language frontend issues, e.g. anything involving "Sema" labels May 28, 2024
@llvmbot
Copy link
Member

llvmbot commented May 28, 2024

@llvm/pr-subscribers-clang

Author: Fabian Ritter (ritter-x2a)

Changes

Outside of function bodies, the resolution of host/device overloads for functions in HIP/CUDA operates as if in a host-device context. This means that the device overload is used in the device compilation phase and the host overload is used in the host compilation phase.

Therefore, the following code would cause a deprecation warning during host compilation, even though val is only used as part of a device function:

__attribute__((host, deprecated)) constexpr int val(void) {return 1;}
__attribute__((device)) constexpr int val(void) {return 1;}
__attribute__((device)) std::enable_if&lt;(val() &gt; 0), int&gt;::type fun(void) {
    return 42;
}

As only the available device overload is used during device compilation, where code for fun is actually generated, this diagnostic is spurious.

This patch suppresses availability diagnostics in such situations: When an unavailable host function is used in a device context during host compilation or when an unavailable device function is used in a host context during device compilation.

This change is necessary to avoid spurious warnings with #91478, e.g., in the rocPRIM library.


Full diff: https://github.com/llvm/llvm-project/pull/93546.diff

2 Files Affected:

  • (modified) clang/lib/Sema/SemaAvailability.cpp (+53)
  • (added) clang/test/SemaCUDA/suppress-availability-warnings-mismatched-attributes.cu (+149)
diff --git a/clang/lib/Sema/SemaAvailability.cpp b/clang/lib/Sema/SemaAvailability.cpp
index 22f5a2f663477..984789489098a 100644
--- a/clang/lib/Sema/SemaAvailability.cpp
+++ b/clang/lib/Sema/SemaAvailability.cpp
@@ -20,6 +20,7 @@
 #include "clang/Sema/DelayedDiagnostic.h"
 #include "clang/Sema/ScopeInfo.h"
 #include "clang/Sema/Sema.h"
+#include "clang/Sema/SemaCUDA.h"
 #include "clang/Sema/SemaObjC.h"
 #include "llvm/ADT/StringRef.h"
 #include <optional>
@@ -156,6 +157,58 @@ static bool ShouldDiagnoseAvailabilityInContext(
     }
   }
 
+  if (S.getLangOpts().CUDA || S.getLangOpts().HIP) {
+    // In CUDA/HIP, do not diagnose uses of unavailable host or device function
+    // overloads when they occur in the context of a Decl with an explicitly
+    // given opposite target.
+    // We encounter this if the OffendingDecl is used outside of a function
+    // body, e.g., in template arguments for a function's return or parameter
+    // types. In this case, overloads of the called function are resolved as if
+    // in a host-device context, i.e., the device overload is chosen in the
+    // device compilation phase and the host overload in the host compilation
+    // phase. As code is only generated for the variant with matching targets,
+    // an availabiliy diagnostic for the variant with non-matching targets would
+    // be spurious.
+
+    if (auto *OffendingFunDecl = llvm::dyn_cast<FunctionDecl>(OffendingDecl)) {
+      Decl *ActualCtx = Ctx;
+      if (auto *FTD = llvm::dyn_cast<FunctionTemplateDecl>(Ctx)) {
+        // Attributes of template Decls are only on the templated Decl
+        ActualCtx = FTD->getTemplatedDecl();
+      }
+      if (auto *CtxFun = llvm::dyn_cast<FunctionDecl>(ActualCtx)) {
+        auto TargetIs = [&S](const FunctionDecl *FD, CUDAFunctionTarget FT) {
+          return S.CUDA().IdentifyTarget(FD, /* IgnoreImplicitHDAttr */ true) ==
+                 FT;
+        };
+
+        bool CtxIsHost = TargetIs(CtxFun, CUDAFunctionTarget::Host);
+        bool CtxIsDevice = TargetIs(CtxFun, CUDAFunctionTarget::Device);
+
+        bool OffendingDeclIsHost =
+            TargetIs(OffendingFunDecl, CUDAFunctionTarget::Host);
+        bool OffendingDeclIsDevice =
+            TargetIs(OffendingFunDecl, CUDAFunctionTarget::Device);
+
+        // There is a way to call a device function from host code (and vice
+        // versa, analogously) that passes semantic analysis: As constexprs,
+        // when there is no host overload. In this case, a diagnostic is
+        // necessary. Characteristic for this situation is that the device
+        // function will also be used in a host context during host compilation.
+        // Therefore, only suppress diagnostics if a host function is used in a
+        // device context during host compilation or a device function is used
+        // in a host context during device compilation.
+        bool CompilingForDevice = S.getLangOpts().CUDAIsDevice;
+        bool CompilingForHost = !CompilingForDevice;
+
+        if ((OffendingDeclIsHost && CtxIsDevice && CompilingForHost) ||
+            (OffendingDeclIsDevice && CtxIsHost && CompilingForDevice)) {
+          return false;
+        }
+      }
+    }
+  }
+
   // Checks if we should emit the availability diagnostic in the context of C.
   auto CheckContext = [&](const Decl *C) {
     if (K == AR_NotYetIntroduced) {
diff --git a/clang/test/SemaCUDA/suppress-availability-warnings-mismatched-attributes.cu b/clang/test/SemaCUDA/suppress-availability-warnings-mismatched-attributes.cu
new file mode 100644
index 0000000000000..c3023d16565cf
--- /dev/null
+++ b/clang/test/SemaCUDA/suppress-availability-warnings-mismatched-attributes.cu
@@ -0,0 +1,149 @@
+// RUN: %clang_cc1 -triple x86_64-unknown-linux-gnu -fsyntax-only -verify=expected,onhost %s
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda -fsyntax-only -fcuda-is-device -verify=expected,ondevice %s
+
+template <bool C, class T = void> struct my_enable_if {};
+
+template <class T> struct my_enable_if<true, T> {
+  typedef T type;
+};
+
+__attribute__((host, device)) void use(int x);
+
+__attribute__((device)) constexpr int OverloadFunHostDepr(void) { return 1; }
+__attribute__((host, deprecated("Host variant"))) constexpr int OverloadFunHostDepr(void) { return 1; } // expected-note 0+ {{has been explicitly marked deprecated here}}
+
+
+__attribute__((device, deprecated("Device variant"))) constexpr int OverloadFunDeviceDepr(void) { return 1; } // expected-note 0+ {{has been explicitly marked deprecated here}}
+__attribute__((host)) constexpr int OverloadFunDeviceDepr(void) { return 1; }
+
+
+template<typename T>
+__attribute__((device)) constexpr T TemplateOverloadFun(void) { return 1; }
+
+template<typename T>
+__attribute__((host, deprecated("Host variant"))) constexpr T TemplateOverloadFun(void) { return 1; } // expected-note 0+ {{has been explicitly marked deprecated here}}
+
+
+__attribute__((device, deprecated)) constexpr int // expected-note 0+ {{has been explicitly marked deprecated here}}
+DeviceOnlyFunDeprecated(void) { return 1; }
+
+__attribute__((host, deprecated)) constexpr int // expected-note 0+ {{has been explicitly marked deprecated here}}
+HostOnlyFunDeprecated(void) { return 1; }
+
+class FunSelector {
+public:
+  template<int X> __attribute__((device))
+  auto devicefun(void) -> typename my_enable_if<(X == OverloadFunHostDepr()), int>::type {
+    return 1;
+  }
+
+  template<int X> __attribute__((device))
+  auto devicefun(void) -> typename my_enable_if<(X != OverloadFunHostDepr()), int>::type {
+      return 0;
+  }
+
+  template<int X> __attribute__((device))
+  auto devicefun_wrong(void) -> typename my_enable_if<(X == OverloadFunDeviceDepr()), int>::type { // ondevice-warning {{'OverloadFunDeviceDepr' is deprecated: Device variant}}
+    return 1;
+  }
+
+  template<int X> __attribute__((device))
+  auto devicefun_wrong(void) -> typename my_enable_if<(X != OverloadFunDeviceDepr()), int>::type { // ondevice-warning {{'OverloadFunDeviceDepr' is deprecated: Device variant}}
+      return 0;
+  }
+
+  template<int X> __attribute__((host))
+  auto hostfun(void) -> typename my_enable_if<(X == OverloadFunDeviceDepr()), int>::type {
+    return 1;
+  }
+
+  template<int X> __attribute__((host))
+  auto hostfun(void) -> typename my_enable_if<(X != OverloadFunDeviceDepr()), int>::type {
+      return 0;
+  }
+
+  template<int X> __attribute__((host))
+  auto hostfun_wrong(void) -> typename my_enable_if<(X == OverloadFunHostDepr()), int>::type { // onhost-warning {{'OverloadFunHostDepr' is deprecated: Host variant}}
+    return 1;
+  }
+
+  template<int X> __attribute__((host))
+  auto hostfun_wrong(void) -> typename my_enable_if<(X != OverloadFunHostDepr()), int>::type { // onhost-warning {{'OverloadFunHostDepr' is deprecated: Host variant}}
+      return 0;
+  }
+};
+
+
+// These should not be diagnosed since the device overload of
+// OverloadFunHostDepr is not deprecated:
+__attribute__((device)) my_enable_if<(OverloadFunHostDepr() > 0), int>::type
+DeviceUserOverloadFunHostDepr1(void) { return 2; }
+
+my_enable_if<(OverloadFunHostDepr() > 0), int>::type __attribute__((device))
+DeviceUserOverloadFunHostDepr2(void) { return 2; }
+
+__attribute__((device))
+my_enable_if<(OverloadFunHostDepr() > 0), int>::type constexpr
+DeviceUserOverloadFunHostDeprConstexpr(void) { return 2; }
+
+
+// Analogously for OverloadFunDeviceDepr:
+__attribute__((host)) my_enable_if<(OverloadFunDeviceDepr() > 0), int>::type
+DeviceUserOverloadFunDeviceDepr1(void) { return 2; }
+
+my_enable_if<(OverloadFunDeviceDepr() > 0), int>::type __attribute__((host))
+DeviceUserOverloadFunDeviceDepr2(void) { return 2; }
+
+__attribute__((host))
+my_enable_if<(OverloadFunDeviceDepr() > 0), int>::type constexpr
+DeviceUserOverloadFunDeviceDeprConstexpr(void) { return 2; }
+
+
+// Actual uses of the deprecated overloads should be diagnosed:
+__attribute__((host, device)) my_enable_if<(OverloadFunHostDepr() > 0), int>::type // onhost-warning {{'OverloadFunHostDepr' is deprecated: Host variant}}
+HostDeviceUserOverloadFunHostDepr(void) { return 3; }
+
+__attribute__((host)) my_enable_if<(OverloadFunHostDepr() > 0), int>::type constexpr // onhost-warning {{'OverloadFunHostDepr' is deprecated: Host variant}}
+HostUserOverloadFunHostDeprConstexpr(void) { return 3; }
+
+__attribute__((device)) my_enable_if<(OverloadFunDeviceDepr() > 0), int>::type constexpr // ondevice-warning {{'OverloadFunDeviceDepr' is deprecated: Device variant}}
+HostUserOverloadFunDeviceDeprConstexpr(void) { return 3; }
+
+
+// Making the offending decl a template shouldn't change anything:
+__attribute__((host)) my_enable_if<(TemplateOverloadFun<int>() > 0), int>::type // onhost-warning {{'TemplateOverloadFun<int>' is deprecated: Host variant}}
+HostUserTemplateOverloadFun(void) { return 3; }
+
+__attribute__((device)) my_enable_if<(TemplateOverloadFun<int>() > 0), int>::type
+DeviceUserTemplateOverloadFun(void) { return 3; }
+
+
+// If the constexpr function is actually called from the mismatched context, diagnostics should be issued:
+__attribute__((host))
+my_enable_if<(DeviceOnlyFunDeprecated() > 0), int>::type constexpr // onhost-warning {{'DeviceOnlyFunDeprecated' is deprecated}}
+HostUserDeviceOnlyFunDeprecated(void) { return 3; }
+
+__attribute__((device))
+my_enable_if<(HostOnlyFunDeprecated() > 0), int>::type constexpr // ondevice-warning {{'HostOnlyFunDeprecated' is deprecated}}
+DeviceUserHostOnlyFunDeprecated(void) { return 3; }
+
+// Diagnostics for uses in function bodies should work as expected:
+__attribute__((device, deprecated)) constexpr int DeviceVarConstDepr = 1; // expected-note 0+ {{has been explicitly marked deprecated here}}
+
+__attribute__((host)) void HostUser(void) {
+  use(DeviceVarConstDepr); // expected-warning {{'DeviceVarConstDepr' is deprecated}}
+  use(HostOnlyFunDeprecated()); // expected-warning {{'HostOnlyFunDeprecated' is deprecated}}
+  use(OverloadFunHostDepr()); // expected-warning {{'OverloadFunHostDepr' is deprecated: Host variant}}
+  use(TemplateOverloadFun<int>()); // expected-warning {{'TemplateOverloadFun<int>' is deprecated: Host variant}}
+
+  use(OverloadFunDeviceDepr());
+}
+
+__attribute__((device)) void DeviceUser(void) {
+  use(DeviceVarConstDepr); // expected-warning {{'DeviceVarConstDepr' is deprecated}}
+  use(DeviceOnlyFunDeprecated()); // expected-warning {{'DeviceOnlyFunDeprecated' is deprecated}}
+  use(OverloadFunDeviceDepr()); // expected-warning {{'OverloadFunDeviceDepr' is deprecated: Device variant}}
+
+  use(OverloadFunHostDepr());
+  use(TemplateOverloadFun<int>());
+}

@yxsamliu yxsamliu requested a review from Artem-B May 28, 2024 13:51
@Artem-B
Copy link
Member

Artem-B commented May 28, 2024

Therefore, the following code would cause a deprecation warning during host compilation, even though val is only used as part of a device function:

This is where we may need help from @zygoloid.

attribute((device)) std::enable_if<(val() > 0), int>::type fun(void)

Here val() is evaluated in global context as it does not have a caller function. As such, overload resolution picking a host function during host compilation is a WAI, even if it happens to be part of a GPU-side function declaration.

In this case, function overload fails, and the patch tries to suppress such an error.

If we are adding a special case for handling overloads, perhaps a better approach would be to consider inferring the caller context from the enveloping function declaration attributes, and allow overload resolution to pick a device function instead. It would avoid the errors you're trying to suppress, and it will arguably make things more consistent -- the function declaration will have the same signature in both host and device compilations.

@zygoloid -- do you think such a change will create other issues?

@ritter-x2a
Copy link
Member Author

If we are adding a special case for handling overloads, perhaps a better approach would be to consider inferring the caller context from the enveloping function declaration attributes, and allow overload resolution to pick a device function instead. It would avoid the errors you're trying to suppress, and it will arguably make things more consistent -- the function declaration will have the same signature in both host and device compilations.

One problem I encountered when I looked into such a solution is that the host/device attribute(s) are not necessarily parsed at the time when the overload resolution in template arguments of the return type happens, because the __attribute__((device)) can come after the return type specifier (cf. the DeviceUserOverloadFunHostDepr2 function in the test case).
So far I haven't found an existing mechanism in clang to solve this without a bigger change that introduces some sort of backtracking. I'd be happy about pointers in that direction.

@ritter-x2a
Copy link
Member Author

Ping.

@yxsamliu
Copy link
Collaborator

yxsamliu commented Jun 5, 2024

Ping.

You situation is similar to overloading resolution of functions called in global variable initializer. You may consider using a similar approach as https://reviews.llvm.org/D158247

@ritter-x2a
Copy link
Member Author

ritter-x2a commented Jun 6, 2024

You situation is similar to overloading resolution of functions called in global variable initializer. You may consider using a similar approach as https://reviews.llvm.org/D158247

Thank you for your suggestion. When considering a similar approach as is used for global variable initializers, I found that I would need to set a context with the correct CUDAFunctionTarget before parsing the specifier of the function's return type, since that specifier can include template arguments with function calls.
This is problematic because relevant attributes may not have been parsed yet when the function's return type specifier is parsed, so that determining the correct CUDAFunctionTarget is not yet possible.

For example:

__attribute__((device)) constexpr int OverloadFunHostDepr(void) { return 1; }
__attribute__((host, deprecated("Host variant"))) constexpr int OverloadFunHostDepr(void) { return 1; }

// The device attribute here is parsed after the OverloadFunHostDepr call is resolved:
std::enable_if<(OverloadFunHostDepr() > 0), int>::type __attribute__((device))
foo(void) { return 2; }

As far as I can see, we would need to go in one of the following directions:

  • Disallow device/host/... target attributes after the type specifier, so that the available information at the type specifier is always sufficient. This seems problematic for backwards compatibility.
  • Resolve overloads with the information that is available, which means generating potentially different types depending on if the device attribute comes before or after the type specifier. This would enable surprising bugs for compiler users. We might be able to reduce such bugs by issuing a warning for encountered attributes that are not used for overload resolution.
  • Parse all attributes before overloads in the return type specifier are resolved, possibly by delaying overload resolution until the attributes are available. It is not clear to me whether this would be preferable over this PR as it would require a significant change to how clang parses function declarations.

Please let me know if I missed something, or if you are aware of a situation that is similar to the latter option, on which I could build a solution for this one.

@ritter-x2a ritter-x2a requested a review from zygoloid June 13, 2024 08:31
@ritter-x2a
Copy link
Member Author

Ping @zygoloid, your input would be highly appreciated. Thanks!

ritter-x2a added a commit to ritter-x2a/llvm-project that referenced this pull request Aug 13, 2024
…specifiers

So far, the resolution of host/device overloads for functions in HIP/CUDA
operates as if in a host-device context for code outside of function bodies,
e.g., in expressions that are part of template arguments in top-level
declarations. This means that, if separate host and device overloads are
declared, the device overload is used in the device compilation phase and the
host overload is used in the host compilation phase.

This patch changes overload resolution in such cases to prefer overloads that
match the target of the declaration in which they occur. For example:

__device__ constexpr int get_n() { return 64; }
__host__ constexpr int get_n() { return -1; }
__device__ std::enable_if<(get_n() > 32)>::type foo() { }

Before, this code would not compile, because get_n resolved to the host
overload during host compilation, causing an error. With this patch, the call
to get_n in the declaration of the device function foo resolves to the device
overload in host and device compilation.

If attributes that affect the declaration's target occur after a call with
target-dependent overload resolution, a warning is issued. This is realized by
registering the Kinds of relevant attributes in the CUDATargetContext when they
are parsed.

This is an alternative to PR llvm#93546, which is required for PR llvm#91478.
ritter-x2a added a commit to ritter-x2a/llvm-project that referenced this pull request Sep 23, 2024
…ithout relying on target-dependent overload resolution

The __AMDGCN_WAVEFRONT_SIZE and __AMDGCN_WAVEFRONT_SIZE__ macros in HIP can
only provide meaningful values during device compilation. They are currently
usable in host code, but only contain the default value of 64, independent of
the target device(s).

This patch checks for numeric literals in clearly identifiable host code if
they are the result of expanding the wavefront-size macros and issues a
diagnostic if that's the case.

A alternative PR, llvm#91478, relied on constexpr functions with host and device
overloads (where the host overload is marked as deprecated) to diagnose uses of
these macros in host code. A problem with this approach are uses of the macros
outside of function bodies, e.g., in template arguments of return types, or
default template arguments of functions. In these cases, calls to functions
with target overloads are resolved to the host variant during host compilation
and to the device variant during device compilation - independently of the
target of the function they belong to. Therefore, using the wavefront size
macros in such cases leads to diagnostics during host compilation with llvm#91478,
even if they are only associated to a device function.

PR llvm#93546 is a proposal to suppress these spurious diagnostics. PR llvm#103031 is a
proposal to change the behavior of target-dependent overload resolution outside
of function bodies to use the target attributes that occur before the
overloaded call to select the overload candidate.

In contrast to llvm#91478, this PR will not diagnose uses of the wavefront-size
macros outside of function bodies or initializers of global host variables.

Implements SWDEV-449015.
@ritter-x2a
Copy link
Member Author

Closing this PR in favor of a more comprehensive treatment of the AMDGCN_WAVEFRONT_SIZE situation.

@ritter-x2a ritter-x2a closed this Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants