Skip to content

Conversation

@ritter-x2a
Copy link
Member

The __AMDGCN_WAVEFRONT_SIZE and __AMDGCN_WAVEFRONT_SIZE__ macros in HIP can only provide meaningful values during device compilation. They are currently usable in host code, but only contain the default value of 64, independent of the target device(s).

This patch redefines them during host compilation to issue a deprecation warning if the macros are used in host code. Their value during host compilation in actual HIP code as well as in preprocessing directives stays 64 as before. Macro uses in preprocessing directives are not diagnosed. Macro uses in device code are not affected.

In a later step, after a deprecation period, we can easily adjust this implementation so that macro uses in host code cause hard errors instead of warnings.

Considered Alternatives:

  • Introducing a specialized diagnostic during clang's semantic analysis:
    This is technically possible and allows for cleaner diagnostics, but requires HIP-specific special case handling in clang's very general Sema::ActOnNumericConstant(...) method, since these macros appear as integer literals during parsing/semantic analysis where we know if we are in a host function. In comparison, this PR introduces less complexity to code that is
    independent from HIP.

  • See also the previous rejected proposal, which eliminates the macros for host compilation: [clang][AMDGPU] Don't define feature macros on host code #83558

Implementation Rationale:

  • I have placed the macro redefinitions in a new header file so that it is included even if the -nogpuinc, -nobuiltininc, and/or -nostdinc CLI flags are provided, enabling consistent diagnostics with any combination of these flags. I am open to suggestions for better solutions.
  • The constexpr function with separate overloads for host and device is a HIP feature that allows us to identify macro uses in host code without special-case handling in the semantic analysis. Their returned value is irrelevant, they are only referenced for the deprecation warning. Constexpr variables cannot be overloaded like this.
  • The AMDGCN_WAVEFRONT_SIZE macros are commonly used in preprocessing directives for conditional includes. The defined expression is carefully crafted to not break this use case:
    • Calling the constexpr function instead of referencing its value as a function pointer would be diagnosed as an undefined function-like macro by the preprocessor in directives.
    • Using the more natural comma operator instead of the ternary conditional operator to discard the value of the constexpr function in the expression is illegal in constant expressions that may occur in preprocessing directives according to the Standard (e.g., the C11 Standard, Section 6.6 "Constant expressions", paragraph 3: "Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated.") Clang diagnoses this with -pedantic.
    • In preprocessing directives, the function identifier is considered an undefined macro, which is interpreted as 0.

Implements SWDEV-449015.

@ritter-x2a ritter-x2a requested review from AlexVlx and kzhuravl May 8, 2024 14:27
@ritter-x2a ritter-x2a self-assigned this May 8, 2024
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU backend:X86 clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:headers Headers provided by Clang, e.g. for intrinsics labels May 8, 2024
@llvmbot
Copy link
Member

llvmbot commented May 8, 2024

@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-clang
@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-clang-driver

Author: Fabian Ritter (ritter-x2a)

Changes

The __AMDGCN_WAVEFRONT_SIZE and __AMDGCN_WAVEFRONT_SIZE__ macros in HIP can only provide meaningful values during device compilation. They are currently usable in host code, but only contain the default value of 64, independent of the target device(s).

This patch redefines them during host compilation to issue a deprecation warning if the macros are used in host code. Their value during host compilation in actual HIP code as well as in preprocessing directives stays 64 as before. Macro uses in preprocessing directives are not diagnosed. Macro uses in device code are not affected.

In a later step, after a deprecation period, we can easily adjust this implementation so that macro uses in host code cause hard errors instead of warnings.

Considered Alternatives:

  • Introducing a specialized diagnostic during clang's semantic analysis:
    This is technically possible and allows for cleaner diagnostics, but requires HIP-specific special case handling in clang's very general Sema::ActOnNumericConstant(...) method, since these macros appear as integer literals during parsing/semantic analysis where we know if we are in a host function. In comparison, this PR introduces less complexity to code that is
    independent from HIP.

  • See also the previous rejected proposal, which eliminates the macros for host compilation: [clang][AMDGPU] Don't define feature macros on host code #83558

Implementation Rationale:

  • I have placed the macro redefinitions in a new header file so that it is included even if the -nogpuinc, -nobuiltininc, and/or -nostdinc CLI flags are provided, enabling consistent diagnostics with any combination of these flags. I am open to suggestions for better solutions.
  • The constexpr function with separate overloads for host and device is a HIP feature that allows us to identify macro uses in host code without special-case handling in the semantic analysis. Their returned value is irrelevant, they are only referenced for the deprecation warning. Constexpr variables cannot be overloaded like this.
  • The AMDGCN_WAVEFRONT_SIZE macros are commonly used in preprocessing directives for conditional includes. The defined expression is carefully crafted to not break this use case:
    • Calling the constexpr function instead of referencing its value as a function pointer would be diagnosed as an undefined function-like macro by the preprocessor in directives.
    • Using the more natural comma operator instead of the ternary conditional operator to discard the value of the constexpr function in the expression is illegal in constant expressions that may occur in preprocessing directives according to the Standard (e.g., the C11 Standard, Section 6.6 "Constant expressions", paragraph 3: "Constant expressions shall not contain assignment, increment, decrement, function-call, or comma operators, except when they are contained within a subexpression that is not evaluated.") Clang diagnoses this with -pedantic.
    • In preprocessing directives, the function identifier is considered an undefined macro, which is interpreted as 0.

Implements SWDEV-449015.


Full diff: https://github.com/llvm/llvm-project/pull/91478.diff

5 Files Affected:

  • (modified) clang/lib/Driver/ToolChains/AMDGPU.cpp (+11)
  • (modified) clang/lib/Headers/CMakeLists.txt (+1)
  • (added) clang/lib/Headers/__clang_hip_device_macro_guards.h (+55)
  • (added) clang/test/Driver/hip-wavefront-size-host-diagnostics.hip (+52)
  • (modified) clang/test/Preprocessor/predefined-arch-macros.c (-1)
diff --git a/clang/lib/Driver/ToolChains/AMDGPU.cpp b/clang/lib/Driver/ToolChains/AMDGPU.cpp
index 07965b487ea79..587aa19349d89 100644
--- a/clang/lib/Driver/ToolChains/AMDGPU.cpp
+++ b/clang/lib/Driver/ToolChains/AMDGPU.cpp
@@ -550,6 +550,17 @@ void RocmInstallationDetector::AddHIPIncludeArgs(const ArgList &DriverArgs,
     CC1Args.push_back(DriverArgs.MakeArgString(P));
   }
 
+  {
+    // This header implements diagnostics for problematic uses of
+    // device-specific macros. Since these diagnostics should be issued even
+    // when GPU headers are not included, this header is included separately.
+    SmallString<128> P(D.ResourceDir);
+    llvm::sys::path::append(P, "include");
+    CC1Args.push_back("-internal-isystem");
+    CC1Args.push_back(DriverArgs.MakeArgString(P));
+    CC1Args.append({"-include", "__clang_hip_device_macro_guards.h"});
+  }
+
   const auto HandleHipStdPar = [=, &DriverArgs, &CC1Args]() {
     StringRef Inc = getIncludePath();
     auto &FS = D.getVFS();
diff --git a/clang/lib/Headers/CMakeLists.txt b/clang/lib/Headers/CMakeLists.txt
index 5f02c71f6ca51..31f1a73fee66a 100644
--- a/clang/lib/Headers/CMakeLists.txt
+++ b/clang/lib/Headers/CMakeLists.txt
@@ -79,6 +79,7 @@ set(hip_files
   __clang_hip_math.h
   __clang_hip_stdlib.h
   __clang_hip_runtime_wrapper.h
+  __clang_hip_device_macro_guards.h
   )
 
 set(hlsl_h
diff --git a/clang/lib/Headers/__clang_hip_device_macro_guards.h b/clang/lib/Headers/__clang_hip_device_macro_guards.h
new file mode 100644
index 0000000000000..42782c9bb08a7
--- /dev/null
+++ b/clang/lib/Headers/__clang_hip_device_macro_guards.h
@@ -0,0 +1,55 @@
+/*===---- __clang_hip_device_macro_guards.h - guards for HIP device macros -===
+ *
+ * Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+ * See https://llvm.org/LICENSE.txt for license information.
+ * SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+ *
+ *===-----------------------------------------------------------------------===
+ */
+
+/*
+ * WARNING: This header is intended to be directly -include'd by
+ * the compiler and is not supposed to be included by users.
+ *
+ */
+
+#ifndef __CLANG_HIP_DEVICE_MACRO_GUARDS_H__
+#define __CLANG_HIP_DEVICE_MACRO_GUARDS_H__
+
+#if __HIP__
+#if !defined(__HIP_DEVICE_COMPILE__)
+// The __AMDGCN_WAVEFRONT_SIZE macros cannot hold meaningful values during host
+// compilation as devices are not initialized when the macros are defined and
+// there may indeed be devices with differing wavefront sizes in the same
+// system. This code issues diagnostics when the macros are used in host code.
+
+#undef __AMDGCN_WAVEFRONT_SIZE
+#undef __AMDGCN_WAVEFRONT_SIZE__
+
+// Reference __hip_device_macro_guard in a way that is legal in preprocessor
+// directives and does not affect the value so that appropriate diagnostics are
+// issued. Function calls, casts, or the comma operator would make the macro
+// illegal for use in preprocessor directives.
+#define __AMDGCN_WAVEFRONT_SIZE (!__hip_device_macro_guard ? 64 : 64)
+#define __AMDGCN_WAVEFRONT_SIZE__ (!__hip_device_macro_guard ? 64 : 64)
+
+// This function is referenced by the macro in device functions during host
+// compilation, it SHOULD NOT cause a diagnostic.
+__attribute__((device)) static constexpr int __hip_device_macro_guard(void) {
+  return -1;
+}
+
+// This function is referenced by the macro in host functions during host
+// compilation, it SHOULD cause a diagnostic.
+__attribute__((
+    host, deprecated("The __AMDGCN_WAVEFRONT_SIZE macros do not correspond "
+                     "to the device(s) when used in host code and may only "
+                     "be used in device code."))) static constexpr int
+__hip_device_macro_guard(void) {
+  return -1;
+}
+// TODO Change "deprecated" to "unavailable" to cause hard errors instead of
+// warnings.
+#endif
+#endif // __HIP__
+#endif // __CLANG_HIP_DEVICE_MACRO_GUARDS_H__
diff --git a/clang/test/Driver/hip-wavefront-size-host-diagnostics.hip b/clang/test/Driver/hip-wavefront-size-host-diagnostics.hip
new file mode 100644
index 0000000000000..e0ee44cdc2986
--- /dev/null
+++ b/clang/test/Driver/hip-wavefront-size-host-diagnostics.hip
@@ -0,0 +1,52 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang -xhip --offload-arch=gfx1030 --offload-host-only -pedantic -nogpuinc -nogpulib -nobuiltininc -nostdinc -fsyntax-only -Xclang -verify=onhost %s
+// RUN: %clang -xhip --offload-arch=gfx1030 --offload-device-only -pedantic -nogpuinc -nogpulib -nobuiltininc -nostdinc -fsyntax-only -Xclang -verify=ondevice %s
+
+// ondevice-no-diagnostics
+
+#define WRAPPED __AMDGCN_WAVEFRONT_SIZE__
+
+__attribute__((host, device)) void use(int, const char*);
+
+template<int N> __attribute__((host, device)) int templatify(int x) {
+    return x + N;
+}
+
+// no warning expected
+#if defined(__HIP_DEVICE_COMPILE__) && (__AMDGCN_WAVEFRONT_SIZE__ == 64) && (__AMDGCN_WAVEFRONT_SIZE == 64)
+int foo(void);
+#endif
+
+// no warning expected
+__attribute__((device)) int device_var = __AMDGCN_WAVEFRONT_SIZE__;
+
+__attribute__((device))
+void device_fun() {
+    // no warnings expected
+    use(__AMDGCN_WAVEFRONT_SIZE, "device function");
+    use(__AMDGCN_WAVEFRONT_SIZE__, "device function");
+    use(WRAPPED, "device function");
+    use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "device function");
+}
+
+// warning expected
+int host_var = __AMDGCN_WAVEFRONT_SIZE__;  // onhost-warning {{'__hip_device_macro_guard' is deprecated: The __AMDGCN_WAVEFRONT_SIZE macros do not correspond to the device(s) when used in host code and may only be used in device code.}}
+
+__attribute__((host))
+void host_fun() {
+    // warnings expected
+    use(__AMDGCN_WAVEFRONT_SIZE, "host function");  // onhost-warning {{'__hip_device_macro_guard' is deprecated: The __AMDGCN_WAVEFRONT_SIZE macros do not correspond to the device(s) when used in host code and may only be used in device code.}}
+    use(__AMDGCN_WAVEFRONT_SIZE__, "host function");  // onhost-warning {{'__hip_device_macro_guard' is deprecated: The __AMDGCN_WAVEFRONT_SIZE macros do not correspond to the device(s) when used in host code and may only be used in device code.}}
+    use(WRAPPED, "host function");  // onhost-warning {{'__hip_device_macro_guard' is deprecated: The __AMDGCN_WAVEFRONT_SIZE macros do not correspond to the device(s) when used in host code and may only be used in device code.}}
+    use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "host function");  // onhost-warning {{'__hip_device_macro_guard' is deprecated: The __AMDGCN_WAVEFRONT_SIZE macros do not correspond to the device(s) when used in host code and may only be used in device code.}}
+}
+
+__attribute((host, device))
+void host_device_fun() {
+    // warnings expected
+    use(__AMDGCN_WAVEFRONT_SIZE__, "host device function");  // onhost-warning {{'__hip_device_macro_guard' is deprecated: The __AMDGCN_WAVEFRONT_SIZE macros do not correspond to the device(s) when used in host code and may only be used in device code.}}
+    use(WRAPPED, "host device function");  // onhost-warning {{'__hip_device_macro_guard' is deprecated: The __AMDGCN_WAVEFRONT_SIZE macros do not correspond to the device(s) when used in host code and may only be used in device code.}}
+    use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "host device function");  // onhost-warning {{'__hip_device_macro_guard' is deprecated: The __AMDGCN_WAVEFRONT_SIZE macros do not correspond to the device(s) when used in host code and may only be used in device code.}}
+}
+
+// onhost-note@__clang_hip_device_macro_guards.h:45 0+ {{'__hip_device_macro_guard' has been explicitly marked deprecated here}}
diff --git a/clang/test/Preprocessor/predefined-arch-macros.c b/clang/test/Preprocessor/predefined-arch-macros.c
index ca51f2fc22c51..ee3e26f203964 100644
--- a/clang/test/Preprocessor/predefined-arch-macros.c
+++ b/clang/test/Preprocessor/predefined-arch-macros.c
@@ -4340,7 +4340,6 @@
 // RUN: %clang -x hip -E -dM %s -o - 2>&1 --offload-host-only -nogpulib \
 // RUN:     -nogpuinc --offload-arch=gfx803 -target x86_64-unknown-linux \
 // RUN:   | FileCheck -match-full-lines %s -check-prefixes=CHECK_HIP_HOST
-// CHECK_HIP_HOST: #define __AMDGCN_WAVEFRONT_SIZE__ 64
 // CHECK_HIP_HOST: #define __AMDGPU__ 1
 // CHECK_HIP_HOST: #define __AMD__ 1
 

Copy link
Contributor

@AlexVlx AlexVlx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fine to me (I do hate that macro being defined on the host though, so I am biased). Thanks!

__attribute__((
host, deprecated("The __AMDGCN_WAVEFRONT_SIZE macros do not correspond "
"to the device(s) when used in host code and may only "
"be used in device code."))) static constexpr int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought I saw some junk trying to support pre-C++11 HIP, is that a concern here?

Is this macro defined in OpenMP? If so can we do the same thing?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re pre-C++11 HIP: I think we can just drop the constexpr from both variants of the guard function; since the guard function is only referenced and never called, the macros would still work as constant expressions.

re OpenMP: As far as I can see in experiments, the macros are not defined during OpenMP's host compilation. This is therefore not an issue for OpenMP.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pre-C++11 HIP shouldn't be a concern anymore.

@arsenm arsenm requested a review from scchan May 10, 2024 12:40
Copy link
Contributor

@scchan scchan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

ritter-x2a added a commit to ritter-x2a/llvm-project that referenced this pull request Aug 13, 2024
…specifiers

So far, the resolution of host/device overloads for functions in HIP/CUDA
operates as if in a host-device context for code outside of function bodies,
e.g., in expressions that are part of template arguments in top-level
declarations. This means that, if separate host and device overloads are
declared, the device overload is used in the device compilation phase and the
host overload is used in the host compilation phase.

This patch changes overload resolution in such cases to prefer overloads that
match the target of the declaration in which they occur. For example:

__device__ constexpr int get_n() { return 64; }
__host__ constexpr int get_n() { return -1; }
__device__ std::enable_if<(get_n() > 32)>::type foo() { }

Before, this code would not compile, because get_n resolved to the host
overload during host compilation, causing an error. With this patch, the call
to get_n in the declaration of the device function foo resolves to the device
overload in host and device compilation.

If attributes that affect the declaration's target occur after a call with
target-dependent overload resolution, a warning is issued. This is realized by
registering the Kinds of relevant attributes in the CUDATargetContext when they
are parsed.

This is an alternative to PR llvm#93546, which is required for PR llvm#91478.
ritter-x2a added a commit to ritter-x2a/llvm-project that referenced this pull request Sep 23, 2024
…ithout relying on target-dependent overload resolution

The __AMDGCN_WAVEFRONT_SIZE and __AMDGCN_WAVEFRONT_SIZE__ macros in HIP can
only provide meaningful values during device compilation. They are currently
usable in host code, but only contain the default value of 64, independent of
the target device(s).

This patch checks for numeric literals in clearly identifiable host code if
they are the result of expanding the wavefront-size macros and issues a
diagnostic if that's the case.

A alternative PR, llvm#91478, relied on constexpr functions with host and device
overloads (where the host overload is marked as deprecated) to diagnose uses of
these macros in host code. A problem with this approach are uses of the macros
outside of function bodies, e.g., in template arguments of return types, or
default template arguments of functions. In these cases, calls to functions
with target overloads are resolved to the host variant during host compilation
and to the device variant during device compilation - independently of the
target of the function they belong to. Therefore, using the wavefront size
macros in such cases leads to diagnostics during host compilation with llvm#91478,
even if they are only associated to a device function.

PR llvm#93546 is a proposal to suppress these spurious diagnostics. PR llvm#103031 is a
proposal to change the behavior of target-dependent overload resolution outside
of function bodies to use the target attributes that occur before the
overloaded call to select the overload candidate.

In contrast to llvm#91478, this PR will not diagnose uses of the wavefront-size
macros outside of function bodies or initializers of global host variables.

Implements SWDEV-449015.
The __AMDGCN_WAVEFRONT_SIZE and __AMDGCN_WAVEFRONT_SIZE__ macros in HIP can
only provide meaningful values during device compilation. They are currently
usable in host code, but only contain the default value of 64, independent of
the target device(s).
This patch redefines them during host compilation to issue a deprecation
warning if the macros are used in host code. Their value during host
compilation in actual HIP code as well as in preprocessing directives stays 64
as before. Macro uses in preprocessing directives are not diagnosed. Macro uses
in device code are not affected.

Implements SWDEV-449015.
@ritter-x2a ritter-x2a force-pushed the wavefront-warnings-in-header branch from 8743f8a to 4145231 Compare October 7, 2024 09:30
@ritter-x2a
Copy link
Member Author

Closing this PR in favor of a more comprehensive treatment of the AMDGCN_WAVEFRONT_SIZE situation.

@ritter-x2a ritter-x2a closed this Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AMDGPU backend:X86 clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:headers Headers provided by Clang, e.g. for intrinsics clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants