Skip to content

Conversation

@ritter-x2a
Copy link
Member

@ritter-x2a ritter-x2a commented Sep 23, 2024

This is a proposal for an alternative to PR #91478 that would make PRs #93546 and #103031 unnecessary.

The __AMDGCN_WAVEFRONT_SIZE and __AMDGCN_WAVEFRONT_SIZE__ macros in HIP can only provide meaningful values during device compilation. They are currently usable in host code, but only contain the default value of 64, independent of the target device(s).

This patch checks for numeric literals in clearly identifiable host code if they are the result of expanding the wavefront-size macros and issues a diagnostic if that's the case.

The alternative PR, #91478, relied on constexpr functions with host and device overloads (where the host overload is marked as deprecated) to diagnose uses of these macros in host code. A problem with this approach are uses of the macros outside of function bodies, e.g., in template arguments of return types, or default template arguments of functions. In these cases, calls to functions with target overloads are resolved to the host variant during host compilation and to the device variant during device compilation - independently of the target of the function they belong to. Therefore, using the wavefront size macros in such cases leads to diagnostics during host compilation with #91478, even if they are only associated to a device function.

PR #93546 is a proposal to suppress these spurious diagnostics. PR #103031 is an - as of yet insufficient - proposal to change the behavior of target-dependent overload resolution outside of function bodies to use the target attributes that occur before the overloaded call to select the overload candidate.

In contrast to #91478, this PR will not diagnose uses of the wavefront-size macros outside of function bodies or initializers of global host variables.

Implements SWDEV-449015.

…ithout relying on target-dependent overload resolution

The __AMDGCN_WAVEFRONT_SIZE and __AMDGCN_WAVEFRONT_SIZE__ macros in HIP can
only provide meaningful values during device compilation. They are currently
usable in host code, but only contain the default value of 64, independent of
the target device(s).

This patch checks for numeric literals in clearly identifiable host code if
they are the result of expanding the wavefront-size macros and issues a
diagnostic if that's the case.

A alternative PR, llvm#91478, relied on constexpr functions with host and device
overloads (where the host overload is marked as deprecated) to diagnose uses of
these macros in host code. A problem with this approach are uses of the macros
outside of function bodies, e.g., in template arguments of return types, or
default template arguments of functions. In these cases, calls to functions
with target overloads are resolved to the host variant during host compilation
and to the device variant during device compilation - independently of the
target of the function they belong to. Therefore, using the wavefront size
macros in such cases leads to diagnostics during host compilation with llvm#91478,
even if they are only associated to a device function.

PR llvm#93546 is a proposal to suppress these spurious diagnostics. PR llvm#103031 is a
proposal to change the behavior of target-dependent overload resolution outside
of function bodies to use the target attributes that occur before the
overloaded call to select the overload candidate.

In contrast to llvm#91478, this PR will not diagnose uses of the wavefront-size
macros outside of function bodies or initializers of global host variables.

Implements SWDEV-449015.
@ritter-x2a ritter-x2a self-assigned this Sep 23, 2024
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Sep 23, 2024
@llvmbot
Copy link
Member

llvmbot commented Sep 23, 2024

@llvm/pr-subscribers-clang-driver

@llvm/pr-subscribers-clang

Author: Fabian Ritter (ritter-x2a)

Changes

This is a proposal for an alternative to PR #91478 that would make PRs #93546 and #103031 unnecessary. Please let me know if this one is preferrable over PRs #91478 and #103031.

The __AMDGCN_WAVEFRONT_SIZE and __AMDGCN_WAVEFRONT_SIZE__ macros in HIP can only provide meaningful values during device compilation. They are currently usable in host code, but only contain the default value of 64, independent of the target device(s).

This patch checks for numeric literals in clearly identifiable host code if they are the result of expanding the wavefront-size macros and issues a diagnostic if that's the case.

The alternative PR, #91478, relied on constexpr functions with host and device overloads (where the host overload is marked as deprecated) to diagnose uses of these macros in host code. A problem with this approach are uses of the macros outside of function bodies, e.g., in template arguments of return types, or default template arguments of functions. In these cases, calls to functions with target overloads are resolved to the host variant during host compilation and to the device variant during device compilation - independently of the target of the function they belong to. Therefore, using the wavefront size macros in such cases leads to diagnostics during host compilation with #91478, even if they are only associated to a device function.

PR #93546 is a proposal to suppress these spurious diagnostics. PR #103031 is a proposal to change the behavior of target-dependent overload resolution outside of function bodies to use the target attributes that occur before the overloaded call to select the overload candidate.

In contrast to #91478, this PR will not diagnose uses of the wavefront-size macros outside of function bodies or initializers of global host variables.

Implements SWDEV-449015.


Full diff: https://github.com/llvm/llvm-project/pull/109663.diff

5 Files Affected:

  • (modified) clang/include/clang/Basic/DiagnosticSemaKinds.td (+2)
  • (modified) clang/include/clang/Sema/SemaCUDA.h (+4)
  • (modified) clang/lib/Sema/SemaCUDA.cpp (+39)
  • (modified) clang/lib/Sema/SemaExpr.cpp (+3)
  • (added) clang/test/Driver/hip-wavefront-size-host-diagnostics.hip (+109)
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index e4e04bff8b5120..557d2803021f60 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -9109,6 +9109,8 @@ def warn_offload_incompatible_redeclare : Warning<
   "new declaration is %select{__device__|__global__|__host__|__host__ __device__}0 function, "
   "old declaration is %select{__device__|__global__|__host__|__host__ __device__}1 function">,
   InGroup<DiagGroup<"nvcc-compat">>, DefaultIgnore;
+def warn_ref_device_macro_on_host : Warning<
+  "device-specific macro %0 is not available in a %select{__device__|__global__|__host__|__host__ __device__}1 context">, InGroup<DiagGroup<"hip-wavefrontsize">>;
 
 def err_cuda_device_builtin_surftex_cls_template : Error<
     "illegal device builtin %select{surface|texture}0 reference "
diff --git a/clang/include/clang/Sema/SemaCUDA.h b/clang/include/clang/Sema/SemaCUDA.h
index 71f05e88fb539c..80b8dc24664b68 100644
--- a/clang/include/clang/Sema/SemaCUDA.h
+++ b/clang/include/clang/Sema/SemaCUDA.h
@@ -263,6 +263,10 @@ class SemaCUDA : public SemaBase {
   // for __constant__ and __device__ variables.
   void checkAllowedInitializer(VarDecl *VD);
 
+  /// Check if the token is part of a macro that is used outside of its allowed
+  /// compilation mode.
+  void checkTargetMacroUse(const Token &Tok);
+
   /// Check whether NewFD is a valid overload for CUDA. Emits
   /// diagnostics and invalidates NewFD if not.
   void checkTargetOverload(FunctionDecl *NewFD, const LookupResult &Previous);
diff --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index fbb3de4b3e4165..b09319bbd894d4 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -703,6 +703,45 @@ void SemaCUDA::checkAllowedInitializer(VarDecl *VD) {
   }
 }
 
+void SemaCUDA::checkTargetMacroUse(const Token &Tok) {
+  assert(SemaRef.LangOpts.HIP);
+
+  // Currently, we check only for the AMDGCN_WAVEFRONT_SIZE macros, which should
+  // only be used in device compilation.
+  if (SemaRef.LangOpts.CUDAIsDevice)
+    return;
+
+  auto *FD = SemaRef.getCurFunctionDecl(/*AllowLambda=*/true);
+  // If we are not in a FunctionDecl and we have no other meaningful way of
+  // determining the compilation mode, avoid potentially spurious warnings.
+  if (!FD && SemaRef.CUDA().CurCUDATargetCtx.Kind == SemaCUDA::CTCK_Unknown)
+    return;
+
+  auto Target = SemaRef.CUDA().IdentifyTarget(FD);
+  if (Target != CUDAFunctionTarget::HostDevice &&
+      Target != CUDAFunctionTarget::Host)
+    return;
+
+  const auto &Loc = Tok.getLocation();
+  if (!Loc.isMacroID())
+    return;
+
+  // Get the location of the innermost macro that contributed the token.
+  const auto &SM = SemaRef.getSourceManager();
+  const auto &IMCLoc = SM.getImmediateMacroCallerLoc(Loc);
+  const auto &SpellingLoc = SM.getSpellingLoc(IMCLoc);
+
+  SmallString<16> buffer;
+  auto MacroName = SemaRef.getPreprocessor().getSpelling(SpellingLoc, buffer);
+  if (MacroName == "__AMDGCN_WAVEFRONT_SIZE" ||
+      MacroName == "__AMDGCN_WAVEFRONT_SIZE__") {
+    // Only report the actual use of the macro, not its builtin definition.
+    auto UseLoc = SM.getExpansionLoc(Tok.getLocation());
+    SemaRef.Diag(UseLoc, diag::warn_ref_device_macro_on_host)
+        << MacroName << llvm::to_underlying(SemaRef.CUDA().CurrentTarget());
+  }
+}
+
 void SemaCUDA::RecordImplicitHostDeviceFuncUsedByDevice(
     const FunctionDecl *Callee) {
   FunctionDecl *Caller = SemaRef.getCurFunctionDecl(/*AllowLambda=*/true);
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 66df9c969256a2..4c7178fb8f5205 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -4079,6 +4079,9 @@ ExprResult Sema::ActOnNumericConstant(const Token &Tok, Scope *UDLScope) {
         ResultVal = ResultVal.trunc(Width);
     }
     Res = IntegerLiteral::Create(Context, ResultVal, Ty, Tok.getLocation());
+
+    if (SemaRef.LangOpts.HIP)
+      SemaRef.CUDA().checkTargetMacroUse(Tok);
   }
 
   // If this is an imaginary literal, create the ImaginaryLiteral wrapper.
diff --git a/clang/test/Driver/hip-wavefront-size-host-diagnostics.hip b/clang/test/Driver/hip-wavefront-size-host-diagnostics.hip
new file mode 100644
index 00000000000000..3bde9730ccb0d6
--- /dev/null
+++ b/clang/test/Driver/hip-wavefront-size-host-diagnostics.hip
@@ -0,0 +1,109 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang -xhip --offload-arch=gfx1030 --offload-host-only -pedantic -nogpuinc -nogpulib -nobuiltininc -fsyntax-only -Xclang -verify=onhost %s
+// RUN: %clang -xhip --offload-arch=gfx1030 --offload-device-only -pedantic -nogpuinc -nogpulib -nobuiltininc -fsyntax-only -Xclang -verify=ondevice %s
+
+// ondevice-no-diagnostics
+
+#include <type_traits>
+
+#define WRAPPED __AMDGCN_WAVEFRONT_SIZE__
+
+#define DOUBLE_WRAPPED (WRAPPED)
+
+__attribute__((host, device)) void use(int, const char*);
+
+template<int N> __attribute__((host, device)) int templatify(int x) {
+    return x + N;
+}
+
+// no warning expected
+#if defined(__HIP_DEVICE_COMPILE__) && (__AMDGCN_WAVEFRONT_SIZE__ == 64) && (__AMDGCN_WAVEFRONT_SIZE == 64)
+int foo(void);
+#endif
+
+// no warning expected
+__attribute__((device)) int device_var = __AMDGCN_WAVEFRONT_SIZE__;
+
+__attribute__((device))
+void device_fun() {
+    // no warnings expected
+    use(__AMDGCN_WAVEFRONT_SIZE, "device function");
+    use(__AMDGCN_WAVEFRONT_SIZE__, "device function");
+    use(WRAPPED, "device function");
+    use(DOUBLE_WRAPPED, "device function");
+    use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "device function");
+}
+
+__attribute__((global))
+void global_fun() {
+    // no warnings expected
+    use(__AMDGCN_WAVEFRONT_SIZE, "global function");
+    use(__AMDGCN_WAVEFRONT_SIZE__, "global function");
+    use(WRAPPED, "global function");
+    use(DOUBLE_WRAPPED, "global function");
+    use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "global function");
+}
+
+// warning expected
+int host_var = __AMDGCN_WAVEFRONT_SIZE__;  // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}}
+int host_var_alt = __AMDGCN_WAVEFRONT_SIZE;  // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE is not available in a __host__ context}}
+int host_var_wrapped = WRAPPED;  // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}}
+int host_var_double_wrapped = DOUBLE_WRAPPED;  // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}}
+
+__attribute__((host))
+void host_fun() {
+    // warnings expected
+    use(__AMDGCN_WAVEFRONT_SIZE, "host function");  // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE is not available in a __host__ context}}
+    use(__AMDGCN_WAVEFRONT_SIZE__, "host function");  // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}}
+    use(WRAPPED, "host function");  // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}}
+    use(DOUBLE_WRAPPED, "host function");  // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}}
+    use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "host function");  // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}}
+}
+
+__attribute((host, device))
+void host_device_fun() {
+    // warnings expected
+    use(__AMDGCN_WAVEFRONT_SIZE__, "host device function");  // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ __device__ context}}
+    use(WRAPPED, "host device function");  // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ __device__ context}}
+    use(DOUBLE_WRAPPED, "host device function");  // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ __device__ context}}
+    use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "host device function");  // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ __device__ context}}
+}
+
+// Variations of this construct are used in rocPRIM and should compile without diagnostics.
+template <unsigned int OuterWarpSize = __AMDGCN_WAVEFRONT_SIZE>
+class FunSelector {
+public:
+    template<unsigned int FunWarpSize = OuterWarpSize>
+    __attribute__((device))
+    auto fun(void)
+        -> typename std::enable_if<(FunWarpSize <= __AMDGCN_WAVEFRONT_SIZE), void>::type
+    {
+        use(1, "yay!");
+    }
+
+    template<unsigned int FunWarpSize = OuterWarpSize>
+    __attribute__((device))
+    auto fun(void)
+        -> typename std::enable_if<(FunWarpSize > __AMDGCN_WAVEFRONT_SIZE), void>::type
+    {
+        use(0, "nay!");
+    }
+};
+
+__attribute__((device))
+void device_fun_selector_user() {
+    FunSelector<> f;
+    f.fun<>();
+    f.fun<1>();
+    f.fun<1000>();
+
+    std::enable_if<(1 <= __AMDGCN_WAVEFRONT_SIZE), int>::type x = 42;
+}
+
+__attribute__((device)) std::enable_if<(1 <= __AMDGCN_WAVEFRONT_SIZE), int>::type DeviceFunTemplateRet(void) {
+    return 42;
+}
+
+__attribute__((device)) int DeviceFunTemplateArg(std::enable_if<(1 <= __AMDGCN_WAVEFRONT_SIZE), int>::type x) {
+    return x;
+}

@ritter-x2a
Copy link
Member Author

Ping.

}

// warning expected
int host_var = __AMDGCN_WAVEFRONT_SIZE__; // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will const or constexpr host variables dependent on the macros also produce warnings?
E.g. something like this https://godbolt.org/z/1bxnrxrnn may be OK:

const int z = __AMDGCN_WAVEFRONT_SIZE;
__global__ void kernel(int* array, int n) {
  do_something_with(z);
}

The use of z on the host side would still be wrong, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this patch, the initializer of z would cause a warning. Is that not the intended behavior?
The const variable will only have a meaningful value during device compilation and therefore should only be used in a device context, so it should be declared with __device__ to avoid the warning, right?

@Artem-B
Copy link
Member

Artem-B commented Sep 30, 2024

I'm curious why are those macros even defined on the host?

It looks like these macros should be handled in a way similar to __HIP_ARCH__.

@ritter-x2a
Copy link
Member Author

@Artem-B

I'm curious why are those macros even defined on the host?

It looks like these macros should be handled in a way similar to __HIP_ARCH__.

Those macros are defined on the host (with a default value of 64) to allow using them in preprocessor directives and template arguments outside of function bodies and variable initializers without parsing errors during host compilation.
See this previous pull request for a longer discussion on why we can't just not define the macro on host: #83558

From what I can tell, the __HIP_ARCH_* feature test macros are handled similarly to the wavefront size macros in that they can (but should not) be used in host code without diagnostic, with a default of 0.
They are just defined in the hip runtime headers instead of being built into clang.
The patch here could be extended to also warn for host uses of these, if we want to handle them more similarly.

@yxsamliu
Copy link
Collaborator

yxsamliu commented Oct 1, 2024

__AMDGCN_WAVEFRONT_SIZE__ could also be used in other offloading languages e.g. OpenMP. The check for non-constantness is not specific to HIP.

It is a constant when the triple is amdgcn and -target-cpu is specified. Otherwise it should not be treated as constant.

I think you could refactor the code to introduce one more function Sema::isMacroTargetSpecificConstant, which returns whether a macro is constant based on triple and CPU. Then diagnose it in HIP like the current patch does. In this way, the check may be used by other languages.

@Artem-B
Copy link
Member

Artem-B commented Oct 1, 2024

This patch checks for numeric literals in clearly identifiable host code if they are the result of expanding the wavefront-size macros and issues a diagnostic if that's the case.

What's the ultimate goal here? If we're OK to warn on some obvious misuses, then it may do.

However, there will be cases where we're unlikely to tell whether the use of the macro is a problem or not.
E.g. it could be used in an arbitrary long chain of constexpr functions and or template instantiations that we're unlikely to see through.

One way to give users a sensibly safe way to get wavefront info may look roughly like this:

  • __AMDGCN_WAVEFRONT_SIZE__, depending on the compilation options would provide different implementations:
    • current one -- 64 on the host, GPU-specific on the device.
    • checked one. GPU-specific on the device, but on the host it would provide the value matching GPU-side if all targeted GPUs agree, or something that would provide diagnostics on use (constexpr function w/ deprecated attribute?).
  • provide another macro __AMDGCN_WAVEFRONT_SIZE_IS_64_ON_THE_HOST__ which will match the default current behavior unconditionally.

If we want to find problems in the user's code, flip the knob to use the checked macro variant, compile user code, and find potentially problematic places.
If some of them are OK to use with the host-has-wavefront-64, replace them with the macro that clearly indicates that.

Eventually flip the flag to be on by default, which would prevent future misuse of the __AMDGCN_WAVEFRONT_SIZE__ on the host.

@ritter-x2a
Copy link
Member Author

@Artem-B thank you for the suggestion!
Do I understand correctly that it boils down to the following?

  • If a (new) clang CLI option is set: enable PR [Clang][HIP] Warn when __AMDGCN_WAVEFRONT_SIZE is used in host code #91478 - which, among other things, warns during host compilation whenever __AMDGCN_WAVEFRONT_SIZE__ is used in template default arguments or template arguments outside of function bodies and global initializers,
  • provide __AMDGCN_WAVEFRONT_SIZE_IS_64_ON_THE_HOST__ as an unchecked and more clearly named alternative, and
  • probably change the wording of the deprecation warning to also point to this new macro as an alternative to silence the warning.

We would then recommend to the users to use __AMDGCN_WAVEFRONT_SIZE_IS_64_ON_THE_HOST__ instead of __AMDGCN_WAVEFRONT_SIZE__ outside of function bodies and global initializers (and to make sure that that is the behavior they want).

My main concern with this is that I'm not sure if users are likely to manually turn on these extra checks, so that the less strict but always-on diagnostics produced by this PR might have a bigger impact in practice, at least until we change the default.
Also: With the host/device target selection outside of function bodies during the different compilation modes as it is, users would need quite deep knowledge about compiler internals to make sure that __AMDGCN_WAVEFRONT_SIZE_IS_64_ON_THE_HOST__ is what they want (but, then, they also need that when using __AMDGCN_WAVEFRONT_SIZE__ in these places right now, so it might be okay to require that).

@AlexVlx
Copy link
Contributor

AlexVlx commented Oct 2, 2024

I don't think we should rely on these on the host at all, the addition was a design mistake initially, we probably should not double down on it. The wave size is an intrinsic property of the target, the host doesn't really have that property; there are canonical ways of querying the hardware's warp/wave size from CUDA/HIP, which folks happily use on the CUDA side since there's no __NVPTX_WARP_SIZE and there never has been one.

So, IMHO, we should not add even more macros / try to figure out rituals to make this work. Warn it's deprecated and broken-ish, get people off of it, forget we made a mistake here seems preferable. I don't think #83558 made a very strong case for keeping this around / trying to figure out creative ways to make it not not work, beyond "we have technical debt with it" (we do, but such things happen, and sometimes it does need clearing up, even if it's non-trivial). warpSize is not constexpr / consteval in CUDA, so that is another arbitrary point of variance we should consider removing (which would remove the main raison d'etre for the macro).

@Artem-B
Copy link
Member

Artem-B commented Oct 2, 2024

I don't think we should rely on these on the host at all, the addition was a design mistake initially, we probably should not double down on it.

I agree with it in principle. However, removing things that already exist should be done with consideration for the existing users. This particular macro may happen to be working well enough for some users and we may want to consider whether we can transition from here to the point where these macros no longer exist on the host, without causing much disruption, or, at least, giving the affected users enough of heads-up time to deal with it.

That said, I do not have enough context on how widely those macros are used in the actual HIP code. I can see scenarios where just disabling the macro on the host side may be a sensible choice, but, if the macro was publicly documented, then it should probably go through the standard deprecation process -- document deprecation intent/add warnings in the next release, remove it in the release after that. It's mostly a logistical problem for HIP owners, not technical one.

@Artem-B
Copy link
Member

Artem-B commented Oct 2, 2024

@ritter-x2a That's an outline of a strawman plan in case one does nave nontrivial amount of existing code that depends on this macro, and assuming that we still want to have a host-side macro for the wavefront size. If the end goal is not to have the host-side macro at all, then we don't need to complicate things that much. Just start deprecate, and remove it on the host.

@AlexVlx
Copy link
Contributor

AlexVlx commented Oct 2, 2024

I don't think we should rely on these on the host at all, the addition was a design mistake initially, we probably should not double down on it.

I agree with it in principle. However, removing things that already exist should be done with consideration for the existing users. This particular macro may happen to be working well enough for some users and we may want to consider whether we can transition from here to the point where these macros no longer exist on the host, without causing much disruption, or, at least, giving the affected users enough of heads-up time to deal with it.

Sure, I'm not arguing that we should just delete these outright and have folks deal with the aftermath. I'm just biased towards deprecation + actual removal because I think this existing is just going to become increasingly problematic as we go forward.

@yxsamliu
Copy link
Collaborator

yxsamliu commented Oct 2, 2024

One drawback of not defining __AMDGCN_WAVEFRONT_SIZE__ in host compilation is the impairment of uniformity of source code across host and device sides. Users have to put #if __HIP_DEVICE_COMPILE__ anywhere they use __AMDGCN_WAVEFRONT_SIZE__. Previous experience tells us that this causes bad user experience and clutters source code.

There are valid uses of __AMDGCN_WAVEFRONT_SIZE__ seen in host compilation. Obviously they can be safely used in device functions or kernels. Even if they are used in file scope, as long as the resulted constants end up in device binary, they are safe to be 'seen' during host compilation. However, if we simply ban __AMDGCN_WAVEFRONT_SIZE__ in host compilation, we are banning the valid uses of __AMDGCN_WAVEFRONT_SIZE__ in the source code.

I think we need to balance between usability and safety. That is, we want to diagnose misuse of __AMDGCN_WAVEFRONT_SIZE__ in host code without impairing too much on single-source-ness of HIP.

@AlexVlx
Copy link
Contributor

AlexVlx commented Oct 2, 2024

Just because somebody wrote something we should not assume it is necessarily sound or valid. Is there some illustration of said valid uses / have the users made an argument around why they absolutely must have the macro?

CUDA doesn't have a macro like this at all AFAIK, and everything was fine dynamically querying warpSize on the host and relying on non-constexpr warpSize on the device side. Not only does it not impair the single-source-ness of CUDA, but I'd say it actually enhances it. They are solving the same problem, can we identify why their solution is inadequate and we should not migrate people towards that, removing an arbitrary point of divergence between HIP & CUDA?

@Artem-B
Copy link
Member

Artem-B commented Oct 2, 2024

Unless HIP explicitly defines wavefront size property for the host (I do not think so), it would appear that it's a property of a GPU, and as such should not be treated as a constant on the host, because the host needs to deal with multiple GPU variants, with different idea of the wavefront size.

I'm not surprised that there are users who may be (mis)using the macro now, and who are relying on "happens to work", this is something that probably should not have existed. My vote is for deprecating it.

Possibly simpler way to deal with it would be to not define it in clang for the host compilation, and instead add it to the pre-included headers, defining it with an escape hatch which will need user intervention (and thus they will be aware that it needs to be fixed).

E.g.

#if !defined(__HIP_DEVICE_COMPILE__)
#if defined(I_REALLY_WANT_TO_USE_CONSTANT_WAVEFRONT_SIZE_ON_THE_HOST)
#define __AMDGCN_WAVEFRONT_SIZE__ 64
#endif
#endif

Users have to put #if HIP_DEVICE_COMPILE anywhere they use AMDGCN_WAVEFRONT_SIZE. Previous experience tells us that this causes bad user experience and clutters source code.

I think it's unavoidable. Users who just use the macro now either rely on "happens to work" or just didn't notice the problem they may already have.

Whatever depends on the wavefront size on the host end will have to be done conditionally, either at compile time via __HIP_DEVICE_COMPILE__ or by checking the properties of the specific GPU the code is dealing with.

@ritter-x2a
Copy link
Member Author

Closing this PR in favor of a more comprehensive treatment of the AMDGCN_WAVEFRONT_SIZE situation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:driver 'clang' and 'clang++' user-facing binaries. Not 'clang-cl' clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants