-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[Clang][HIP] Warn when __AMDGCN_WAVEFRONT_SIZE is used in host code without relying on target-dependent overload resolution #109663
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…ithout relying on target-dependent overload resolution The __AMDGCN_WAVEFRONT_SIZE and __AMDGCN_WAVEFRONT_SIZE__ macros in HIP can only provide meaningful values during device compilation. They are currently usable in host code, but only contain the default value of 64, independent of the target device(s). This patch checks for numeric literals in clearly identifiable host code if they are the result of expanding the wavefront-size macros and issues a diagnostic if that's the case. A alternative PR, llvm#91478, relied on constexpr functions with host and device overloads (where the host overload is marked as deprecated) to diagnose uses of these macros in host code. A problem with this approach are uses of the macros outside of function bodies, e.g., in template arguments of return types, or default template arguments of functions. In these cases, calls to functions with target overloads are resolved to the host variant during host compilation and to the device variant during device compilation - independently of the target of the function they belong to. Therefore, using the wavefront size macros in such cases leads to diagnostics during host compilation with llvm#91478, even if they are only associated to a device function. PR llvm#93546 is a proposal to suppress these spurious diagnostics. PR llvm#103031 is a proposal to change the behavior of target-dependent overload resolution outside of function bodies to use the target attributes that occur before the overloaded call to select the overload candidate. In contrast to llvm#91478, this PR will not diagnose uses of the wavefront-size macros outside of function bodies or initializers of global host variables. Implements SWDEV-449015.
|
@llvm/pr-subscribers-clang-driver @llvm/pr-subscribers-clang Author: Fabian Ritter (ritter-x2a) ChangesThis is a proposal for an alternative to PR #91478 that would make PRs #93546 and #103031 unnecessary. Please let me know if this one is preferrable over PRs #91478 and #103031. The This patch checks for numeric literals in clearly identifiable host code if they are the result of expanding the wavefront-size macros and issues a diagnostic if that's the case. The alternative PR, #91478, relied on constexpr functions with host and device overloads (where the host overload is marked as deprecated) to diagnose uses of these macros in host code. A problem with this approach are uses of the macros outside of function bodies, e.g., in template arguments of return types, or default template arguments of functions. In these cases, calls to functions with target overloads are resolved to the host variant during host compilation and to the device variant during device compilation - independently of the target of the function they belong to. Therefore, using the wavefront size macros in such cases leads to diagnostics during host compilation with #91478, even if they are only associated to a device function. PR #93546 is a proposal to suppress these spurious diagnostics. PR #103031 is a proposal to change the behavior of target-dependent overload resolution outside of function bodies to use the target attributes that occur before the overloaded call to select the overload candidate. In contrast to #91478, this PR will not diagnose uses of the wavefront-size macros outside of function bodies or initializers of global host variables. Implements SWDEV-449015. Full diff: https://github.com/llvm/llvm-project/pull/109663.diff 5 Files Affected:
diff --git a/clang/include/clang/Basic/DiagnosticSemaKinds.td b/clang/include/clang/Basic/DiagnosticSemaKinds.td
index e4e04bff8b5120..557d2803021f60 100644
--- a/clang/include/clang/Basic/DiagnosticSemaKinds.td
+++ b/clang/include/clang/Basic/DiagnosticSemaKinds.td
@@ -9109,6 +9109,8 @@ def warn_offload_incompatible_redeclare : Warning<
"new declaration is %select{__device__|__global__|__host__|__host__ __device__}0 function, "
"old declaration is %select{__device__|__global__|__host__|__host__ __device__}1 function">,
InGroup<DiagGroup<"nvcc-compat">>, DefaultIgnore;
+def warn_ref_device_macro_on_host : Warning<
+ "device-specific macro %0 is not available in a %select{__device__|__global__|__host__|__host__ __device__}1 context">, InGroup<DiagGroup<"hip-wavefrontsize">>;
def err_cuda_device_builtin_surftex_cls_template : Error<
"illegal device builtin %select{surface|texture}0 reference "
diff --git a/clang/include/clang/Sema/SemaCUDA.h b/clang/include/clang/Sema/SemaCUDA.h
index 71f05e88fb539c..80b8dc24664b68 100644
--- a/clang/include/clang/Sema/SemaCUDA.h
+++ b/clang/include/clang/Sema/SemaCUDA.h
@@ -263,6 +263,10 @@ class SemaCUDA : public SemaBase {
// for __constant__ and __device__ variables.
void checkAllowedInitializer(VarDecl *VD);
+ /// Check if the token is part of a macro that is used outside of its allowed
+ /// compilation mode.
+ void checkTargetMacroUse(const Token &Tok);
+
/// Check whether NewFD is a valid overload for CUDA. Emits
/// diagnostics and invalidates NewFD if not.
void checkTargetOverload(FunctionDecl *NewFD, const LookupResult &Previous);
diff --git a/clang/lib/Sema/SemaCUDA.cpp b/clang/lib/Sema/SemaCUDA.cpp
index fbb3de4b3e4165..b09319bbd894d4 100644
--- a/clang/lib/Sema/SemaCUDA.cpp
+++ b/clang/lib/Sema/SemaCUDA.cpp
@@ -703,6 +703,45 @@ void SemaCUDA::checkAllowedInitializer(VarDecl *VD) {
}
}
+void SemaCUDA::checkTargetMacroUse(const Token &Tok) {
+ assert(SemaRef.LangOpts.HIP);
+
+ // Currently, we check only for the AMDGCN_WAVEFRONT_SIZE macros, which should
+ // only be used in device compilation.
+ if (SemaRef.LangOpts.CUDAIsDevice)
+ return;
+
+ auto *FD = SemaRef.getCurFunctionDecl(/*AllowLambda=*/true);
+ // If we are not in a FunctionDecl and we have no other meaningful way of
+ // determining the compilation mode, avoid potentially spurious warnings.
+ if (!FD && SemaRef.CUDA().CurCUDATargetCtx.Kind == SemaCUDA::CTCK_Unknown)
+ return;
+
+ auto Target = SemaRef.CUDA().IdentifyTarget(FD);
+ if (Target != CUDAFunctionTarget::HostDevice &&
+ Target != CUDAFunctionTarget::Host)
+ return;
+
+ const auto &Loc = Tok.getLocation();
+ if (!Loc.isMacroID())
+ return;
+
+ // Get the location of the innermost macro that contributed the token.
+ const auto &SM = SemaRef.getSourceManager();
+ const auto &IMCLoc = SM.getImmediateMacroCallerLoc(Loc);
+ const auto &SpellingLoc = SM.getSpellingLoc(IMCLoc);
+
+ SmallString<16> buffer;
+ auto MacroName = SemaRef.getPreprocessor().getSpelling(SpellingLoc, buffer);
+ if (MacroName == "__AMDGCN_WAVEFRONT_SIZE" ||
+ MacroName == "__AMDGCN_WAVEFRONT_SIZE__") {
+ // Only report the actual use of the macro, not its builtin definition.
+ auto UseLoc = SM.getExpansionLoc(Tok.getLocation());
+ SemaRef.Diag(UseLoc, diag::warn_ref_device_macro_on_host)
+ << MacroName << llvm::to_underlying(SemaRef.CUDA().CurrentTarget());
+ }
+}
+
void SemaCUDA::RecordImplicitHostDeviceFuncUsedByDevice(
const FunctionDecl *Callee) {
FunctionDecl *Caller = SemaRef.getCurFunctionDecl(/*AllowLambda=*/true);
diff --git a/clang/lib/Sema/SemaExpr.cpp b/clang/lib/Sema/SemaExpr.cpp
index 66df9c969256a2..4c7178fb8f5205 100644
--- a/clang/lib/Sema/SemaExpr.cpp
+++ b/clang/lib/Sema/SemaExpr.cpp
@@ -4079,6 +4079,9 @@ ExprResult Sema::ActOnNumericConstant(const Token &Tok, Scope *UDLScope) {
ResultVal = ResultVal.trunc(Width);
}
Res = IntegerLiteral::Create(Context, ResultVal, Ty, Tok.getLocation());
+
+ if (SemaRef.LangOpts.HIP)
+ SemaRef.CUDA().checkTargetMacroUse(Tok);
}
// If this is an imaginary literal, create the ImaginaryLiteral wrapper.
diff --git a/clang/test/Driver/hip-wavefront-size-host-diagnostics.hip b/clang/test/Driver/hip-wavefront-size-host-diagnostics.hip
new file mode 100644
index 00000000000000..3bde9730ccb0d6
--- /dev/null
+++ b/clang/test/Driver/hip-wavefront-size-host-diagnostics.hip
@@ -0,0 +1,109 @@
+// REQUIRES: amdgpu-registered-target
+// RUN: %clang -xhip --offload-arch=gfx1030 --offload-host-only -pedantic -nogpuinc -nogpulib -nobuiltininc -fsyntax-only -Xclang -verify=onhost %s
+// RUN: %clang -xhip --offload-arch=gfx1030 --offload-device-only -pedantic -nogpuinc -nogpulib -nobuiltininc -fsyntax-only -Xclang -verify=ondevice %s
+
+// ondevice-no-diagnostics
+
+#include <type_traits>
+
+#define WRAPPED __AMDGCN_WAVEFRONT_SIZE__
+
+#define DOUBLE_WRAPPED (WRAPPED)
+
+__attribute__((host, device)) void use(int, const char*);
+
+template<int N> __attribute__((host, device)) int templatify(int x) {
+ return x + N;
+}
+
+// no warning expected
+#if defined(__HIP_DEVICE_COMPILE__) && (__AMDGCN_WAVEFRONT_SIZE__ == 64) && (__AMDGCN_WAVEFRONT_SIZE == 64)
+int foo(void);
+#endif
+
+// no warning expected
+__attribute__((device)) int device_var = __AMDGCN_WAVEFRONT_SIZE__;
+
+__attribute__((device))
+void device_fun() {
+ // no warnings expected
+ use(__AMDGCN_WAVEFRONT_SIZE, "device function");
+ use(__AMDGCN_WAVEFRONT_SIZE__, "device function");
+ use(WRAPPED, "device function");
+ use(DOUBLE_WRAPPED, "device function");
+ use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "device function");
+}
+
+__attribute__((global))
+void global_fun() {
+ // no warnings expected
+ use(__AMDGCN_WAVEFRONT_SIZE, "global function");
+ use(__AMDGCN_WAVEFRONT_SIZE__, "global function");
+ use(WRAPPED, "global function");
+ use(DOUBLE_WRAPPED, "global function");
+ use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "global function");
+}
+
+// warning expected
+int host_var = __AMDGCN_WAVEFRONT_SIZE__; // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}}
+int host_var_alt = __AMDGCN_WAVEFRONT_SIZE; // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE is not available in a __host__ context}}
+int host_var_wrapped = WRAPPED; // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}}
+int host_var_double_wrapped = DOUBLE_WRAPPED; // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}}
+
+__attribute__((host))
+void host_fun() {
+ // warnings expected
+ use(__AMDGCN_WAVEFRONT_SIZE, "host function"); // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE is not available in a __host__ context}}
+ use(__AMDGCN_WAVEFRONT_SIZE__, "host function"); // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}}
+ use(WRAPPED, "host function"); // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}}
+ use(DOUBLE_WRAPPED, "host function"); // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}}
+ use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "host function"); // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}}
+}
+
+__attribute((host, device))
+void host_device_fun() {
+ // warnings expected
+ use(__AMDGCN_WAVEFRONT_SIZE__, "host device function"); // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ __device__ context}}
+ use(WRAPPED, "host device function"); // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ __device__ context}}
+ use(DOUBLE_WRAPPED, "host device function"); // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ __device__ context}}
+ use(templatify<__AMDGCN_WAVEFRONT_SIZE__>(42), "host device function"); // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ __device__ context}}
+}
+
+// Variations of this construct are used in rocPRIM and should compile without diagnostics.
+template <unsigned int OuterWarpSize = __AMDGCN_WAVEFRONT_SIZE>
+class FunSelector {
+public:
+ template<unsigned int FunWarpSize = OuterWarpSize>
+ __attribute__((device))
+ auto fun(void)
+ -> typename std::enable_if<(FunWarpSize <= __AMDGCN_WAVEFRONT_SIZE), void>::type
+ {
+ use(1, "yay!");
+ }
+
+ template<unsigned int FunWarpSize = OuterWarpSize>
+ __attribute__((device))
+ auto fun(void)
+ -> typename std::enable_if<(FunWarpSize > __AMDGCN_WAVEFRONT_SIZE), void>::type
+ {
+ use(0, "nay!");
+ }
+};
+
+__attribute__((device))
+void device_fun_selector_user() {
+ FunSelector<> f;
+ f.fun<>();
+ f.fun<1>();
+ f.fun<1000>();
+
+ std::enable_if<(1 <= __AMDGCN_WAVEFRONT_SIZE), int>::type x = 42;
+}
+
+__attribute__((device)) std::enable_if<(1 <= __AMDGCN_WAVEFRONT_SIZE), int>::type DeviceFunTemplateRet(void) {
+ return 42;
+}
+
+__attribute__((device)) int DeviceFunTemplateArg(std::enable_if<(1 <= __AMDGCN_WAVEFRONT_SIZE), int>::type x) {
+ return x;
+}
|
|
Ping. |
| } | ||
|
|
||
| // warning expected | ||
| int host_var = __AMDGCN_WAVEFRONT_SIZE__; // onhost-warning {{device-specific macro __AMDGCN_WAVEFRONT_SIZE__ is not available in a __host__ context}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will const or constexpr host variables dependent on the macros also produce warnings?
E.g. something like this https://godbolt.org/z/1bxnrxrnn may be OK:
const int z = __AMDGCN_WAVEFRONT_SIZE;
__global__ void kernel(int* array, int n) {
do_something_with(z);
}
The use of z on the host side would still be wrong, though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With this patch, the initializer of z would cause a warning. Is that not the intended behavior?
The const variable will only have a meaningful value during device compilation and therefore should only be used in a device context, so it should be declared with __device__ to avoid the warning, right?
|
I'm curious why are those macros even defined on the host? It looks like these macros should be handled in a way similar to |
Those macros are defined on the host (with a default value of 64) to allow using them in preprocessor directives and template arguments outside of function bodies and variable initializers without parsing errors during host compilation. From what I can tell, the |
|
It is a constant when the triple is amdgcn and -target-cpu is specified. Otherwise it should not be treated as constant. I think you could refactor the code to introduce one more function Sema::isMacroTargetSpecificConstant, which returns whether a macro is constant based on triple and CPU. Then diagnose it in HIP like the current patch does. In this way, the check may be used by other languages. |
What's the ultimate goal here? If we're OK to warn on some obvious misuses, then it may do. However, there will be cases where we're unlikely to tell whether the use of the macro is a problem or not. One way to give users a sensibly safe way to get wavefront info may look roughly like this:
If we want to find problems in the user's code, flip the knob to use the checked macro variant, compile user code, and find potentially problematic places. Eventually flip the flag to be on by default, which would prevent future misuse of the |
|
@Artem-B thank you for the suggestion!
We would then recommend to the users to use My main concern with this is that I'm not sure if users are likely to manually turn on these extra checks, so that the less strict but always-on diagnostics produced by this PR might have a bigger impact in practice, at least until we change the default. |
|
I don't think we should rely on these on the host at all, the addition was a design mistake initially, we probably should not double down on it. The wave size is an intrinsic property of the target, the host doesn't really have that property; there are canonical ways of querying the hardware's warp/wave size from CUDA/HIP, which folks happily use on the CUDA side since there's no So, IMHO, we should not add even more macros / try to figure out rituals to make this work. Warn it's deprecated and broken-ish, get people off of it, forget we made a mistake here seems preferable. I don't think #83558 made a very strong case for keeping this around / trying to figure out creative ways to make it not not work, beyond "we have technical debt with it" (we do, but such things happen, and sometimes it does need clearing up, even if it's non-trivial). |
I agree with it in principle. However, removing things that already exist should be done with consideration for the existing users. This particular macro may happen to be working well enough for some users and we may want to consider whether we can transition from here to the point where these macros no longer exist on the host, without causing much disruption, or, at least, giving the affected users enough of heads-up time to deal with it. That said, I do not have enough context on how widely those macros are used in the actual HIP code. I can see scenarios where just disabling the macro on the host side may be a sensible choice, but, if the macro was publicly documented, then it should probably go through the standard deprecation process -- document deprecation intent/add warnings in the next release, remove it in the release after that. It's mostly a logistical problem for HIP owners, not technical one. |
|
@ritter-x2a That's an outline of a strawman plan in case one does nave nontrivial amount of existing code that depends on this macro, and assuming that we still want to have a host-side macro for the wavefront size. If the end goal is not to have the host-side macro at all, then we don't need to complicate things that much. Just start deprecate, and remove it on the host. |
Sure, I'm not arguing that we should just delete these outright and have folks deal with the aftermath. I'm just biased towards deprecation + actual removal because I think this existing is just going to become increasingly problematic as we go forward. |
|
One drawback of not defining There are valid uses of I think we need to balance between usability and safety. That is, we want to diagnose misuse of |
|
Just because somebody wrote something we should not assume it is necessarily sound or valid. Is there some illustration of said valid uses / have the users made an argument around why they absolutely must have the macro? CUDA doesn't have a macro like this at all AFAIK, and everything was fine dynamically querying |
|
Unless HIP explicitly defines wavefront size property for the host (I do not think so), it would appear that it's a property of a GPU, and as such should not be treated as a constant on the host, because the host needs to deal with multiple GPU variants, with different idea of the wavefront size. I'm not surprised that there are users who may be (mis)using the macro now, and who are relying on "happens to work", this is something that probably should not have existed. My vote is for deprecating it. Possibly simpler way to deal with it would be to not define it in clang for the host compilation, and instead add it to the pre-included headers, defining it with an escape hatch which will need user intervention (and thus they will be aware that it needs to be fixed). E.g.
I think it's unavoidable. Users who just use the macro now either rely on "happens to work" or just didn't notice the problem they may already have. Whatever depends on the wavefront size on the host end will have to be done conditionally, either at compile time via |
|
Closing this PR in favor of a more comprehensive treatment of the AMDGCN_WAVEFRONT_SIZE situation. |
This is a proposal for an alternative to PR #91478 that would make PRs #93546 and #103031 unnecessary.
The
__AMDGCN_WAVEFRONT_SIZEand__AMDGCN_WAVEFRONT_SIZE__macros in HIP can only provide meaningful values during device compilation. They are currently usable in host code, but only contain the default value of 64, independent of the target device(s).This patch checks for numeric literals in clearly identifiable host code if they are the result of expanding the wavefront-size macros and issues a diagnostic if that's the case.
The alternative PR, #91478, relied on constexpr functions with host and device overloads (where the host overload is marked as deprecated) to diagnose uses of these macros in host code. A problem with this approach are uses of the macros outside of function bodies, e.g., in template arguments of return types, or default template arguments of functions. In these cases, calls to functions with target overloads are resolved to the host variant during host compilation and to the device variant during device compilation - independently of the target of the function they belong to. Therefore, using the wavefront size macros in such cases leads to diagnostics during host compilation with #91478, even if they are only associated to a device function.
PR #93546 is a proposal to suppress these spurious diagnostics. PR #103031 is an - as of yet insufficient - proposal to change the behavior of target-dependent overload resolution outside of function bodies to use the target attributes that occur before the overloaded call to select the overload candidate.
In contrast to #91478, this PR will not diagnose uses of the wavefront-size macros outside of function bodies or initializers of global host variables.
Implements SWDEV-449015.