[clang] Fix inconsistencies with the device_kernel attr on different targets #161905

sarnex · 2025-10-03T21:00:01Z

The original change (#137882) unifying the device kernel attributes had some inexplicable behavior. For example, when compiling for SPIR, amdgpu_kernel produced a function with the spir_kernel CC but nvptx_kernel did not. There was also a crash (#161077). sycl_kernel is now separated out from device_kernel, but the remaining spellings still behaved inconsistently.

For the target-specific spellings (nvptx_kernel and amdgpu_kernel), while not technically required, we warn and ignore the attribute if the spelling doesn't match the target because it's weird from the user's point of view to allow it.
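A minimal standalone sketch of that check follows. It assumes hypothetical `Spelling` and `Target` enums in place of Clang's `ParsedAttr` and `llvm::Triple`, and it simplifies the host case to "not a device environment"; the actual patch also consults language options such as OpenCL.

```cpp
// Hypothetical stand-ins for the attribute spelling and the compilation
// target; these are not Clang's real types.
enum class Spelling { DeviceKernel, NvptxKernel, AmdgpuKernel };
enum class Target { AMDGPU, NVPTX, SPIR, Host };

// A target-specific spelling is only accepted on its own target.
constexpr bool isSpellingMismatch(Spelling S, Target T) {
  return (S == Spelling::AmdgpuKernel && T != Target::AMDGPU) ||
         (S == Spelling::NvptxKernel && T != Target::NVPTX);
}

// Simplifying assumption: every non-host target is a device environment.
constexpr bool isDeviceEnvironment(Target T) { return T != Target::Host; }

// True when the attribute should be diagnosed with a warning and dropped.
constexpr bool shouldWarnAndIgnore(Spelling S, Target T) {
  return isSpellingMismatch(S, T) || !isDeviceEnvironment(T);
}
```

Under this model the generic `device_kernel` spelling is accepted on every device target, while `amdgpu_kernel` and `nvptx_kernel` are accepted only on their own target.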

Also we make sure that any valid usage actually applies the CC to the generated llvm::Function. This already worked for NVPTX but was missing for SPIR/SPIR-V and AMDGPU, where it needs to be done explicitly in TargetInfo. This allows us to remove the amdgpu_kernel-specific handling we had. That special handling was previously required because amdgpu_kernel was the only variation that was allowed on a type, and thus had a separate way to propagate the CC.
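The intended result can be sketched as a per-target mapping. The IR spellings below are the ones checked by the new test in this PR (`amdgpu_kernel`, `ptx_kernel`, `spir_kernel`); the `Target` and `KernelCC` enums and both function names are hypothetical and stand in for Clang's real target hooks.

```cpp
// Hypothetical model of the targets exercised by the new test. SPIR here
// covers both the spir64 and spirv64 triples, which share one convention.
enum class Target { AMDGPU, NVPTX, SPIR, Host };
enum class KernelCC { AmdgpuKernel, PtxKernel, SpirKernel, None };

// Calling convention a valid device kernel should receive on each target.
constexpr KernelCC kernelCallingConvFor(Target T) {
  return T == Target::AMDGPU  ? KernelCC::AmdgpuKernel
         : T == Target::NVPTX ? KernelCC::PtxKernel
         : T == Target::SPIR  ? KernelCC::SpirKernel
                              : KernelCC::None;
}

// Spelling of the convention as it appears in the emitted LLVM IR.
constexpr const char *irName(KernelCC CC) {
  return CC == KernelCC::AmdgpuKernel ? "amdgpu_kernel"
         : CC == KernelCC::PtxKernel  ? "ptx_kernel"
         : CC == KernelCC::SpirKernel ? "spir_kernel"
                                      : "";
}
```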

These issues were reported in #161077 and #161349.

Closes: #161077

…targets Signed-off-by: Sarnie, Nick <[email protected]>

github-actions · 2025-10-03T21:03:28Z

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:

git-clang-format --diff origin/main HEAD --extensions c,h,cpp -- clang/test/Sema/callingconv-devicekernel.cpp clang/lib/AST/TypePrinter.cpp clang/lib/Basic/Targets/NVPTX.h clang/lib/CodeGen/Targets/AMDGPU.cpp clang/lib/CodeGen/Targets/NVPTX.cpp clang/lib/CodeGen/Targets/SPIR.cpp clang/lib/Sema/SemaDeclAttr.cpp clang/lib/Sema/SemaType.cpp clang/test/Sema/callingconv.c --diff_from_common_commit

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.

diff --git a/clang/lib/AST/TypePrinter.cpp b/clang/lib/AST/TypePrinter.cpp
index 568d56ab0..4d3ad7264 100644
--- a/clang/lib/AST/TypePrinter.cpp
+++ b/clang/lib/AST/TypePrinter.cpp
@@ -2146,7 +2146,9 @@ void TypePrinter::printAttributedAfter(const AttributedType *T,
    break;
   }
   case attr::AArch64VectorPcs: OS << "aarch64_vector_pcs"; break;
-  case attr::AArch64SVEPcs: OS << "aarch64_sve_pcs"; break;
+  case attr::AArch64SVEPcs:
+    OS << "aarch64_sve_pcs";
+    break;
   case attr::IntelOclBicc:
     OS << "inteloclbicc";
     break;

sarnex · 2025-10-03T21:04:53Z

clang/lib/Sema/SemaType.cpp

  case ParsedAttr::AT_Pascal:                                                  \
  case ParsedAttr::AT_SwiftCall:                                               \
  case ParsedAttr::AT_SwiftAsyncCall:                                          \
  case ParsedAttr::AT_VectorCall:                                              \


I think clang-format is wrong here, so I'm planning to ignore it (see the GitHub comment from the bot above, not this code).
Let me know if you disagree; I'm happy to fix it in this PR or separately.

llvmbot · 2025-10-03T21:44:21Z

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-amdgpu

Author: Nick Sarnie (sarnex)

Changes

The original change (#137882) unifying the device kernel attributes had some inexplicable behavior. For example, when compiling for SPIR, amdgpu_kernel produced a function with the spir_kernel CC but nvptx_kernel did not. There was also a crash (#161077).

For the target-specific spellings (nvptx_kernel and amdgpu_kernel), while not technically required, we warn and ignore the attribute if the spelling doesn't match the target because it's weird from the user's point of view to allow it.

Also we make sure that any valid usage actually applies the CC to the generated llvm::Function. This already worked for NVPTX but was missing for SPIR/SPIR-V and AMDGPU, where it needs to be done explicitly in TargetInfo. This allows us to remove the amdgpu_kernel-specific handling we had. That special handling was previously required because amdgpu_kernel was the only variation that was allowed on a type, and thus had a separate way to propagate the CC.

These issues were reported in #161077 and #161349.

Closes: #161077

Full diff: https://github.com/llvm/llvm-project/pull/161905.diff

10 Files Affected:

(modified) clang/docs/ReleaseNotes.rst (+1)
(modified) clang/include/clang/Basic/Attr.td (+1-1)
(modified) clang/lib/AST/TypePrinter.cpp (-3)
(modified) clang/lib/Basic/Targets/NVPTX.h (+1-1)
(modified) clang/lib/CodeGen/Targets/AMDGPU.cpp (+4-2)
(modified) clang/lib/CodeGen/Targets/SPIR.cpp (+30-3)
(modified) clang/lib/Sema/SemaDeclAttr.cpp (+33-3)
(modified) clang/lib/Sema/SemaType.cpp (+2-16)
(added) clang/test/Sema/callingconv-devicekernel.c (+24)
(modified) clang/test/Sema/callingconv.c (+4)

diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index d2e5bd284d350..da391082eba5b 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -401,6 +401,7 @@ Bug Fixes to Attribute Support
 - Using ``[[gnu::cleanup(some_func)]]`` where some_func is annotated with
   ``[[gnu::error("some error")]]`` now correctly triggers an error. (#GH146520)
 - Fix a crash when the function name is empty in the `swift_name` attribute. (#GH157075)
+- Fixes crashes or missing diagnostics with the `device_kernel` attribute. (#GH161905)
 
 Bug Fixes to C++ Support
 ^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 3c697ed8dd882..6ec0eac529245 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1599,7 +1599,7 @@ def CUDAShared : InheritableAttr {
 }
 def : MutualExclusions<[CUDAConstant, CUDAShared, HIPManaged]>;
 
-def DeviceKernel : DeclOrTypeAttr {
+def DeviceKernel : InheritableAttr {
   let Spellings = [Clang<"device_kernel">, Clang<"sycl_kernel">,
                    Clang<"nvptx_kernel">, Clang<"amdgpu_kernel">,
                    CustomKeyword<"__kernel">, CustomKeyword<"kernel">];
diff --git a/clang/lib/AST/TypePrinter.cpp b/clang/lib/AST/TypePrinter.cpp
index 66a1b684ec68b..568d56ab0b911 100644
--- a/clang/lib/AST/TypePrinter.cpp
+++ b/clang/lib/AST/TypePrinter.cpp
@@ -2147,9 +2147,6 @@ void TypePrinter::printAttributedAfter(const AttributedType *T,
   }
   case attr::AArch64VectorPcs: OS << "aarch64_vector_pcs"; break;
   case attr::AArch64SVEPcs: OS << "aarch64_sve_pcs"; break;
-  case attr::DeviceKernel:
-    OS << T->getAttr()->getSpelling();
-    break;
   case attr::IntelOclBicc:
     OS << "inteloclbicc";
     break;
diff --git a/clang/lib/Basic/Targets/NVPTX.h b/clang/lib/Basic/Targets/NVPTX.h
index 33c29586359be..f5c8396f398aa 100644
--- a/clang/lib/Basic/Targets/NVPTX.h
+++ b/clang/lib/Basic/Targets/NVPTX.h
@@ -200,7 +200,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public TargetInfo {
     // a host function.
     if (HostTarget)
       return HostTarget->checkCallingConvention(CC);
-    return CCCR_Warning;
+    return CC == CC_DeviceKernel ? CCCR_OK : CCCR_Warning;
   }
 
   bool hasBitIntType() const override { return true; }
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 0fcbf7e458a34..d54b1dc128254 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -419,9 +419,11 @@ void AMDGPUTargetCodeGenInfo::setTargetAttributes(
     return;
 
   const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(D);
-  if (FD)
+  if (FD) {
     setFunctionDeclAttributes(FD, F, M);
-
+    if (FD->hasAttr<DeviceKernelAttr>() && !M.getLangOpts().OpenCL)
+      F->setCallingConv(llvm::CallingConv::AMDGPU_KERNEL);
+  }
   if (!getABIInfo().getCodeGenOpts().EmitIEEENaNCompliantInsts)
     F->addFnAttr("amdgpu-ieee", "false");
 }
diff --git a/clang/lib/CodeGen/Targets/SPIR.cpp b/clang/lib/CodeGen/Targets/SPIR.cpp
index 4aa63143a66cd..42ef1c704831c 100644
--- a/clang/lib/CodeGen/Targets/SPIR.cpp
+++ b/clang/lib/CodeGen/Targets/SPIR.cpp
@@ -61,6 +61,8 @@ class CommonSPIRTargetCodeGenInfo : public TargetCodeGenInfo {
       QualType SampledType, CodeGenModule &CGM) const;
   void
   setOCLKernelStubCallingConvention(const FunctionType *&FT) const override;
+  void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,
+                           CodeGen::CodeGenModule &M) const override;
 };
 class SPIRVTargetCodeGenInfo : public CommonSPIRTargetCodeGenInfo {
 public:
@@ -240,6 +242,26 @@ void CommonSPIRTargetCodeGenInfo::setOCLKernelStubCallingConvention(
       FT, FT->getExtInfo().withCallingConv(CC_SpirFunction));
 }
 
+void CommonSPIRTargetCodeGenInfo::setTargetAttributes(
+    const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {
+  if (M.getLangOpts().OpenCL)
+    return;
+
+  if (GV->isDeclaration())
+    return;
+
+  llvm::Function *F = dyn_cast<llvm::Function>(GV);
+  if (!F)
+    return;
+
+  const FunctionDecl *FD = dyn_cast<FunctionDecl>(D);
+  if (!FD)
+    return;
+
+  if (FD->hasAttr<DeviceKernelAttr>())
+    F->setCallingConv(getDeviceKernelCallingConv());
+}
+
 LangAS
 SPIRVTargetCodeGenInfo::getGlobalVarAddressSpace(CodeGenModule &CGM,
                                                  const VarDecl *D) const {
@@ -264,9 +286,6 @@ SPIRVTargetCodeGenInfo::getGlobalVarAddressSpace(CodeGenModule &CGM,
 
 void SPIRVTargetCodeGenInfo::setTargetAttributes(
     const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {
-  if (!M.getLangOpts().HIP ||
-      M.getTarget().getTriple().getVendor() != llvm::Triple::AMD)
-    return;
   if (GV->isDeclaration())
     return;
 
@@ -277,6 +296,14 @@ void SPIRVTargetCodeGenInfo::setTargetAttributes(
   auto FD = dyn_cast_or_null<FunctionDecl>(D);
   if (!FD)
     return;
+
+  if (FD->hasAttr<DeviceKernelAttr>())
+    F->setCallingConv(llvm::CallingConv::SPIR_KERNEL);
+
+  if (!M.getLangOpts().HIP ||
+      M.getTarget().getTriple().getVendor() != llvm::Triple::AMD)
+    return;
+
   if (!FD->hasAttr<CUDAGlobalAttr>())
     return;
 
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index 328ccf6694073..551d00b3c7476 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -5204,25 +5204,55 @@ static void handleCallConvAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
 static void handleDeviceKernelAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   const auto *FD = dyn_cast_or_null<FunctionDecl>(D);
   bool IsFunctionTemplate = FD && FD->getDescribedFunctionTemplate();
-  if (S.getLangOpts().SYCLIsDevice) {
+  llvm::Triple Triple = S.getASTContext().getTargetInfo().getTriple();
+  const LangOptions &LangOpts = S.getLangOpts();
+
+  if (LangOpts.SYCLIsDevice) {
     if (!IsFunctionTemplate) {
       S.Diag(AL.getLoc(), diag::warn_attribute_wrong_decl_type_str)
           << AL << AL.isRegularKeywordAttribute() << "function templates";
+      AL.setInvalid();
+      return;
     } else {
       S.SYCL().handleKernelAttr(D, AL);
     }
   } else if (DeviceKernelAttr::isSYCLSpelling(AL)) {
     S.Diag(AL.getLoc(), diag::warn_attribute_ignored) << AL;
-  } else if (S.getASTContext().getTargetInfo().getTriple().isNVPTX()) {
+    AL.setInvalid();
+
+    return;
+  } else if (Triple.isNVPTX()) {
     handleGlobalAttr(S, D, AL);
   } else {
     // OpenCL C++ will throw a more specific error.
-    if (!S.getLangOpts().OpenCLCPlusPlus && (!FD || IsFunctionTemplate)) {
+    if (!LangOpts.OpenCLCPlusPlus && (!FD || IsFunctionTemplate)) {
       S.Diag(AL.getLoc(), diag::err_attribute_wrong_decl_type_str)
           << AL << AL.isRegularKeywordAttribute() << "functions";
+      AL.setInvalid();
+      return;
     }
     handleSimpleAttribute<DeviceKernelAttr>(S, D, AL);
   }
+  // TODO: isGPU() should probably return true for SPIR.
+  bool TargetDeviceEnvironment = Triple.isGPU() || Triple.isSPIR() ||
+                                 LangOpts.isTargetDevice() || LangOpts.OpenCL;
+  bool IsAMDGPUMismatch =
+      DeviceKernelAttr::isAMDGPUSpelling(AL) && !Triple.isAMDGPU();
+  bool IsNVPTXMismatch =
+      DeviceKernelAttr::isNVPTXSpelling(AL) && !Triple.isNVPTX();
+  if (IsAMDGPUMismatch || IsNVPTXMismatch || !TargetDeviceEnvironment) {
+    // While both are just different spellings of the same underlying
+    // attribute, it makes more sense to the user if amdgpu_kernel can only
+    // be used on AMDGPU and the equivalent for NVPTX, so warn and ignore
+    // the attribute if there's a mismatch.
+    // Also warn if this is not an environment where a device kernel makes
+    // sense.
+    S.Diag(AL.getLoc(), diag::warn_cconv_unsupported)
+        << AL << (int)Sema::CallingConventionIgnoredReason::ForThisTarget;
+    AL.setInvalid();
+    return;
+  }
+
   // Make sure we validate the CC with the target
   // and warn/error if necessary.
   handleCallConvAttr(S, D, AL);
diff --git a/clang/lib/Sema/SemaType.cpp b/clang/lib/Sema/SemaType.cpp
index bee613aa5f1c5..0d5b0e7e842b3 100644
--- a/clang/lib/Sema/SemaType.cpp
+++ b/clang/lib/Sema/SemaType.cpp
@@ -134,7 +134,6 @@ static void diagnoseBadTypeAttribute(Sema &S, const ParsedAttr &attr,
   case ParsedAttr::AT_VectorCall:                                              \
   case ParsedAttr::AT_AArch64VectorPcs:                                        \
   case ParsedAttr::AT_AArch64SVEPcs:                                           \
-  case ParsedAttr::AT_DeviceKernel:                                            \
   case ParsedAttr::AT_MSABI:                                                   \
   case ParsedAttr::AT_SysVABI:                                                 \
   case ParsedAttr::AT_Pcs:                                                     \
@@ -3781,7 +3780,8 @@ static CallingConv getCCForDeclaratorChunk(
     }
   }
   if (!S.getLangOpts().isSYCL()) {
-    for (const ParsedAttr &AL : D.getDeclSpec().getAttributes()) {
+    for (const ParsedAttr &AL : llvm::concat<ParsedAttr>(
+             D.getDeclSpec().getAttributes(), D.getAttributes())) {
       if (AL.getKind() == ParsedAttr::AT_DeviceKernel) {
         CC = CC_DeviceKernel;
         break;
@@ -7565,8 +7565,6 @@ static Attr *getCCTypeAttr(ASTContext &Ctx, ParsedAttr &Attr) {
     return createSimpleAttr<AArch64SVEPcsAttr>(Ctx, Attr);
   case ParsedAttr::AT_ArmStreaming:
     return createSimpleAttr<ArmStreamingAttr>(Ctx, Attr);
-  case ParsedAttr::AT_DeviceKernel:
-    return createSimpleAttr<DeviceKernelAttr>(Ctx, Attr);
   case ParsedAttr::AT_Pcs: {
     // The attribute may have had a fixit applied where we treated an
     // identifier as a string literal.  The contents of the string are valid,
@@ -8805,16 +8803,6 @@ static void HandleHLSLParamModifierAttr(TypeProcessingState &State,
   }
 }
 
-static bool isMultiSubjectAttrAllowedOnType(const ParsedAttr &Attr) {
-  // The DeviceKernel attribute is shared for many targets, and
-  // it is only allowed to be a type attribute with the AMDGPU
-  // spelling, so skip processing the attr as a type attr
-  // unless it has that spelling.
-  if (Attr.getKind() != ParsedAttr::AT_DeviceKernel)
-    return true;
-  return DeviceKernelAttr::isAMDGPUSpelling(Attr);
-}
-
 static void processTypeAttrs(TypeProcessingState &state, QualType &type,
                              TypeAttrLocation TAL,
                              const ParsedAttributesView &attrs,
@@ -9068,8 +9056,6 @@ static void processTypeAttrs(TypeProcessingState &state, QualType &type,
         break;
       [[fallthrough]];
     FUNCTION_TYPE_ATTRS_CASELIST:
-      if (!isMultiSubjectAttrAllowedOnType(attr))
-        break;
 
       attr.setUsedAsTypeAttr();
 
diff --git a/clang/test/Sema/callingconv-devicekernel.c b/clang/test/Sema/callingconv-devicekernel.c
new file mode 100644
index 0000000000000..869687f8ca65d
--- /dev/null
+++ b/clang/test/Sema/callingconv-devicekernel.c
@@ -0,0 +1,24 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s 2>&1 -o -| FileCheck -check-prefix=CHECK-AMDGPU %s
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda- -emit-llvm %s 2>&1 -o -| FileCheck -check-prefix=CHECK-NVPTX %s
+// RUN: %clang_cc1 -triple spir64 -emit-llvm %s 2>&1 -o - | FileCheck -check-prefix=CHECK-SPIR %s
+// RUN: %clang_cc1 -triple spirv64 -emit-llvm %s 2>&1 -o - | FileCheck -check-prefix=CHECK-SPIR %s
+
+// CHECK-AMDGPU-DAG: amdgpu_kernel void @kernel1()
+// CHECK-NVPTX-DAG: ptx_kernel void @kernel1()
+// CHECK-SPIR-DAG: spir_kernel void @kernel1()
+[[clang::device_kernel]] void kernel1() {}
+
+// CHECK-AMDGPU-DAG: amdgpu_kernel void @kernel2()
+// CHECK-NVPTX-DAG: 14:3: warning: 'clang::amdgpu_kernel' calling convention is not supported for this target
+// CHECK-SPIR-DAG: 14:3: warning: 'clang::amdgpu_kernel' calling convention is not supported for this target
+[[clang::amdgpu_kernel]] void kernel2() {}
+
+// CHECK-AMDGPU-DAG: 19:3: warning: 'clang::nvptx_kernel' calling convention is not supported for this target
+// CHECK-NVPTX-DAG: ptx_kernel void @kernel3()
+// CHECK-SPIR-DAG: 19:3: warning: 'clang::nvptx_kernel' calling convention is not supported for this target
+[[clang::nvptx_kernel]] void kernel3() {}
+
+// CHECK-AMDGPU-DAG: 24:3: warning: 'clang::sycl_kernel' attribute ignored
+// CHECK-NVPTX-DAG: 24:3: warning: 'clang::sycl_kernel' attribute ignored
+// CHECK-SPIR-DAG: 24:3: warning: 'clang::sycl_kernel' attribute ignored
+[[clang::sycl_kernel]] void kernel4() {}
diff --git a/clang/test/Sema/callingconv.c b/clang/test/Sema/callingconv.c
index f0b8b80a32974..28342b56af39a 100644
--- a/clang/test/Sema/callingconv.c
+++ b/clang/test/Sema/callingconv.c
@@ -55,6 +55,10 @@ int __attribute__((aarch64_vector_pcs)) aavpcs(void); // expected-warning {{'aar
 int __attribute__((aarch64_sve_pcs)) aasvepcs(void);  // expected-warning {{'aarch64_sve_pcs' calling convention is not supported for this target}}
 
 int __attribute__((amdgpu_kernel)) amdgpu_kernel(void); // expected-warning {{'amdgpu_kernel' calling convention is not supported for this target}}
+int __attribute__((device_kernel)) device_kernel(void) { // expected-warning {{'device_kernel' calling convention is not supported for this target}}
+}
+int __attribute__((sycl_kernel)) sycl_kernel(void) { // expected-warning {{'sycl_kernel' attribute ignored}}
+}
 
 // PR6361
 void ctest3();

sarnex · 2025-10-03T21:45:41Z

FYI: @camc

clang/lib/CodeGen/Targets/AMDGPU.cpp

clang/lib/CodeGen/Targets/SPIR.cpp

jhuber6 · 2025-10-03T21:48:53Z

clang/lib/CodeGen/Targets/SPIR.cpp

    return;
+
+  if (FD->hasAttr<DeviceKernelAttr>())
+    F->setCallingConv(llvm::CallingConv::SPIR_KERNEL);


Isn't the attribute called sycl_kernel? I thought that was a little strange since the others use the architecture name, not the language.

Not sure if I understand the question but there is no SYCL-specific calling convention, ex1, ex2. There is another alias (aka spelling) of the clang attribute but we have no need to check which specific spelling it's using here.

jhuber6 · 2025-10-03T21:50:56Z

clang/lib/Sema/SemaDeclAttr.cpp

+  // TODO: isGPU() should probably return true for SPIR.
+  bool TargetDeviceEnvironment = Triple.isGPU() || Triple.isSPIR() ||
+                                 LangOpts.isTargetDevice() || LangOpts.OpenCL;
+  bool IsAMDGPUMismatch =
+      DeviceKernelAttr::isAMDGPUSpelling(AL) && !Triple.isAMDGPU();
+  bool IsNVPTXMismatch =
+      DeviceKernelAttr::isNVPTXSpelling(AL) && !Triple.isNVPTX();
+  if (IsAMDGPUMismatch || IsNVPTXMismatch || !TargetDeviceEnvironment) {


Honestly I'm wondering what we should do here. Realistically it would make sense for these 'named' attributes to just be legacy aliases to device_kernel. I don't see any real value in keeping separate names unless there are special semantics that I'm unaware of. Ever since PTX moved over, this attribute is more plainly "set the kernel calling convention on this function."

It would definitely be simpler to have them as aliases. I'm happy to implement whatever the consensus is, so others should feel free to weigh in.

I'd prefer to just make it an alias, I don't see any significant reason to keep it separate since this all implies the same thing of 'make this function callable by the external runtime'.

Sure will do

clang/lib/Sema/SemaDeclAttr.cpp

Signed-off-by: Sarnie, Nick <[email protected]>

tahonermann · 2025-10-07T16:15:56Z

I missed the previous PRs that lumped all of these attributes together. I think sycl_kernel should have been left separate as it isn't directly used to declare a device function. It instead has rather complicated semantics that result in a function that is used on the host as well as a synthesized device function (with a different name, different parameters, and a hard-coded calling convention). However, sycl_kernel is also destined to go away in favor of the sycl_kernel_entry_point attribute.

jhuber6 · 2025-10-07T16:18:01Z

> I missed the previous PRs that lumped all of these attributes together. I think sycl_kernel should have been left separate as it isn't directly used to declare a device function. It instead has rather complicated semantics that result in a function that is used on the host as well as a synthesized device function (with a different name, different parameters, and a hard-coded calling convention). However, sycl_kernel is also destined to go away in favor of the sycl_kernel_entry_point attribute.

I'd say we could just rename it to spirv_kernel to avoid the confusion, but I'm honestly not sure we need the special names if we just accept device_kernel to mean "externally callable thing in a GPU context." Possibly we could even make this generic on all targets if we just make it imply non-hidden visibility, since that would make a function 'callable' in this context.

sarnex · 2025-10-08T19:02:50Z

@tahonermann Would renaming sycl_kernel to spirv_kernel/spir_kernel resolve your concern?

tahonermann · 2025-10-09T19:25:20Z

@sarnex, @jhuber6,

> @tahonermann Would renaming sycl_kernel to spirv_kernel/spir_kernel resolve your concern?

No, that would be much worse. SYCL is not synonymous with SPIR-V. Functions decorated with the sycl_kernel attribute are used to generate kernels for PTX, AMDGCN, as well as SPIR-V. sycl_kernel doesn't make a function an actual kernel entry point.

Unfortunately, it is hard to effectively compare the semantics of sycl_kernel with nvptx_kernel, amdgpu_kernel, kernel, and __kernel because the documentation that purports to describe them only talks about SYCL. I think the changes made to that documentation should be reverted and appropriate documentation for those other attributes added.

jhuber6 · 2025-10-09T19:50:19Z

If the sycl_kernel version has all this special behavior it should be a separate attribute and put under the SYCL language mode. The others should all just be aliases to device_kernel and have those do some unified behavior, imply non-hidden visibility (if hidden by default upgrade to protected), and add the appropriate target calling convention (if one exists).
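The visibility half of this proposal could look like the sketch below. It only illustrates the suggested rule and is not something this PR implements; `Visibility` and `kernelVisibility` are hypothetical names.

```cpp
// Hypothetical model of symbol visibility; not Clang's or LLVM's real enum.
enum class Visibility { Default, Hidden, Protected };

// Proposed rule: a device kernel must stay reachable by the external
// runtime, so hidden visibility is upgraded to protected and every other
// setting is left as it was.
constexpr Visibility kernelVisibility(Visibility Requested) {
  return Requested == Visibility::Hidden ? Visibility::Protected : Requested;
}
```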

sarnex · 2025-10-09T19:52:59Z

Joseph's suggestion seems the best to me, having one of the aliases have totally different semantics is just asking for trouble. If @tahonermann agrees I'll update this PR to implement that.

tahonermann · 2025-10-09T22:30:09Z

> Joseph's suggestion seems the best to me, having one of the aliases have totally different semantics is just asking for trouble. If @tahonermann agrees I'll update this PR to implement that.

Yes, please. Enablement of the sycl_kernel attribute is already subject to use of the -fsycl option. Please do revert the prior change to the documentation and add appropriate documentation for device_kernel and friends. I suggest doing this in a separate PR (probably one that lands before this one).

sarnex · 2025-10-10T14:33:17Z

Will do.

Based on feedback from #161905, partially revert #137882 so that `sycl_kernel` is a separate attribute and not just a spelling of `device_kernel`. Signed-off-by: Sarnie, Nick <[email protected]>

… (#162868) Based on feedback from llvm/llvm-project#161905, partially revert llvm/llvm-project#137882 so that `sycl_kernel` is a separate attribute and not just a spelling of `device_kernel`. Signed-off-by: Sarnie, Nick <[email protected]>

Signed-off-by: Sarnie, Nick <[email protected]>

sarnex · 2025-10-13T16:55:32Z

Reworked this after separating sycl_kernel back out.

jhuber6 · 2025-10-13T16:58:50Z

clang/lib/Sema/SemaDeclAttr.cpp

+  // TODO: isGPU() should probably return true for SPIR.
+  bool TargetDeviceEnvironment = Triple.isGPU() || Triple.isSPIR() ||
+                                 LangOpts.isTargetDevice() || LangOpts.OpenCL;
+  bool IsAMDGPUMismatch =
+      DeviceKernelAttr::isAMDGPUSpelling(AL) && !Triple.isAMDGPU();
+  bool IsNVPTXMismatch =
+      DeviceKernelAttr::isNVPTXSpelling(AL) && !Triple.isNVPTX();
+  if (IsAMDGPUMismatch || IsNVPTXMismatch || !TargetDeviceEnvironment) {


I'd prefer to just make it an alias, I don't see any significant reason to keep it separate since this all implies the same thing of 'make this function callable by the external runtime'.

clang/test/Sema/callingconv-devicekernel.c

Signed-off-by: Sarnie, Nick <[email protected]>

Based on feedback from llvm#161905, partially revert llvm#137882 so that `sycl_kernel` is a separate attribute and not just a spelling of `device_kernel`. Signed-off-by: Sarnie, Nick <[email protected]>

sarnex · 2025-10-16T20:50:24Z

@tahonermann Are you okay with these changes given that SYCL is no longer touched here? Thanks

tahonermann · 2025-10-21T22:22:35Z

> @tahonermann Are you okay with these changes given that SYCL is no longer touched here? Thanks

Yes, I am.

I'm still skeptical that the OpenCL kernel attribute should have been combined with the CUDA/HIP kernel attributes given all the special casing for it. I also don't love that there is now an additional attribute identifier that can be used to declare a kernel with no plan (as far as I know) to deprecate and/or remove the two (or four if you include OpenCL) existing spellings of that attribute. But those changes weren't part of this PR.

sarnex · 2025-10-21T22:27:32Z

Thanks for the feedback. I plan to merge this PR shortly.

…targets (llvm#161905)

The original [change](llvm#137882) unifying the device kernel attributes had some inexplicable behavior. For example, when compiling for SPIR, `amdgpu_kernel` produced a function with the `spir_kernel` CC but `nvptx_kernel` did not. There was also a [crash](llvm#161077). `sycl_kernel` is now separated out from `device_kernel`, but the remaining spellings still behaved inconsistently.

For the target-specific spellings (`nvptx_kernel` and `amdgpu_kernel`), while not technically required, we warn and ignore the attribute if the spelling doesn't match the target because it's weird from the user's point of view to allow it.

Also we make sure that any valid usage actually applies the CC to the generated `llvm::Function`. This already worked for `NVPTX` but was missing for `SPIR/SPIR-V` and `AMDGPU`, where it needs to be done explicitly in `TargetInfo`. This allows us to remove the `amdgpu_kernel`-specific handling we had. That special handling was previously required because `amdgpu_kernel` was the only variation that was allowed on a type, and thus had a separate way to propagate the CC.

These issues were reported [here](llvm#161077) and [here](llvm#161349).

Closes: llvm#161077

---------

Signed-off-by: Sarnie, Nick <[email protected]>

[clang] Fix inconsistencies with the device_kernel attr on different …

a27574f

…targets Signed-off-by: Sarnie, Nick <[email protected]>

sarnex force-pushed the fixattr branch from cff8926 to a27574f Compare October 3, 2025 21:00

sarnex commented Oct 3, 2025

sarnex marked this pull request as ready for review October 3, 2025 21:43

llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. labels Oct 3, 2025

sarnex requested review from AaronBallman, Fznamznon, erichkeane, jhuber6 and k-arrows October 3, 2025 21:45

jhuber6 reviewed Oct 3, 2025

arsenm reviewed Oct 4, 2025

clang/lib/Sema/SemaDeclAttr.cpp (outdated)

minor cleanup and address code review comments

d796500

Signed-off-by: Sarnie, Nick <[email protected]>

sarnex marked this pull request as draft October 10, 2025 14:33

This was referenced Oct 10, 2025

[clang][Sema] Split SYCLKernel back into its own attribute #162868

Merged

[LLVM] Port 'llvm-gpu-loader' to use LLVMOffload #162739

Open

sarnex added 2 commits October 13, 2025 08:53

Merge remote-tracking branch 'upstream/main'

48a98e2

format

68cf3dd

Signed-off-by: Sarnie, Nick <[email protected]>

sarnex marked this pull request as ready for review October 13, 2025 16:55

jhuber6 reviewed Oct 13, 2025

add error

948aabd

Signed-off-by: Sarnie, Nick <[email protected]>

jhuber6 approved these changes Oct 13, 2025

sarnex mentioned this pull request Oct 15, 2025

fix merge after readding syclkernel attr intel/llvm#20363

Merged

sarnex merged commit f6ba213 into llvm:main Oct 22, 2025
10 of 11 checks passed
