Skip to content

Conversation

sarnex
Copy link
Member

@sarnex sarnex commented Oct 3, 2025

The original change unifying the device kernel attributes had some inexplicable behavior, such as amdgpu_kernel resulting in a function ending up with the spir_kernel CC but nvptx_kernel not doing the same, both cases compiling for SPIR. There was also a crash.

For the target-specific spellings (nvptx_kernel and amdgpu_kernel), while not technically required, we warn and ignore the attribute if the spelling doesn't match the target because it's weird from the user's point of view to allow it.

Also we make sure that any valid usage actually applies the CC to the generated llvm:Function. This worked for NVPTX already but was missing for SPIR/SPIR-V and AMDGPU, it needs to be explicitly done in TargetInfo. This allows us to remove the amdgpu_kernel specific handing we had. That special handling was previously required because it was the only variation that was allowed on a type, and thus had a separate way to propagate the CC.

These issues were reported here and here.

Closes: #161077

Copy link

github-actions bot commented Oct 3, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions c,h,cpp -- clang/test/Sema/callingconv-devicekernel.c clang/lib/AST/TypePrinter.cpp clang/lib/Basic/Targets/NVPTX.h clang/lib/CodeGen/Targets/AMDGPU.cpp clang/lib/CodeGen/Targets/NVPTX.cpp clang/lib/CodeGen/Targets/SPIR.cpp clang/lib/Sema/SemaDeclAttr.cpp clang/lib/Sema/SemaType.cpp clang/test/Sema/callingconv.c

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/clang/lib/AST/TypePrinter.cpp b/clang/lib/AST/TypePrinter.cpp
index 568d56ab0..4d3ad7264 100644
--- a/clang/lib/AST/TypePrinter.cpp
+++ b/clang/lib/AST/TypePrinter.cpp
@@ -2146,7 +2146,9 @@ void TypePrinter::printAttributedAfter(const AttributedType *T,
    break;
   }
   case attr::AArch64VectorPcs: OS << "aarch64_vector_pcs"; break;
-  case attr::AArch64SVEPcs: OS << "aarch64_sve_pcs"; break;
+  case attr::AArch64SVEPcs:
+    OS << "aarch64_sve_pcs";
+    break;
   case attr::IntelOclBicc:
     OS << "inteloclbicc";
     break;

case ParsedAttr::AT_Pascal: \
case ParsedAttr::AT_SwiftCall: \
case ParsedAttr::AT_SwiftAsyncCall: \
case ParsedAttr::AT_VectorCall: \
Copy link
Member Author

@sarnex sarnex Oct 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think clang-format is wrong so I'm planning on ignoring it (see above GH comment from the bot, not this code).
Let me know you disagree and I'm happy to fix in in this PR or separately.

@sarnex sarnex marked this pull request as ready for review October 3, 2025 21:43
@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:codegen IR generation bugs: mangling, exceptions, etc. labels Oct 3, 2025
@llvmbot
Copy link
Member

llvmbot commented Oct 3, 2025

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-backend-amdgpu

Author: Nick Sarnie (sarnex)

Changes

The original change unifying the device kernel attributes had some inexplicable behavior, such as amdgpu_kernel resulting in a function ending up with the spir_kernel CC but nvptx_kernel not doing the same, both cases compiling for SPIR. There was also a crash.

For the target-specific spellings (nvptx_kernel and amdgpu_kernel), while not technically required, we warn and ignore the attribute if the spelling doesn't match the target because it's weird from the user's point of view to allow it.

Also we make sure that any valid usage actually applies the CC to the generated llvm:Function. This worked for NVPTX already but was missing for SPIR/SPIR-V and AMDGPU, it needs to be explicitly done in TargetInfo. This allows us to remove the amdgpu_kernel specific handing we had. That special handling was previously required because it was the only variation that was allowed on a type, and thus had a separate way to propagate the CC.

These issues were reported here and here.

Closes: #161077


Full diff: https://github.com/llvm/llvm-project/pull/161905.diff

10 Files Affected:

  • (modified) clang/docs/ReleaseNotes.rst (+1)
  • (modified) clang/include/clang/Basic/Attr.td (+1-1)
  • (modified) clang/lib/AST/TypePrinter.cpp (-3)
  • (modified) clang/lib/Basic/Targets/NVPTX.h (+1-1)
  • (modified) clang/lib/CodeGen/Targets/AMDGPU.cpp (+4-2)
  • (modified) clang/lib/CodeGen/Targets/SPIR.cpp (+30-3)
  • (modified) clang/lib/Sema/SemaDeclAttr.cpp (+33-3)
  • (modified) clang/lib/Sema/SemaType.cpp (+2-16)
  • (added) clang/test/Sema/callingconv-devicekernel.c (+24)
  • (modified) clang/test/Sema/callingconv.c (+4)
diff --git a/clang/docs/ReleaseNotes.rst b/clang/docs/ReleaseNotes.rst
index d2e5bd284d350..da391082eba5b 100644
--- a/clang/docs/ReleaseNotes.rst
+++ b/clang/docs/ReleaseNotes.rst
@@ -401,6 +401,7 @@ Bug Fixes to Attribute Support
 - Using ``[[gnu::cleanup(some_func)]]`` where some_func is annotated with
   ``[[gnu::error("some error")]]`` now correctly triggers an error. (#GH146520)
 - Fix a crash when the function name is empty in the `swift_name` attribute. (#GH157075)
+- Fixes crashes or missing diagnostics with the `device_kernel` attribute. (#GH161905)
 
 Bug Fixes to C++ Support
 ^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index 3c697ed8dd882..6ec0eac529245 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1599,7 +1599,7 @@ def CUDAShared : InheritableAttr {
 }
 def : MutualExclusions<[CUDAConstant, CUDAShared, HIPManaged]>;
 
-def DeviceKernel : DeclOrTypeAttr {
+def DeviceKernel : InheritableAttr {
   let Spellings = [Clang<"device_kernel">, Clang<"sycl_kernel">,
                    Clang<"nvptx_kernel">, Clang<"amdgpu_kernel">,
                    CustomKeyword<"__kernel">, CustomKeyword<"kernel">];
diff --git a/clang/lib/AST/TypePrinter.cpp b/clang/lib/AST/TypePrinter.cpp
index 66a1b684ec68b..568d56ab0b911 100644
--- a/clang/lib/AST/TypePrinter.cpp
+++ b/clang/lib/AST/TypePrinter.cpp
@@ -2147,9 +2147,6 @@ void TypePrinter::printAttributedAfter(const AttributedType *T,
   }
   case attr::AArch64VectorPcs: OS << "aarch64_vector_pcs"; break;
   case attr::AArch64SVEPcs: OS << "aarch64_sve_pcs"; break;
-  case attr::DeviceKernel:
-    OS << T->getAttr()->getSpelling();
-    break;
   case attr::IntelOclBicc:
     OS << "inteloclbicc";
     break;
diff --git a/clang/lib/Basic/Targets/NVPTX.h b/clang/lib/Basic/Targets/NVPTX.h
index 33c29586359be..f5c8396f398aa 100644
--- a/clang/lib/Basic/Targets/NVPTX.h
+++ b/clang/lib/Basic/Targets/NVPTX.h
@@ -200,7 +200,7 @@ class LLVM_LIBRARY_VISIBILITY NVPTXTargetInfo : public TargetInfo {
     // a host function.
     if (HostTarget)
       return HostTarget->checkCallingConvention(CC);
-    return CCCR_Warning;
+    return CC == CC_DeviceKernel ? CCCR_OK : CCCR_Warning;
   }
 
   bool hasBitIntType() const override { return true; }
diff --git a/clang/lib/CodeGen/Targets/AMDGPU.cpp b/clang/lib/CodeGen/Targets/AMDGPU.cpp
index 0fcbf7e458a34..d54b1dc128254 100644
--- a/clang/lib/CodeGen/Targets/AMDGPU.cpp
+++ b/clang/lib/CodeGen/Targets/AMDGPU.cpp
@@ -419,9 +419,11 @@ void AMDGPUTargetCodeGenInfo::setTargetAttributes(
     return;
 
   const FunctionDecl *FD = dyn_cast_or_null<FunctionDecl>(D);
-  if (FD)
+  if (FD) {
     setFunctionDeclAttributes(FD, F, M);
-
+    if (FD->hasAttr<DeviceKernelAttr>() && !M.getLangOpts().OpenCL)
+      F->setCallingConv(llvm::CallingConv::AMDGPU_KERNEL);
+  }
   if (!getABIInfo().getCodeGenOpts().EmitIEEENaNCompliantInsts)
     F->addFnAttr("amdgpu-ieee", "false");
 }
diff --git a/clang/lib/CodeGen/Targets/SPIR.cpp b/clang/lib/CodeGen/Targets/SPIR.cpp
index 4aa63143a66cd..42ef1c704831c 100644
--- a/clang/lib/CodeGen/Targets/SPIR.cpp
+++ b/clang/lib/CodeGen/Targets/SPIR.cpp
@@ -61,6 +61,8 @@ class CommonSPIRTargetCodeGenInfo : public TargetCodeGenInfo {
       QualType SampledType, CodeGenModule &CGM) const;
   void
   setOCLKernelStubCallingConvention(const FunctionType *&FT) const override;
+  void setTargetAttributes(const Decl *D, llvm::GlobalValue *GV,
+                           CodeGen::CodeGenModule &M) const override;
 };
 class SPIRVTargetCodeGenInfo : public CommonSPIRTargetCodeGenInfo {
 public:
@@ -240,6 +242,26 @@ void CommonSPIRTargetCodeGenInfo::setOCLKernelStubCallingConvention(
       FT, FT->getExtInfo().withCallingConv(CC_SpirFunction));
 }
 
+void CommonSPIRTargetCodeGenInfo::setTargetAttributes(
+    const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {
+  if (M.getLangOpts().OpenCL)
+    return;
+
+  if (GV->isDeclaration())
+    return;
+
+  llvm::Function *F = dyn_cast<llvm::Function>(GV);
+  if (!F)
+    return;
+
+  const FunctionDecl *FD = dyn_cast<FunctionDecl>(D);
+  if (!FD)
+    return;
+
+  if (FD->hasAttr<DeviceKernelAttr>())
+    F->setCallingConv(getDeviceKernelCallingConv());
+}
+
 LangAS
 SPIRVTargetCodeGenInfo::getGlobalVarAddressSpace(CodeGenModule &CGM,
                                                  const VarDecl *D) const {
@@ -264,9 +286,6 @@ SPIRVTargetCodeGenInfo::getGlobalVarAddressSpace(CodeGenModule &CGM,
 
 void SPIRVTargetCodeGenInfo::setTargetAttributes(
     const Decl *D, llvm::GlobalValue *GV, CodeGen::CodeGenModule &M) const {
-  if (!M.getLangOpts().HIP ||
-      M.getTarget().getTriple().getVendor() != llvm::Triple::AMD)
-    return;
   if (GV->isDeclaration())
     return;
 
@@ -277,6 +296,14 @@ void SPIRVTargetCodeGenInfo::setTargetAttributes(
   auto FD = dyn_cast_or_null<FunctionDecl>(D);
   if (!FD)
     return;
+
+  if (FD->hasAttr<DeviceKernelAttr>())
+    F->setCallingConv(llvm::CallingConv::SPIR_KERNEL);
+
+  if (!M.getLangOpts().HIP ||
+      M.getTarget().getTriple().getVendor() != llvm::Triple::AMD)
+    return;
+
   if (!FD->hasAttr<CUDAGlobalAttr>())
     return;
 
diff --git a/clang/lib/Sema/SemaDeclAttr.cpp b/clang/lib/Sema/SemaDeclAttr.cpp
index 328ccf6694073..551d00b3c7476 100644
--- a/clang/lib/Sema/SemaDeclAttr.cpp
+++ b/clang/lib/Sema/SemaDeclAttr.cpp
@@ -5204,25 +5204,55 @@ static void handleCallConvAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
 static void handleDeviceKernelAttr(Sema &S, Decl *D, const ParsedAttr &AL) {
   const auto *FD = dyn_cast_or_null<FunctionDecl>(D);
   bool IsFunctionTemplate = FD && FD->getDescribedFunctionTemplate();
-  if (S.getLangOpts().SYCLIsDevice) {
+  llvm::Triple Triple = S.getASTContext().getTargetInfo().getTriple();
+  const LangOptions &LangOpts = S.getLangOpts();
+
+  if (LangOpts.SYCLIsDevice) {
     if (!IsFunctionTemplate) {
       S.Diag(AL.getLoc(), diag::warn_attribute_wrong_decl_type_str)
           << AL << AL.isRegularKeywordAttribute() << "function templates";
+      AL.setInvalid();
+      return;
     } else {
       S.SYCL().handleKernelAttr(D, AL);
     }
   } else if (DeviceKernelAttr::isSYCLSpelling(AL)) {
     S.Diag(AL.getLoc(), diag::warn_attribute_ignored) << AL;
-  } else if (S.getASTContext().getTargetInfo().getTriple().isNVPTX()) {
+    AL.setInvalid();
+
+    return;
+  } else if (Triple.isNVPTX()) {
     handleGlobalAttr(S, D, AL);
   } else {
     // OpenCL C++ will throw a more specific error.
-    if (!S.getLangOpts().OpenCLCPlusPlus && (!FD || IsFunctionTemplate)) {
+    if (!LangOpts.OpenCLCPlusPlus && (!FD || IsFunctionTemplate)) {
       S.Diag(AL.getLoc(), diag::err_attribute_wrong_decl_type_str)
           << AL << AL.isRegularKeywordAttribute() << "functions";
+      AL.setInvalid();
+      return;
     }
     handleSimpleAttribute<DeviceKernelAttr>(S, D, AL);
   }
+  // TODO: isGPU() should probably return true for SPIR.
+  bool TargetDeviceEnvironment = Triple.isGPU() || Triple.isSPIR() ||
+                                 LangOpts.isTargetDevice() || LangOpts.OpenCL;
+  bool IsAMDGPUMismatch =
+      DeviceKernelAttr::isAMDGPUSpelling(AL) && !Triple.isAMDGPU();
+  bool IsNVPTXMismatch =
+      DeviceKernelAttr::isNVPTXSpelling(AL) && !Triple.isNVPTX();
+  if (IsAMDGPUMismatch || IsNVPTXMismatch || !TargetDeviceEnvironment) {
+    // While both are just different spellings of the same underlying
+    // attribute, it makes more sense to the user if amdgpu_kernel can only
+    // be used on AMDGPU and the equivalent for NVPTX, so warn and ignore
+    // the attribute if there's a mismatch.
+    // Also warn if this is not an environment where a device kernel makes
+    // sense.
+    S.Diag(AL.getLoc(), diag::warn_cconv_unsupported)
+        << AL << (int)Sema::CallingConventionIgnoredReason::ForThisTarget;
+    AL.setInvalid();
+    return;
+  }
+
   // Make sure we validate the CC with the target
   // and warn/error if necessary.
   handleCallConvAttr(S, D, AL);
diff --git a/clang/lib/Sema/SemaType.cpp b/clang/lib/Sema/SemaType.cpp
index bee613aa5f1c5..0d5b0e7e842b3 100644
--- a/clang/lib/Sema/SemaType.cpp
+++ b/clang/lib/Sema/SemaType.cpp
@@ -134,7 +134,6 @@ static void diagnoseBadTypeAttribute(Sema &S, const ParsedAttr &attr,
   case ParsedAttr::AT_VectorCall:                                              \
   case ParsedAttr::AT_AArch64VectorPcs:                                        \
   case ParsedAttr::AT_AArch64SVEPcs:                                           \
-  case ParsedAttr::AT_DeviceKernel:                                            \
   case ParsedAttr::AT_MSABI:                                                   \
   case ParsedAttr::AT_SysVABI:                                                 \
   case ParsedAttr::AT_Pcs:                                                     \
@@ -3781,7 +3780,8 @@ static CallingConv getCCForDeclaratorChunk(
     }
   }
   if (!S.getLangOpts().isSYCL()) {
-    for (const ParsedAttr &AL : D.getDeclSpec().getAttributes()) {
+    for (const ParsedAttr &AL : llvm::concat<ParsedAttr>(
+             D.getDeclSpec().getAttributes(), D.getAttributes())) {
       if (AL.getKind() == ParsedAttr::AT_DeviceKernel) {
         CC = CC_DeviceKernel;
         break;
@@ -7565,8 +7565,6 @@ static Attr *getCCTypeAttr(ASTContext &Ctx, ParsedAttr &Attr) {
     return createSimpleAttr<AArch64SVEPcsAttr>(Ctx, Attr);
   case ParsedAttr::AT_ArmStreaming:
     return createSimpleAttr<ArmStreamingAttr>(Ctx, Attr);
-  case ParsedAttr::AT_DeviceKernel:
-    return createSimpleAttr<DeviceKernelAttr>(Ctx, Attr);
   case ParsedAttr::AT_Pcs: {
     // The attribute may have had a fixit applied where we treated an
     // identifier as a string literal.  The contents of the string are valid,
@@ -8805,16 +8803,6 @@ static void HandleHLSLParamModifierAttr(TypeProcessingState &State,
   }
 }
 
-static bool isMultiSubjectAttrAllowedOnType(const ParsedAttr &Attr) {
-  // The DeviceKernel attribute is shared for many targets, and
-  // it is only allowed to be a type attribute with the AMDGPU
-  // spelling, so skip processing the attr as a type attr
-  // unless it has that spelling.
-  if (Attr.getKind() != ParsedAttr::AT_DeviceKernel)
-    return true;
-  return DeviceKernelAttr::isAMDGPUSpelling(Attr);
-}
-
 static void processTypeAttrs(TypeProcessingState &state, QualType &type,
                              TypeAttrLocation TAL,
                              const ParsedAttributesView &attrs,
@@ -9068,8 +9056,6 @@ static void processTypeAttrs(TypeProcessingState &state, QualType &type,
         break;
       [[fallthrough]];
     FUNCTION_TYPE_ATTRS_CASELIST:
-      if (!isMultiSubjectAttrAllowedOnType(attr))
-        break;
 
       attr.setUsedAsTypeAttr();
 
diff --git a/clang/test/Sema/callingconv-devicekernel.c b/clang/test/Sema/callingconv-devicekernel.c
new file mode 100644
index 0000000000000..869687f8ca65d
--- /dev/null
+++ b/clang/test/Sema/callingconv-devicekernel.c
@@ -0,0 +1,24 @@
+// RUN: %clang_cc1 -triple amdgcn-amd-amdhsa -emit-llvm %s 2>&1 -o -| FileCheck -check-prefix=CHECK-AMDGPU %s
+// RUN: %clang_cc1 -triple nvptx64-nvidia-cuda- -emit-llvm %s 2>&1 -o -| FileCheck -check-prefix=CHECK-NVPTX %s
+// RUN: %clang_cc1 -triple spir64 -emit-llvm %s 2>&1 -o - | FileCheck -check-prefix=CHECK-SPIR %s
+// RUN: %clang_cc1 -triple spirv64 -emit-llvm %s 2>&1 -o - | FileCheck -check-prefix=CHECK-SPIR %s
+
+// CHECK-AMDGPU-DAG: amdgpu_kernel void @kernel1()
+// CHECK-NVPTX-DAG: ptx_kernel void @kernel1()
+// CHECK-SPIR-DAG: spir_kernel void @kernel1()
+[[clang::device_kernel]] void kernel1() {}
+
+// CHECK-AMDGPU-DAG: amdgpu_kernel void @kernel2()
+// CHECK-NVPTX-DAG: 14:3: warning: 'clang::amdgpu_kernel' calling convention is not supported for this target
+// CHECK-SPIR-DAG: 14:3: warning: 'clang::amdgpu_kernel' calling convention is not supported for this target
+[[clang::amdgpu_kernel]] void kernel2() {}
+
+// CHECK-AMDGPU-DAG: 19:3: warning: 'clang::nvptx_kernel' calling convention is not supported for this target
+// CHECK-NVPTX-DAG: ptx_kernel void @kernel3()
+// CHECK-SPIR-DAG: 19:3: warning: 'clang::nvptx_kernel' calling convention is not supported for this target
+[[clang::nvptx_kernel]] void kernel3() {}
+
+// CHECK-AMDGPU-DAG: 24:3: warning: 'clang::sycl_kernel' attribute ignored
+// CHECK-NVPTX-DAG: 24:3: warning: 'clang::sycl_kernel' attribute ignored
+// CHECK-SPIR-DAG: 24:3: warning: 'clang::sycl_kernel' attribute ignored
+[[clang::sycl_kernel]] void kernel4() {}
diff --git a/clang/test/Sema/callingconv.c b/clang/test/Sema/callingconv.c
index f0b8b80a32974..28342b56af39a 100644
--- a/clang/test/Sema/callingconv.c
+++ b/clang/test/Sema/callingconv.c
@@ -55,6 +55,10 @@ int __attribute__((aarch64_vector_pcs)) aavpcs(void); // expected-warning {{'aar
 int __attribute__((aarch64_sve_pcs)) aasvepcs(void);  // expected-warning {{'aarch64_sve_pcs' calling convention is not supported for this target}}
 
 int __attribute__((amdgpu_kernel)) amdgpu_kernel(void); // expected-warning {{'amdgpu_kernel' calling convention is not supported for this target}}
+int __attribute__((device_kernel)) device_kernel(void) { // expected-warning {{'device_kernel' calling convention is not supported for this target}}
+}
+int __attribute__((sycl_kernel)) sycl_kernel(void) { // expected-warning {{'sycl_kernel' attribute ignored}}
+}
 
 // PR6361
 void ctest3();

@sarnex
Copy link
Member Author

sarnex commented Oct 3, 2025

FYI: @camc

return;

if (FD->hasAttr<DeviceKernelAttr>())
F->setCallingConv(llvm::CallingConv::SPIR_KERNEL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the attribute called sycl_kernel? I thought that was a little strange since the others use the architecture name, not the language.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if I understand the question but there is no SYCL-specific calling convention, ex1, ex2. There is another alias (aka spelling) of the clang attribute but we have no need to check which specific spelling it's using here.

Comment on lines +5236 to +5243
// TODO: isGPU() should probably return true for SPIR.
bool TargetDeviceEnvironment = Triple.isGPU() || Triple.isSPIR() ||
LangOpts.isTargetDevice() || LangOpts.OpenCL;
bool IsAMDGPUMismatch =
DeviceKernelAttr::isAMDGPUSpelling(AL) && !Triple.isAMDGPU();
bool IsNVPTXMismatch =
DeviceKernelAttr::isNVPTXSpelling(AL) && !Triple.isNVPTX();
if (IsAMDGPUMismatch || IsNVPTXMismatch || !TargetDeviceEnvironment) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly I'm wondering what we should do here. Realistically it would make sense for these 'named' attributes to just be legacy aliases to device_kernel. I don't see any real value in keeping separate names unless there are special semantics that I'm unaware of. Ever since PTX moved over, this attribute is more plainly "set the kernel calling convention on this function."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would definitely be simpler to have them as aliases. I'm happy to implement whatever the consensus us, so others feel free to drop in.

@tahonermann
Copy link
Contributor

I missed the previous PRs that lumped all of these attributes together. I think sycl_kernel should have been left separate as it isn't directly used to declare a device function. It instead has rather complicated semantics that result in a function that is used on the host as well as a synthesized device function (with a different name, different parameters, and a hard-coded calling convention). However, sycl_kernel is also destined to go away in favor of the sycl_kernel_entry_point attribute.

@jhuber6
Copy link
Contributor

jhuber6 commented Oct 7, 2025

I missed the previous PRs that lumped all of these attributes together. I think sycl_kernel should have been left separate as it isn't directly used to declare a device function. It instead has rather complicated semantics that result in a function that is used on the host as well as a synthesized device function (with a different name, different parameters, and a hard-coded calling convention). However, sycl_kernel is also destined to go away in favor of the sycl_kernel_entry_point attribute.

I'd say we could just rename it to spirv_kernel to avoid the confusion, but I'm honestly not sure we need the special names if we just accept device_kernel to mean "externally callable thing in a GPU context." Possibly we could even make this generic on all targets if we just make it imply non-hidden visibility, since that would make a function 'callable' in this context.

@sarnex
Copy link
Member Author

sarnex commented Oct 8, 2025

@tahonermann Would renaming sycl_kernel to spirv_kernel/spir_kernel resolve your concern?

@tahonermann
Copy link
Contributor

@sarnex, @jhuber6,

@tahonermann Would renaming sycl_kernel to spirv_kernel/spir_kernel resolve your concern?

No, that would be much worse. SYCL is not synonymous with SPIR-V. Functions decorated with the sycl_kernel attribute are used to generate kernels for PTX, AMDGCN, as well as SPIR-V. sycl_kernel doesn't make a function an actual kernel entry point.

Unfortunately, it is hard to effectively compare the semantics of sycl_kernel with nvptx_kernel, amdgpu_kernel, kernel, and __kernel because the documentation that purports to describe them only talks about SYCL. I think the changes made to that documentation should be reverted and appropriate documentation for those other attributes added.

@jhuber6
Copy link
Contributor

jhuber6 commented Oct 9, 2025

If the sycl_kernel version has all this special behavior it should be a separate attribute and put under the SYCL language mode. The others should all just be aliases to device_kernel and have those do some unified behavior, imply non-hidden visibility (if hidden by default upgrade to protected), and add the appropriate target calling convention (if one exists).

@sarnex
Copy link
Member Author

sarnex commented Oct 9, 2025

Joseph's suggestion seems the best to me, having one of the aliases have totally different semantics is just asking for trouble. If @tahonermann agrees I'll update this PR to implement that.

@tahonermann
Copy link
Contributor

Joseph's suggestion seems the best to me, having one of the aliases have totally different semantics is just asking for trouble. If @tahonermann agrees I'll update this PR to implement that.

Yes, please. Enablement of the sycl_kernel attribute is already subject to use of the -fsycl option. Please do revert the prior change to the documentation and add appropriate documentation for device_kernel and friends. I suggest doing this in a separate PR (probably one that lands before this one).

@sarnex
Copy link
Member Author

sarnex commented Oct 10, 2025

Will do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

backend:AMDGPU clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[clang] crash on __attribute__((sycl_kernel))

5 participants