[NVPTX] Add support for "blocksareclusters" kernel attr #152265

rajatbajpai · 2025-08-06T07:46:21Z

This change introduces a new kernel attribute that allows thread blocks to be mapped to clusters.

llvmbot · 2025-08-06T07:46:55Z

@llvm/pr-subscribers-backend-nvptx

Author: Rajat Bajpai (rajatbajpai)

Changes

This change introduces a new kernel attribute that allows thread blocks to be mapped to clusters.

Full diff: https://github.com/llvm/llvm-project/pull/152265.diff

3 Files Affected:

(modified) llvm/docs/NVPTXUsage.rst (+6)
(modified) llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp (+16-4)
(added) llvm/test/CodeGen/NVPTX/blocksareclusters-kernel-attr.ll (+78)

diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index d28eb6860c33a..65f9db3f248e5 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -92,6 +92,12 @@ Function Attributes
     dimension. Specifying a different cluster dimension at launch will result in
     a runtime error or kernel launch failure. Only supported for Hopper+.
 
+``"nvvm.blocksareclusters"``
+    This attribute implies that the grid launch configuration for the corresponding
+    kernel function specifies the number of clusters instead of the number of thread
+    blocks. This attribute is only allowed for kernel functions and requires
+    ``nvvm.reqntid`` and ``nvvm.cluster_dim`` attributes.
+
 .. _address_spaces:
 
 Address Spaces
diff --git a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
index 38912a7f09e30..096f94922dce6 100644
--- a/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
+++ b/llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp
@@ -414,6 +414,18 @@ void NVPTXAsmPrinter::emitKernelFunctionDirectives(const Function &F,
   // the reqntid directive, and set the unspecified ones to 1.
   // If none of Reqntid* is specified, don't output reqntid directive.
   const auto ReqNTID = getReqNTID(F);
+
+  const NVPTXTargetMachine &NTM = static_cast<const NVPTXTargetMachine &>(TM);
+  const auto *STI = static_cast<const NVPTXSubtarget *>(NTM.getSubtargetImpl());
+
+  const bool BlocksAreClusters =
+      F.hasFnAttribute("nvvm.blocksareclusters");
+  if (BlocksAreClusters && STI->getSmVersion() >= 90) {
+    if (ReqNTID.empty() || getClusterDim(F).empty())
+      report_fatal_error("blocksareclusters requires reqntid and cluster_dim");
+    O << ".blocksareclusters\n";
+  }
+
   if (!ReqNTID.empty())
     O << formatv(".reqntid {0:$[, ]}\n",
                  make_range(ReqNTID.begin(), ReqNTID.end()));
@@ -431,14 +443,14 @@ void NVPTXAsmPrinter::emitKernelFunctionDirectives(const Function &F,
 
   // .maxclusterrank directive requires SM_90 or higher, make sure that we
   // filter it out for lower SM versions, as it causes a hard ptxas crash.
-  const NVPTXTargetMachine &NTM = static_cast<const NVPTXTargetMachine &>(TM);
-  const auto *STI = static_cast<const NVPTXSubtarget *>(NTM.getSubtargetImpl());
-
   if (STI->getSmVersion() >= 90) {
     const auto ClusterDim = getClusterDim(F);
 
     if (!ClusterDim.empty()) {
-      O << ".explicitcluster\n";
+
+      if (!BlocksAreClusters)
+        O << ".explicitcluster\n";
+
       if (ClusterDim[0] != 0) {
         assert(llvm::all_of(ClusterDim, [](unsigned D) { return D != 0; }) &&
                "cluster_dim_x != 0 implies cluster_dim_y and cluster_dim_z "
diff --git a/llvm/test/CodeGen/NVPTX/blocksareclusters-kernel-attr.ll b/llvm/test/CodeGen/NVPTX/blocksareclusters-kernel-attr.ll
new file mode 100644
index 0000000000000..13357f015a176
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/blocksareclusters-kernel-attr.ll
@@ -0,0 +1,78 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_100 | FileCheck %s
+
+target triple = "nvptx64-nvidia-cuda"
+
+; Test "blocksareclusters" attribute with full "reqntid" and "cluster_dim"
+; attributes.
+define ptx_kernel void @kernel1(i32* %input, i32* %output) #0 #1 #2 {
+; CHECK-LABEL: kernel1(
+; CHECK:       .blocksareclusters
+; CHECK-NEXT:  .reqntid 1024, 1, 1
+; CHECK-NEXT:  .reqnctapercluster 2, 2, 2
+; CHECK-NEXT:  {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ret;
+  ret void
+}
+
+; Test "blocksareclusters" attribute with single dimension "reqntid" and
+; "cluster_dim" attributes.
+define ptx_kernel void @kernel2(i32* %input, i32* %output) #0 #3 #4 {
+; CHECK-LABEL: kernel2(
+; CHECK:       .blocksareclusters
+; CHECK-NEXT:  .reqntid 1024
+; CHECK-NEXT:  .reqnctapercluster 2 // @kernel2
+; CHECK-NEXT:  {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ret;
+  ret void
+}
+
+; Test "blocksareclusters" attribute with two dimensions(not z dimension)
+; "reqntid" and "cluster_dim" attributes.
+define ptx_kernel void @kernel3(i32* %input, i32* %output) #0 #5 #6 {
+; CHECK-LABEL: kernel3(
+; CHECK:       .blocksareclusters
+; CHECK-NEXT:  .reqntid 512, 2
+; CHECK-NEXT:  .reqnctapercluster 2, 2 // @kernel3
+; CHECK-NEXT:  {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ret;
+  ret void
+}
+
+; Test "blocksareclusters" attribute with full "reqntid" and "cluster_dim"
+; attributes where kernel attribute is provided through metadata.
+define void @kernel4(i32* %input, i32* %output) #0 #1 #2 {
+; CHECK-LABEL: kernel4(
+; CHECK:       .blocksareclusters
+; CHECK-NEXT:  .reqntid 1024, 1, 1
+; CHECK-NEXT:  .reqnctapercluster 2, 2, 2 // @kernel4
+; CHECK-NEXT:  {
+; CHECK-EMPTY:
+; CHECK-EMPTY:
+; CHECK-NEXT:  // %bb.0:
+; CHECK-NEXT:    ret;
+  ret void
+}
+
+attributes #0 = { "nvvm.blocksareclusters" }
+
+attributes #1 = { "nvvm.reqntid"="1024,1,1" }
+attributes #2 = { "nvvm.cluster_dim"="2,2,2" }
+
+attributes #3 = { "nvvm.reqntid"="1024" }
+attributes #4 = { "nvvm.cluster_dim"="2" }
+
+attributes #5 = { "nvvm.reqntid"="512,2" }
+attributes #6 = { "nvvm.cluster_dim"="2,2" }
+
+!0 = !{void (i32*, i32*)* @kernel4, !"kernel", i32 1 }
+!nvvm.annotations = !{!0}
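
The NVPTXUsage.rst entry above says that with "nvvm.blocksareclusters" the grid launch configuration counts clusters rather than thread blocks. The sketch below shows the arithmetic this implies. It assumes that each grid dimension is multiplied by the matching "nvvm.cluster_dim" component to get the thread-block count, and that unspecified reqntid/cluster_dim components are 1. The helpers totalBlocks and totalThreads are hypothetical and not part of LLVM or the CUDA runtime.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Three-component launch dimension (x, y, z).
using Dim3 = std::array<std::uint64_t, 3>;

// Hypothetical helper: assuming the grid counts clusters, the thread-block
// count in each dimension is (clusters * cluster_dim) in that dimension.
Dim3 totalBlocks(const Dim3 &GridInClusters, const Dim3 &ClusterDim) {
  Dim3 Blocks{};
  for (std::size_t I = 0; I < 3; ++I) {
    assert(ClusterDim[I] != 0 && "cluster_dim components must be non-zero");
    Blocks[I] = GridInClusters[I] * ClusterDim[I];
  }
  return Blocks;
}

// Hypothetical helper: total threads launched, given the per-block reqntid.
std::uint64_t totalThreads(const Dim3 &Blocks, const Dim3 &ReqNTID) {
  std::uint64_t N = 1;
  for (std::size_t I = 0; I < 3; ++I)
    N *= Blocks[I] * ReqNTID[I];
  return N;
}
```

Under this assumption, @kernel1's attributes (reqntid 1024,1,1 and cluster_dim 2,2,2) with a grid of 4 x 1 x 1 clusters correspond to 8 x 2 x 2 = 32 thread blocks, or 32768 threads.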

github-actions · 2025-08-06T07:48:49Z

✅ With the latest revision this PR passed the C/C++ code formatter.


rajatbajpai · 2025-08-13T03:26:35Z

gentle ping for review.

AlexMaclean · 2025-08-13T14:24:42Z

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp

+  const NVPTXTargetMachine &NTM = static_cast<const NVPTXTargetMachine &>(TM);
+  const auto *STI = static_cast<const NVPTXSubtarget *>(NTM.getSubtargetImpl());
+
+  const bool BlocksAreClusters = F.hasFnAttribute("nvvm.blocksareclusters");


Pull F.hasFnAttribute("nvvm.blocksareclusters") out into a separate function in NVPTXUtilities, similar to how we get the other attributes. I think it is cleaner to keep all the NVVM attribute names in a single place instead of mixing them into the assembly printing.
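
A minimal sketch of the suggested refactor, assuming an accessor placed next to getReqNTID() and getClusterDim(). The real helper would take a const llvm::Function & and call Function::hasFnAttribute(StringRef). Here MockFunction is a hypothetical stand-in so the sketch compiles without LLVM headers, and the names hasBlocksAreClusters and BlocksAreClustersAttr are also hypothetical.

```cpp
#include <cassert>
#include <set>
#include <string>

// Hypothetical stand-in for llvm::Function so the sketch compiles without
// LLVM headers; it only models string function attributes.
struct MockFunction {
  std::set<std::string> FnAttrs;
  bool hasFnAttribute(const std::string &Kind) const {
    return FnAttrs.count(Kind) != 0;
  }
};

// The attribute name is spelled once, alongside the other NVVM attribute
// accessors, instead of inside the asm printer.
constexpr char BlocksAreClustersAttr[] = "nvvm.blocksareclusters";

// Hypothetical accessor in the style of getReqNTID()/getClusterDim().
bool hasBlocksAreClusters(const MockFunction &F) {
  return F.hasFnAttribute(BlocksAreClustersAttr);
}
```

The asm printer would then only call the accessor, e.g. const bool BlocksAreClusters = hasBlocksAreClusters(F);, and never spell the attribute string itself.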

AlexMaclean · 2025-08-13T14:30:03Z

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp

+  const auto *STI = static_cast<const NVPTXSubtarget *>(NTM.getSubtargetImpl());
+
+  const bool BlocksAreClusters = F.hasFnAttribute("nvvm.blocksareclusters");
+  if (BlocksAreClusters && STI->getSmVersion() >= 90) {


Does the order of these directives matter? If not, let's move this logic down into the existing if (STI->getSmVersion() >= 90) block, which handles the cluster attributes. If the order does matter, please add a comment explaining why.
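
A simplified model of what the suggested restructuring could look like, with .blocksareclusters emitted inside the sm_90+ block next to the other cluster directives. This assumes directive order in the kernel header is not significant to ptxas. That is exactly the open question above and is not confirmed here. If .blocksareclusters must precede .reqntid, the original placement should stay. KernelInfo, joinDims and emitDirectives are hypothetical names. The model omits .maxntid, .minnctapersm and .maxclusterrank, and it throws an exception where the real printer calls report_fatal_error().

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Simplified model of the attributes read by emitKernelFunctionDirectives().
struct KernelInfo {
  std::vector<unsigned> ReqNTID;    // "nvvm.reqntid"; empty if absent
  std::vector<unsigned> ClusterDim; // "nvvm.cluster_dim"; empty if absent
  bool BlocksAreClusters = false;   // "nvvm.blocksareclusters" present
  unsigned SmVersion = 0;           // e.g. 90 for sm_90
};

// Joins dimensions as "a, b, c", matching the .reqntid output format.
static std::string joinDims(const std::vector<unsigned> &Dims) {
  std::string S;
  for (std::size_t I = 0; I < Dims.size(); ++I) {
    if (I != 0)
      S += ", ";
    S += std::to_string(Dims[I]);
  }
  return S;
}

// Emits .blocksareclusters inside the sm_90+ block with the other cluster
// directives (order assumed irrelevant; see lead-in).
std::string emitDirectives(const KernelInfo &K) {
  std::ostringstream O;
  if (!K.ReqNTID.empty())
    O << ".reqntid " << joinDims(K.ReqNTID) << "\n";

  if (K.SmVersion >= 90) {
    if (K.BlocksAreClusters) {
      // Stands in for report_fatal_error() in the real printer.
      if (K.ReqNTID.empty() || K.ClusterDim.empty())
        throw std::invalid_argument(
            "blocksareclusters requires reqntid and cluster_dim");
      O << ".blocksareclusters\n";
    }
    if (!K.ClusterDim.empty()) {
      // As in the diff, .explicitcluster is suppressed for blocksareclusters.
      if (!K.BlocksAreClusters)
        O << ".explicitcluster\n";
      if (K.ClusterDim[0] != 0)
        O << ".reqnctapercluster " << joinDims(K.ClusterDim) << "\n";
    }
  }
  return O.str();
}
```

With this ordering, .reqntid would be printed before .blocksareclusters, so the CHECK lines in blocksareclusters-kernel-attr.ll would need to be regenerated.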

AlexMaclean · 2025-08-13T14:30:38Z

llvm/test/CodeGen/NVPTX/blocksareclusters-kernel-attr.ll

+
+; Test "blocksareclusters" attribute with full "reqntid" and "cluster_dim"
+; attributes.
+define ptx_kernel void @kernel1(i32* %input, i32* %output) #0 #1 #2 {


Use opaque pointers (ptr) instead of typed pointers such as i32*.

AlexMaclean · 2025-08-13T14:32:50Z

llvm/test/CodeGen/NVPTX/blocksareclusters-kernel-attr.ll

@@ -0,0 +1,78 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_100 | FileCheck %s


Don't specify the triple on the RUN line; it is redundant with the target triple already set in the module.

AlexMaclean · 2025-08-13T14:34:06Z

llvm/test/CodeGen/NVPTX/blocksareclusters-kernel-attr.ll

+; Test "blocksareclusters" attribute with full "reqntid" and "cluster_dim"
+; attributes where kernel attribute is provided through metadata.


We're just going to auto-upgrade this "kernel" metadata to the ptx_kernel calling convention, and there are already tests for that logic. This test can be removed.

AlexMaclean · 2025-08-13T14:34:43Z

llvm/lib/Target/NVPTX/NVPTXAsmPrinter.cpp

+  if (BlocksAreClusters && STI->getSmVersion() >= 90) {
+    if (ReqNTID.empty() || getClusterDim(F).empty())
+      report_fatal_error("blocksareclusters requires reqntid and cluster_dim");
+    O << ".blocksareclusters\n";


Should we also check the PTX ISA version?

rajatbajpai requested a review from AlexMaclean August 6, 2025 07:46

rajatbajpai self-assigned this Aug 6, 2025

llvmbot added the backend:NVPTX label Aug 6, 2025

[NVPTX] Add support for "blocksareclusters" kernel attr

8cec6f3

This change introduces a new kernel attribute that allows thread blocks to be mapped to clusters.

rajatbajpai force-pushed the dev/rbajpai/blocksareclusters-upstream branch from 4bbfecc to 8cec6f3 August 6, 2025 08:32

AlexMaclean reviewed Aug 6, 2025

llvm/docs/NVPTXUsage.rst

AlexMaclean reviewed Aug 13, 2025
