Commit f62c379

[LLVM][NVPTX] Add NVPTX codegen support for clusterlaunchcontrol instruction (#134568)
This commit adds NVPTX codegen support for the clusterlaunchcontrol instructions, with tests under clusterlaunchcontrol.ll and clusterlaunchcontrol-multicast.ll. For more information, refer to the [PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-try-cancel).
1 parent 746c682 commit f62c379

7 files changed: +385 -0 lines changed


llvm/docs/NVPTXUsage.rst

Lines changed: 92 additions & 0 deletions
@@ -1618,6 +1618,98 @@ similar but the latter uses generic addressing (see `Generic Addressing <https:/
 
 For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st-bulk>`__.
 
+
+clusterlaunchcontrol Intrinsics
+-------------------------------
+
+'``llvm.nvvm.clusterlaunchcontrol.try_cancel*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared(ptr addrspace(3) %addr, ptr addrspace(3) %mbar)
+  declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared(ptr addrspace(3) %addr, ptr addrspace(3) %mbar)
+
+Overview:
+"""""""""
+
+The ``clusterlaunchcontrol.try_cancel`` intrinsics request atomic cancellation of
+the launch of a cluster that has not started running yet. They asynchronously and non-atomically
+write a 16-byte opaque response, indicating whether the operation succeeded or failed, to
+the shared memory pointed to by the 16-byte-aligned ``addr``. Both ``addr`` and the 8-byte-aligned
+``mbar`` must refer to ``shared::cta``; otherwise, the behavior is undefined. The completion
+of the asynchronous operation is tracked using the mbarrier completion mechanism at
+``.cluster`` scope referenced by the shared memory pointer ``mbar``. On success, the
+opaque response contains the CTA id of the first CTA of the canceled cluster; no other
+successful response from other ``clusterlaunchcontrol.try_cancel`` operations from the
+same grid will contain that id.
+
+The ``multicast`` variant specifies that the response is asynchronously and non-atomically
+written to the corresponding shared memory location of each CTA in the requesting cluster.
+The completion of the write of each local response is tracked by independent
+mbarriers at the corresponding shared memory location of each CTA in the
+cluster.
+
+For more information, refer to the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-try-cancel>`__.
+
+'``llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare i1 @llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled(i128 %try_cancel_response)
+
+Overview:
+"""""""""
+
+The ``llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled`` intrinsic decodes the opaque response written by the
+``llvm.nvvm.clusterlaunchcontrol.try_cancel`` operation.
+
+The intrinsic returns ``0`` (false) if the request failed. If the request succeeded,
+it returns ``1`` (true). A true result indicates that:
+
+- the thread block cluster whose first CTA id matches that of the response
+  handle will not run, and
+- no other successful response of another ``try_cancel`` request in the grid will contain
+  the first CTA id of that cluster.
+
+For more information, refer to the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-query-cancel>`__.
+
+
+'``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.x(i128 %try_cancel_response)
+  declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.y(i128 %try_cancel_response)
+  declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.z(i128 %try_cancel_response)
+
+Overview:
+"""""""""
+
+The ``clusterlaunchcontrol.query_cancel.get_first_ctaid.*`` intrinsics can be
+used to decode a successful opaque response written by the
+``llvm.nvvm.clusterlaunchcontrol.try_cancel`` operation.
+
+If the request succeeded:
+
+- ``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.{x,y,z}`` returns
+  the x, y, or z coordinate, respectively, of the first CTA in the canceled cluster.
+
+If the request failed, the behavior of these intrinsics is undefined.
+
+For more information, refer to the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-query-cancel>`__.
+
 Other Intrinsics
 ----------------

llvm/include/llvm/IR/IntrinsicsNVVM.td

Lines changed: 29 additions & 0 deletions
@@ -2277,4 +2277,33 @@ let IntrProperties = [IntrArgMemOnly, IntrWriteMem, WriteOnly<ArgIndex<0>>,
   def int_nvvm_st_bulk_shared_cta :
     DefaultAttrsIntrinsic<[], [llvm_shared_ptr_ty, llvm_i64_ty, llvm_i64_ty]>;
 }
+
+//
+// clusterlaunchcontrol Intrinsics
+//
+
+// clusterlaunchcontrol.try_cancel
+
+def int_nvvm_clusterlaunchcontrol_try_cancel_async_shared
+    : DefaultAttrsIntrinsic<[], [llvm_shared_ptr_ty, llvm_shared_ptr_ty],
+                            [IntrHasSideEffects, IntrArgMemOnly],
+                            "llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared">;
+
+def int_nvvm_clusterlaunchcontrol_try_cancel_async_multicast_shared
+    : DefaultAttrsIntrinsic<[], [llvm_shared_ptr_ty, llvm_shared_ptr_ty],
+                            [IntrHasSideEffects, IntrArgMemOnly],
+                            "llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared">;
+
+// clusterlaunchcontrol.query_cancel.is_canceled
+
+def int_nvvm_clusterlaunchcontrol_query_cancel_is_canceled
+    : DefaultAttrsIntrinsic<[llvm_i1_ty], [llvm_i128_ty], [IntrNoMem, IntrSpeculatable],
+                            "llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled">;
+
+foreach dim = ["x", "y", "z"] in {
+  def int_nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_ # dim
+      : DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i128_ty], [IntrNoMem, IntrSpeculatable],
+                              "llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid." # dim>;
+}
+
 } // let TargetPrefix = "nvvm"

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

Lines changed: 51 additions & 0 deletions
@@ -1051,6 +1051,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
                      Custom);
 
   setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);
+  // Enable custom lowering for the i128 operand of the clusterlaunchcontrol intrinsics
+  setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i128, Custom);
 }
 
 const char *NVPTXTargetLowering::getTargetNodeName(unsigned Opcode) const {
@@ -1129,6 +1131,10 @@ const char *NVPTXTargetLowering::getTargetNodeName(unsigned Opcode) const {
     MAKE_CASE(NVPTXISD::BrxEnd)
     MAKE_CASE(NVPTXISD::BrxItem)
     MAKE_CASE(NVPTXISD::BrxStart)
+    MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED)
+    MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_X)
+    MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Y)
+    MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Z)
   }
   return nullptr;

@@ -2805,12 +2811,57 @@ static SDValue LowerIntrinsicVoid(SDValue Op, SelectionDAG &DAG) {
   return Op;
 }
 
+static SDValue LowerClusterLaunchControlQueryCancel(SDValue Op,
+                                                    SelectionDAG &DAG) {
+
+  SDNode *N = Op.getNode();
+  if (N->getOperand(1).getValueType() != MVT::i128) {
+    // Return early if the operand has already been lowered.
+    return SDValue();
+  }
+
+  unsigned IID =
+      cast<ConstantSDNode>(N->getOperand(0).getNode())->getZExtValue();
+  auto Opcode = [&]() {
+    switch (IID) {
+    case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
+      return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED;
+    case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
+      return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_X;
+    case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
+      return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Y;
+    case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z:
+      return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Z;
+    default:
+      llvm_unreachable("unsupported/unhandled intrinsic");
+    }
+  }();
+
+  SDLoc DL(N);
+  SDValue TryCancelResponse = N->getOperand(1);
+  SDValue Cast = DAG.getNode(ISD::BITCAST, DL, MVT::v2i64, TryCancelResponse);
+  SDValue TryCancelResponse0 =
+      DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
+                  DAG.getIntPtrConstant(0, DL));
+  SDValue TryCancelResponse1 =
+      DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
+                  DAG.getIntPtrConstant(1, DL));
+
+  return DAG.getNode(Opcode, DL, N->getVTList(),
+                     {TryCancelResponse0, TryCancelResponse1});
+}
+
 static SDValue lowerIntrinsicWOChain(SDValue Op, SelectionDAG &DAG) {
   switch (Op->getConstantOperandVal(0)) {
   default:
     return Op;
   case Intrinsic::nvvm_internal_addrspace_wrap:
     return Op.getOperand(1);
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z:
+    return LowerClusterLaunchControlQueryCancel(Op, DAG);
   }
 }
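Note on the lowering in this file: `LowerClusterLaunchControlQueryCancel` passes the 128-bit response to the target node as two 64-bit values. It bitcasts the `i128` operand to `v2i64` and extracts elements 0 and 1, and the instruction patterns in NVPTXIntrinsics.td repack them with `mov.b128`. The standalone C++ sketch below models that split and repack on plain bytes. It is an illustration only: `Response128`, `ResponseHalves`, `loadLE64`, `splitResponse` and `packResponse` are hypothetical names that do not appear in the commit, and the sketch assumes a little-endian layout, so that element 0 holds the low 64 bits.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the 16-byte opaque response as it sits in memory.
using Response128 = std::array<std::uint8_t, 16>;

// The two 64-bit halves that the lowering hands to the target node.
struct ResponseHalves {
  std::uint64_t lo; // models element 0 of the v2i64 view
  std::uint64_t hi; // models element 1 of the v2i64 view
};

// Read eight bytes starting at `offset` as a little-endian 64-bit integer.
inline std::uint64_t loadLE64(const Response128 &bytes, std::size_t offset) {
  std::uint64_t value = 0;
  for (std::size_t i = 0; i < 8; ++i)
    value |= static_cast<std::uint64_t>(bytes[offset + i]) << (8 * i);
  return value;
}

// Models BITCAST to v2i64 followed by EXTRACT_VECTOR_ELT 0 and 1.
inline ResponseHalves splitResponse(const Response128 &bytes) {
  return {loadLE64(bytes, 0), loadLE64(bytes, 8)};
}

// Models repacking the two halves into one 128-bit value (low half first).
inline Response128 packResponse(const ResponseHalves &halves) {
  Response128 bytes{};
  for (std::size_t i = 0; i < 8; ++i) {
    bytes[i] = static_cast<std::uint8_t>(halves.lo >> (8 * i));
    bytes[8 + i] = static_cast<std::uint8_t>(halves.hi >> (8 * i));
  }
  return bytes;
}
```

Splitting and then packing returns the original bytes, which is the property the lowering relies on: the query instructions see the same 128-bit handle that `try_cancel` wrote.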

llvm/lib/Target/NVPTX/NVPTXISelLowering.h

Lines changed: 4 additions & 0 deletions
@@ -79,6 +79,10 @@ enum NodeType : unsigned {
   BrxStart,
   BrxItem,
   BrxEnd,
+  CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED,
+  CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_X,
+  CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Y,
+  CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Z,
   Dummy,
 
   FIRST_MEMORY_OPCODE,

llvm/lib/Target/NVPTX/NVPTXIntrinsics.td

Lines changed: 56 additions & 0 deletions
@@ -7411,3 +7411,59 @@ def INT_NVVM_ST_BULK_SHARED_CTA:
   "st.bulk.shared::cta [$dest_addr], $size, 0;",
   [(int_nvvm_st_bulk_shared_cta addr:$dest_addr, i64:$size, (i64 0))]>,
   Requires<[hasSM<100>, hasPTX<86>]>;
+
+//
+// clusterlaunchcontrol Instructions
+//
+
+def CLUSTERLAUNCHCONTRL_TRY_CANCEL:
+  NVPTXInst<(outs), (ins ADDR:$addr, ADDR:$mbar),
+            "clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.b128 " #
+            "[$addr], [$mbar];",
+            [(int_nvvm_clusterlaunchcontrol_try_cancel_async_shared addr:$addr, addr:$mbar)]>,
+  Requires<[hasSM<100>, hasPTX<86>]>;
+
+def CLUSTERLAUNCHCONTRL_TRY_CANCEL_MULTICAST:
+  NVPTXInst<(outs), (ins ADDR:$addr, ADDR:$mbar),
+            "clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes" #
+            ".multicast::cluster::all.b128 " #
+            "[$addr], [$mbar];",
+            [(int_nvvm_clusterlaunchcontrol_try_cancel_async_multicast_shared addr:$addr, addr:$mbar)]>,
+  Requires<[hasSM<100>, hasArchAccelFeatures, hasPTX<86>]>;
+
+def SDTClusterLaunchControlQueryCancelIsCanceled: SDTypeProfile<1, 2, []>;
+def clusterlaunchcontrol_query_cancel_is_canceled:
+  SDNode<"NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED",
+         SDTClusterLaunchControlQueryCancelIsCanceled, []>;
+
+def CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED:
+  NVPTXInst<(outs Int1Regs:$pred), (ins Int64Regs:$try_cancel_response0, Int64Regs:$try_cancel_response1),
+            "{{\n\t" #
+            ".reg .b128 %clc_handle;\n\t" #
+            "mov.b128 %clc_handle, {$try_cancel_response0, $try_cancel_response1};\n\t" #
+            "clusterlaunchcontrol.query_cancel.is_canceled.pred.b128 $pred, %clc_handle;\n\t" #
+            "}}", [(set i1:$pred,
+            (clusterlaunchcontrol_query_cancel_is_canceled i64:$try_cancel_response0, i64:$try_cancel_response1))]>,
+  Requires<[hasSM<100>, hasPTX<86>]>;
+
+class CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID<string Dim>:
+  NVPTXInst<(outs Int32Regs:$reg), (ins Int64Regs:$try_cancel_response0, Int64Regs:$try_cancel_response1),
+            "{{\n\t" #
+            ".reg .b128 %clc_handle;\n\t" #
+            "mov.b128 %clc_handle, {$try_cancel_response0, $try_cancel_response1};\n\t" #
+            "clusterlaunchcontrol.query_cancel.get_first_ctaid::" # Dim # ".b32.b128 $reg, %clc_handle;\n\t" #
+            "}}", [(set i32:$reg,
+            (!cast<SDNode>("clusterlaunchcontrol_query_cancel_first_cta_id_" # Dim)
+            i64:$try_cancel_response0, i64:$try_cancel_response1))]>,
+  Requires<[hasSM<100>, hasPTX<86>]>;
+
+foreach dim = ["x", "y", "z"] in {
+  def SDTClusterLaunchControlQueryCancelGetFirstCtaId # dim: SDTypeProfile<1, 2, []>;
+
+  def clusterlaunchcontrol_query_cancel_first_cta_id_ # dim :
+    SDNode<"NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_" # !toupper(dim),
+           !cast<SDTypeProfile>("SDTClusterLaunchControlQueryCancelGetFirstCtaId" # dim), []>;
+
+  def CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_ # dim:
+    CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID<dim>;
+}
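Note on the `foreach dim = ["x", "y", "z"]` loop in this file: it produces one `SDTypeProfile`, one `SDNode` and one instruction record per dimension, pasting `dim` onto the record names and `!toupper(dim)` onto the ISD opcode name. As a rough illustration of which names result, the C++ sketch below performs the same string pasting. `ExpandedNames`, `toUpper` and `expandDim` are hypothetical helpers written for this note; they are not part of LLVM or TableGen.

```cpp
#include <cctype>
#include <string>

// Names of the records generated for one value of `dim`.
struct ExpandedNames {
  std::string typeProfile; // SDTypeProfile record
  std::string sdNode;      // SDNode record
  std::string isdOpcode;   // opcode string passed to SDNode<...>
  std::string instruction; // instruction record
};

// Stand-in for TableGen's !toupper bang operator.
inline std::string toUpper(std::string s) {
  for (char &c : s)
    c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
  return s;
}

// Mirror the `#` paste operations from the foreach body.
inline ExpandedNames expandDim(const std::string &dim) {
  return {
      "SDTClusterLaunchControlQueryCancelGetFirstCtaId" + dim,
      "clusterlaunchcontrol_query_cancel_first_cta_id_" + dim,
      "NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_" +
          toUpper(dim),
      "CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_" + dim,
  };
}
```

Only the ISD opcode name is upper-cased, so it matches the `NodeType` enumerators added in NVPTXISelLowering.h, while the record names keep the lower-case suffix.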
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -o - -mcpu=sm_100a -march=nvptx64 -mattr=+ptx86 %s | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_100a -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100a -mattr=+ptx86 | %ptxas-verify -arch=sm_100a %}
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100a -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_100a %}
+; RUN: llc -o - -mcpu=sm_101a -march=nvptx64 -mattr=+ptx86 %s | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_101a -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_101a -mattr=+ptx86 | %ptxas-verify -arch=sm_101a %}
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_101a -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_101a %}
+; RUN: llc -o - -mcpu=sm_120a -march=nvptx64 -mattr=+ptx86 %s | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_120a -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_120a -mattr=+ptx86 | %ptxas-verify -arch=sm_120a %}
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_120a -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_120a %}
+
+define void @nvvm_clusterlaunchcontrol_try_cancel_multicast(
+; CHECK-PTX-SHARED64-LABEL: nvvm_clusterlaunchcontrol_try_cancel_multicast(
+; CHECK-PTX-SHARED64: {
+; CHECK-PTX-SHARED64-NEXT: .reg .b64 %rd<3>;
+; CHECK-PTX-SHARED64-EMPTY:
+; CHECK-PTX-SHARED64-NEXT: // %bb.0:
+; CHECK-PTX-SHARED64-NEXT: ld.param.b64 %rd1, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_0];
+; CHECK-PTX-SHARED64-NEXT: ld.param.b64 %rd2, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_1];
+; CHECK-PTX-SHARED64-NEXT: clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [%rd1], [%rd2];
+; CHECK-PTX-SHARED64-NEXT: ret;
+;
+; CHECK-PTX-SHARED32-LABEL: nvvm_clusterlaunchcontrol_try_cancel_multicast(
+; CHECK-PTX-SHARED32: {
+; CHECK-PTX-SHARED32-NEXT: .reg .b32 %r<3>;
+; CHECK-PTX-SHARED32-EMPTY:
+; CHECK-PTX-SHARED32-NEXT: // %bb.0:
+; CHECK-PTX-SHARED32-NEXT: ld.param.b32 %r1, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_0];
+; CHECK-PTX-SHARED32-NEXT: ld.param.b32 %r2, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_1];
+; CHECK-PTX-SHARED32-NEXT: clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [%r1], [%r2];
+; CHECK-PTX-SHARED32-NEXT: ret;
+    ptr addrspace(3) %saddr, ptr addrspace(3) %smbar,
+    i128 %try_cancel_response) {
+
+  tail call void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared(ptr addrspace(3) %saddr, ptr addrspace(3) %smbar)
+  ret void;
+}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK: {{.*}}
