Commit c02763b

[LLVM][NVPTX] Add NVPTX codegen support for clusterlaunchcontrol instruction
This commit adds NVPTX codegen support for the clusterlaunchcontrol instructions, with tests in clusterlaunchcontrol.ll and clusterlaunchcontrol-multicast.ll. For more information, refer to the [PTX ISA](https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-try-cancel).
1 parent fb07683 commit c02763b

7 files changed: +383 -0 lines changed


llvm/docs/NVPTXUsage.rst

Lines changed: 92 additions & 0 deletions
@@ -1614,6 +1614,98 @@ similar but the latter uses generic addressing (see `Generic Addressing <https:/

 For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-st-bulk>`__.

+
+clusterlaunchcontrol Intrinsics
+-------------------------------
+
+'``llvm.nvvm.clusterlaunchcontrol.try_cancel*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared(ptr addrspace(3) %addr, ptr addrspace(3) %mbar)
+  declare void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared(ptr addrspace(3) %addr, ptr addrspace(3) %mbar)
+
+Overview:
+"""""""""
+
+The ``clusterlaunchcontrol.try_cancel`` intrinsics request atomically cancelling
+the launch of a cluster that has not started running yet. They asynchronously and non-atomically write
+a 16-byte opaque response to shared memory, pointed to by 16-byte-aligned ``addr``, indicating whether the
+operation succeeded or failed. ``addr`` and 8-byte-aligned ``mbar`` must refer to ``shared::cta``;
+otherwise the behavior is undefined. The completion of the asynchronous operation
+is tracked using the mbarrier completion mechanism at ``.cluster`` scope referenced
+by the shared memory pointer, ``mbar``. On success, the opaque response contains
+the CTA id of the first CTA of the canceled cluster; no other successful response
+from other ``clusterlaunchcontrol.try_cancel`` operations from the same grid will
+contain that id.
+
+The ``multicast`` variant specifies that the response is asynchronously and non-atomically written to
+the corresponding shared memory location of each CTA in the requesting cluster.
+The completion of the write of each local response is tracked by independent
+mbarriers at the corresponding shared memory location of each CTA in the
+cluster.
+
+For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-try-cancel>`__.
+
+'``llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled``' Intrinsic
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare i1 @llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled(i128 %try_cancel_response)
+
+Overview:
+"""""""""
+
+The ``llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled`` intrinsic decodes the opaque response written by the
+``llvm.nvvm.clusterlaunchcontrol.try_cancel`` operation.
+
+The intrinsic returns ``0`` (false) if the request failed. If the request succeeded,
+it returns ``1`` (true). A true result indicates that:
+
+- the thread block cluster whose first CTA id matches that of the response
+  handle will not run, and
+- no other successful response of another ``try_cancel`` request in the grid will contain
+  the first CTA id of that cluster
+
+For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-query-cancel>`__.
+
+
+'``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.*``' Intrinsics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.x(i128 %try_cancel_response)
+  declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.y(i128 %try_cancel_response)
+  declare i32 @llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.z(i128 %try_cancel_response)
+
+Overview:
+"""""""""
+
+The ``clusterlaunchcontrol.query_cancel.get_first_ctaid.*`` intrinsics can be
+used to decode the successful opaque response written by the
+``llvm.nvvm.clusterlaunchcontrol.try_cancel`` operation.
+
+If the request succeeded:
+
+- ``llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid.{x,y,z}`` returns
+  the coordinate of the first CTA in the canceled cluster, either x, y, or z.
+
+If the request failed, the behavior of these intrinsics is undefined.
+
+For more information, refer `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-query-cancel>`__.
+
 Other Intrinsics
 ----------------


llvm/include/llvm/IR/IntrinsicsNVVM.td

Lines changed: 28 additions & 0 deletions
@@ -3996,4 +3996,32 @@ def int_nvvm_st_bulk_shared_cta : DefaultAttrsIntrinsic<[],
       [IntrArgMemOnly, IntrWriteMem,
        WriteOnly<ArgIndex<0>>, NoCapture<ArgIndex<0>>, ImmArg<ArgIndex<2>>]>;

+//
+// clusterlaunchcontrol Intrinsics
+//
+
+// clusterlaunchcontrol.try_cancel
+
+def int_nvvm_clusterlaunchcontrol_try_cancel_async_shared
+    : DefaultAttrsIntrinsic<[], [llvm_shared_ptr_ty, llvm_shared_ptr_ty],
+                            [IntrHasSideEffects, IntrArgMemOnly],
+                            "llvm.nvvm.clusterlaunchcontrol.try_cancel.async.shared">;
+
+def int_nvvm_clusterlaunchcontrol_try_cancel_async_multicast_shared
+    : DefaultAttrsIntrinsic<[], [llvm_shared_ptr_ty, llvm_shared_ptr_ty],
+                            [IntrHasSideEffects, IntrArgMemOnly],
+                            "llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared">;
+
+// clusterlaunchcontrol.query_cancel.is_canceled
+
+def int_nvvm_clusterlaunchcontrol_query_cancel_is_canceled
+    : DefaultAttrsIntrinsic<[llvm_i1_ty], [llvm_i128_ty], [IntrNoMem, IntrSpeculatable],
+                            "llvm.nvvm.clusterlaunchcontrol.query_cancel.is_canceled">;
+
+foreach dim = ["x", "y", "z"] in {
+  def int_nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_ # dim
+      : DefaultAttrsIntrinsic<[llvm_i32_ty], [llvm_i128_ty], [IntrNoMem, IntrSpeculatable],
+                              "llvm.nvvm.clusterlaunchcontrol.query_cancel.get_first_ctaid." # dim>;
+}
+
 } // let TargetPrefix = "nvvm"

llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp

Lines changed: 50 additions & 0 deletions
@@ -1041,6 +1041,8 @@ NVPTXTargetLowering::NVPTXTargetLowering(const NVPTXTargetMachine &TM,
                      Custom);

   setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::Other, Custom);
+  // Enable custom lowering for the i128 operand of the clusterlaunchcontrol intrinsics
+  setOperationAction(ISD::INTRINSIC_WO_CHAIN, MVT::i128, Custom);
 }

 const char *NVPTXTargetLowering::getTargetNodeName(unsigned Opcode) const {
@@ -1119,6 +1121,10 @@ const char *NVPTXTargetLowering::getTargetNodeName(unsigned Opcode) const {
     MAKE_CASE(NVPTXISD::BrxEnd)
     MAKE_CASE(NVPTXISD::BrxItem)
     MAKE_CASE(NVPTXISD::BrxStart)
+    MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED)
+    MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_X)
+    MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Y)
+    MAKE_CASE(NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Z)
   }
   return nullptr;

@@ -2795,12 +2801,56 @@ static SDValue LowerIntrinsicVoid(SDValue Op, SelectionDAG &DAG) {
   return Op;
 }

+static SDValue LowerClusterLaunchControl(SDValue Op, SelectionDAG &DAG) {
+
+  SDNode *N = Op.getNode();
+  if (N->getOperand(1).getValueType() != MVT::i128) {
+    // Return if the operand is already lowered.
+    return SDValue();
+  }
+
+  unsigned IID =
+      cast<ConstantSDNode>(N->getOperand(0).getNode())->getZExtValue();
+  auto Opcode = [&]() {
+    switch (IID) {
+    case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
+      return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED;
+    case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
+      return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_X;
+    case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
+      return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Y;
+    case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z:
+      return NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Z;
+    default:
+      llvm_unreachable("unsupported/unhandled intrinsic");
+    }
+  }();
+
+  SDLoc DL(N);
+  SDValue TryCancelResponse = N->getOperand(1);
+  SDValue Cast = DAG.getNode(ISD::BITCAST, DL, MVT::v2i64, TryCancelResponse);
+  SDValue TryCancelResponse0 =
+      DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
+                  DAG.getIntPtrConstant(0, DL));
+  SDValue TryCancelResponse1 =
+      DAG.getNode(ISD::EXTRACT_VECTOR_ELT, DL, MVT::i64, Cast,
+                  DAG.getIntPtrConstant(1, DL));
+
+  return DAG.getNode(Opcode, DL, N->getVTList(),
+                     {TryCancelResponse0, TryCancelResponse1});
+}
+
 static SDValue lowerIntrinsicWOChain(SDValue Op, SelectionDAG &DAG) {
   switch (Op->getConstantOperandVal(0)) {
   default:
     return Op;
   case Intrinsic::nvvm_internal_addrspace_wrap:
     return Op.getOperand(1);
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_is_canceled:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_x:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_y:
+  case Intrinsic::nvvm_clusterlaunchcontrol_query_cancel_get_first_ctaid_z:
+    return LowerClusterLaunchControl(Op, DAG);
   }
 }

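The lowering in NVPTXISelLowering.cpp bitcasts the `i128` response to `v2i64` and extracts elements 0 and 1 before building the target node. Below is a minimal standalone sketch of that split, not the backend code itself. It assumes a little-endian layout, where element 0 holds the low 64 bits. The names `U128`, `ResponseHalves`, `splitResponse`, and `joinResponse` are hypothetical, and `unsigned __int128` is a GCC/Clang extension standing in for `i128`.

```cpp
#include <cstdint>

// GCC/Clang extension, used here only as a stand-in for LLVM's i128.
using U128 = unsigned __int128;

// Hypothetical helper type: the two 64-bit operands passed to the target node.
struct ResponseHalves {
  std::uint64_t Lo; // element 0 of the v2i64 view (low 64 bits, little-endian)
  std::uint64_t Hi; // element 1 of the v2i64 view (high 64 bits)
};

// Split a 128-bit value into its low and high 64-bit halves.
constexpr ResponseHalves splitResponse(U128 Response) {
  return {static_cast<std::uint64_t>(Response),
          static_cast<std::uint64_t>(Response >> 64)};
}

// Recombine the halves into one 128-bit value, analogous to the
// `mov.b128 %handle, {lo, hi}` emitted before the query instruction.
constexpr U128 joinResponse(ResponseHalves H) {
  return (static_cast<U128>(H.Hi) << 64) | H.Lo;
}
```

Splitting and rejoining round-trips any 128-bit value, so the opaque response reaches the PTX query instruction unchanged.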
llvm/lib/Target/NVPTX/NVPTXISelLowering.h

Lines changed: 4 additions & 0 deletions
@@ -79,6 +79,10 @@ enum NodeType : unsigned {
   BrxStart,
   BrxItem,
   BrxEnd,
+  CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED,
+  CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_X,
+  CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Y,
+  CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_Z,
   Dummy,

   FIRST_MEMORY_OPCODE,

llvm/lib/Target/NVPTX/NVPTXIntrinsics.td

Lines changed: 56 additions & 0 deletions
@@ -7405,3 +7405,59 @@ def INT_NVVM_ST_BULK_SHARED_CTA:
             "st.bulk.shared::cta [$dest_addr], $size, 0;",
             [(int_nvvm_st_bulk_shared_cta addr:$dest_addr, i64:$size, (i64 0))]>,
             Requires<[hasSM<100>, hasPTX<86>]>;
+
+//
+// clusterlaunchcontrol Instructions
+//
+
+def CLUSTERLAUNCHCONTRL_TRY_CANCEL:
+  NVPTXInst<(outs), (ins ADDR:$addr, ADDR:$mbar),
+            "clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.b128 " #
+            "[$addr], [$mbar];",
+            [(int_nvvm_clusterlaunchcontrol_try_cancel_async_shared addr:$addr, addr:$mbar)]>,
+            Requires<[hasSM<100>, hasPTX<86>]>;
+
+def CLUSTERLAUNCHCONTRL_TRY_CANCEL_MULTICAST:
+  NVPTXInst<(outs), (ins ADDR:$addr, ADDR:$mbar),
+            "clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes" #
+            ".multicast::cluster::all.b128 " #
+            "[$addr], [$mbar];",
+            [(int_nvvm_clusterlaunchcontrol_try_cancel_async_multicast_shared addr:$addr, addr:$mbar)]>,
+            Requires<[hasSM<100>, hasArchAccelFeatures, hasPTX<86>]>;
+
+def SDTClusterLaunchControlQueryCancelIsCanceled: SDTypeProfile<1, 2, []>;
+def clusterlaunchcontrol_query_cancel_is_canceled:
+  SDNode<"NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED",
+         SDTClusterLaunchControlQueryCancelIsCanceled, []>;
+
+def CLUSTERLAUNCHCONTROL_QUERY_CANCEL_IS_CANCELED:
+  NVPTXInst<(outs Int1Regs:$pred), (ins Int64Regs:$try_cancel_response0, Int64Regs:$try_cancel_response1),
+            "{{\n\t" #
+            ".reg .b128 %handle;\n\t" #
+            "mov.b128 %handle, {$try_cancel_response0, $try_cancel_response1};\n\t" #
+            "clusterlaunchcontrol.query_cancel.is_canceled.pred.b128 $pred, %handle;\n\t" #
+            "}}", [(set i1:$pred,
+              (clusterlaunchcontrol_query_cancel_is_canceled i64:$try_cancel_response0, i64:$try_cancel_response1))]>,
+            Requires<[hasSM<100>, hasPTX<86>]>;
+
+class CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID<string Dim>:
+  NVPTXInst<(outs Int32Regs:$reg), (ins Int64Regs:$try_cancel_response0, Int64Regs:$try_cancel_response1),
+            "{{\n\t" #
+            ".reg .b128 %handle;\n\t" #
+            "mov.b128 %handle, {$try_cancel_response0, $try_cancel_response1};\n\t" #
+            "clusterlaunchcontrol.query_cancel.get_first_ctaid::" # Dim # ".b32.b128 $reg, %handle;\n\t" #
+            "}}", [(set i32:$reg,
+              (!cast<SDNode>("clusterlaunchcontrol_query_cancel_first_cta_id_" # Dim)
+                i64:$try_cancel_response0, i64:$try_cancel_response1))]>,
+            Requires<[hasSM<100>, hasPTX<86>]>;
+
+foreach dim = ["x", "y", "z"] in {
+  def SDTClusterLaunchControlQueryCancelGetFirstCtaId # dim: SDTypeProfile<1, 2, []>;
+
+  def clusterlaunchcontrol_query_cancel_first_cta_id_ # dim :
+    SDNode<"NVPTXISD::CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_" # !toupper(dim),
+           !cast<SDTypeProfile>("SDTClusterLaunchControlQueryCancelGetFirstCtaId" # dim), []>;
+
+  def CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID_ # dim:
+    CLUSTERLAUNCHCONTROL_QUERY_CANCEL_GET_FIRST_CTAID<dim>;
+}
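
Each instruction pattern in NVPTXIntrinsics.td is gated by a `Requires<[...]>` list. A rough standalone model of that gating follows. It assumes `hasSM<N>` and `hasPTX<N>` are minimum-version checks, and it treats `hasArchAccelFeatures` as a plain boolean that is set for architecture-specific targets such as sm_100a. The `Target` struct and both function names are hypothetical and are not part of the backend.

```cpp
// Hypothetical stand-in for the subtarget properties the predicates query.
struct Target {
  unsigned SmVersion;     // e.g. 100 for sm_100 and sm_100a
  unsigned PtxVersion;    // e.g. 86 for PTX ISA 8.6
  bool ArchAccelFeatures; // assumed true for "a" variants such as sm_100a
};

// Models Requires<[hasSM<100>, hasPTX<86>]> on the non-multicast try_cancel
// and on the query_cancel instructions.
constexpr bool canSelectTryCancel(Target T) {
  return T.SmVersion >= 100 && T.PtxVersion >= 86;
}

// Models Requires<[hasSM<100>, hasArchAccelFeatures, hasPTX<86>]> on the
// multicast variant, which adds the arch-accelerated requirement.
constexpr bool canSelectTryCancelMulticast(Target T) {
  return canSelectTryCancel(T) && T.ArchAccelFeatures;
}
```

Under this model the multicast form is selectable only on an arch-accelerated target, which is consistent with the tests below running with `-mcpu=sm_100a`, `sm_101a`, and `sm_120a`.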
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -o - -mcpu=sm_100a -march=nvptx64 -mattr=+ptx86 %s | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_100a -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100a -mattr=+ptx86 | %ptxas-verify -arch=sm_100a %}
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_100a -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_100a %}
+; RUN: llc -o - -mcpu=sm_101a -march=nvptx64 -mattr=+ptx86 %s | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_101a -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_101a -mattr=+ptx86 | %ptxas-verify -arch=sm_101a %}
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_101a -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_101a %}
+; RUN: llc -o - -mcpu=sm_120a -march=nvptx64 -mattr=+ptx86 %s | FileCheck %s --check-prefixes=CHECK,CHECK-PTX-SHARED64
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_120a -mattr=+ptx86 --nvptx-short-ptr | FileCheck --check-prefixes=CHECK,CHECK-PTX-SHARED32 %s
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_120a -mattr=+ptx86 | %ptxas-verify -arch=sm_120a %}
+; RUN: %if ptxas-12.8 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_120a -mattr=+ptx86 --nvptx-short-ptr | %ptxas-verify -arch=sm_120a %}
+
+define void @nvvm_clusterlaunchcontrol_try_cancel_multicast(ptr %addr, ptr %mbar,
+; CHECK-PTX-SHARED64-LABEL: nvvm_clusterlaunchcontrol_try_cancel_multicast(
+; CHECK-PTX-SHARED64: {
+; CHECK-PTX-SHARED64-NEXT: .reg .b64 %rd<3>;
+; CHECK-PTX-SHARED64-EMPTY:
+; CHECK-PTX-SHARED64-NEXT: // %bb.0:
+; CHECK-PTX-SHARED64-NEXT: ld.param.b64 %rd1, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_2];
+; CHECK-PTX-SHARED64-NEXT: ld.param.b64 %rd2, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_3];
+; CHECK-PTX-SHARED64-NEXT: clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [%rd1], [%rd2];
+; CHECK-PTX-SHARED64-NEXT: ret;
+;
+; CHECK-PTX-SHARED32-LABEL: nvvm_clusterlaunchcontrol_try_cancel_multicast(
+; CHECK-PTX-SHARED32: {
+; CHECK-PTX-SHARED32-NEXT: .reg .b32 %r<3>;
+; CHECK-PTX-SHARED32-EMPTY:
+; CHECK-PTX-SHARED32-NEXT: // %bb.0:
+; CHECK-PTX-SHARED32-NEXT: ld.param.b32 %r1, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_2];
+; CHECK-PTX-SHARED32-NEXT: ld.param.b32 %r2, [nvvm_clusterlaunchcontrol_try_cancel_multicast_param_3];
+; CHECK-PTX-SHARED32-NEXT: clusterlaunchcontrol.try_cancel.async.shared::cta.mbarrier::complete_tx::bytes.multicast::cluster::all.b128 [%r1], [%r2];
+; CHECK-PTX-SHARED32-NEXT: ret;
+                                         ptr addrspace(3) %saddr, ptr addrspace(3) %smbar,
+                                         i128 %try_cancel_response) {
+
+  tail call void @llvm.nvvm.clusterlaunchcontrol.try_cancel.async.multicast.shared(ptr addrspace(3) %saddr, ptr addrspace(3) %smbar)
+  ret void;
+}
+;; NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
+; CHECK: {{.*}}
