The ``clusterlaunchcontrol.try_cancel`` intrinsics request atomically cancelling
the launch of a cluster that has not started running yet. They asynchronously and
non-atomically write a 16-byte opaque response to shared memory, pointed to by the
16-byte-aligned ``addr``, indicating whether the operation succeeded or failed.
``addr`` and the 8-byte-aligned ``mbar`` must refer to ``shared::cta``;
otherwise the behavior is undefined. The completion of the asynchronous operation
is tracked using the mbarrier completion mechanism at ``.cluster`` scope referenced
by the shared memory pointer, ``mbar``. On success, the opaque response contains
the CTA id of the first CTA of the canceled cluster; no other successful response
from other ``clusterlaunchcontrol.try_cancel`` operations from the same grid will
contain that id.

The ``multicast`` variant specifies that the response is asynchronously and
non-atomically written to the corresponding shared memory location of each CTA
in the requesting cluster. The completion of the write of each local response is
tracked by independent mbarriers at the corresponding shared memory location of
each CTA in the cluster.

For more information, refer to the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-try-cancel>`__.
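
The uniqueness guarantee described above can be pictured with a small host-side
C++ model. This is only a sketch of the semantics, not the NVPTX intrinsic or the
hardware mechanism: ``PendingGrid``, ``TryCancelResponse``, ``try_cancel`` and
``drain`` are hypothetical names, a single linear index stands in for the first
CTA id, and the real response is an opaque 16-byte value written asynchronously
to shared memory.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the opaque 16-byte response.
struct TryCancelResponse {
    bool canceled;            // true if the request succeeded
    std::uint32_t first_cta;  // meaningful only when canceled is true
};

// Hypothetical model of a grid whose clusters [first_pending, total)
// have not started running yet.
class PendingGrid {
public:
    PendingGrid(std::uint32_t first_pending, std::uint32_t total)
        : next_(first_pending), total_(total) {}

    std::uint32_t total() const { return total_; }

    // Atomically claim one not-yet-started cluster, or report failure.
    TryCancelResponse try_cancel() {
        std::uint32_t id = next_.load();
        while (id < total_) {
            // On failure, compare_exchange_weak reloads `id` and we retry.
            if (next_.compare_exchange_weak(id, id + 1)) {
                return TryCancelResponse{true, id};
            }
        }
        return TryCancelResponse{false, 0};
    }

private:
    std::atomic<std::uint32_t> next_;
    std::uint32_t total_;
};

// Keep issuing requests until one fails; collect the ids that were handed out.
std::vector<std::uint32_t> drain(PendingGrid& grid) {
    std::vector<bool> seen(grid.total(), false);
    std::vector<std::uint32_t> ids;
    for (;;) {
        TryCancelResponse r = grid.try_cancel();
        if (!r.canceled) break;       // nothing left to cancel
        assert(!seen[r.first_cta]);   // no id is ever returned twice
        seen[r.first_cta] = true;
        ids.push_back(r.first_cta);
    }
    return ids;
}
```

In this model, once every pending cluster has been claimed, each further request
fails, which mirrors the "succeeded or failed" outcome encoded in the response.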
The intrinsic returns ``0`` (false) if the request failed. If the request succeeded,
it returns ``1`` (true). A true result indicates that:

- the thread block cluster whose first CTA id matches that of the response
  handle will not run, and
- no other successful response of another ``try_cancel`` request in the grid will contain
  the first CTA id of that cluster.

For more information, refer to the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-query-cancel>`__.
the coordinate of the first CTA in the canceled cluster, either x, y, or z.

If the request failed, the behavior of these intrinsics is undefined.

For more information, refer to the `PTX ISA <https://docs.nvidia.com/cuda/parallel-thread-execution/?a#parallel-synchronization-and-communication-instructions-clusterlaunchcontrol-query-cancel>`__.
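
Because the coordinate queries are undefined for a failed request, a caller would
normally test the ``is_canceled`` result before asking for a coordinate. The
host-side C++ sketch below illustrates that ordering only. ``Response``,
``is_canceled`` and ``first_ctaid_checked`` are hypothetical names, and the
struct is an assumed decoded form of the response, which is opaque in practice.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical decoded view of the opaque response.
struct Response {
    bool canceled;         // result of the is_canceled query
    std::uint32_t x, y, z; // first CTA coordinates; valid only if canceled
};

// Models the boolean query: true means the request succeeded.
bool is_canceled(const Response& r) { return r.canceled; }

// Guarded accessor: reads the coordinates only after a successful check,
// so the undefined case (failed request) is never reached.
bool first_ctaid_checked(const Response& r,
                         std::uint32_t& x, std::uint32_t& y, std::uint32_t& z) {
    if (!is_canceled(r)) {
        return false;  // leave the outputs untouched
    }
    x = r.x;
    y = r.y;
    z = r.z;
    return true;
}
```

The design choice here is to return a success flag and leave the outputs
unchanged on failure, so the caller cannot accidentally consume coordinates
from a failed request.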