Conversation

@abhilash1910
Contributor

[NVPTX] Add Intrinsics for discard.*
This PR adds intrinsics for all variations of the PTX discard.* instruction.

  • These intrinsics support both generic and global address spaces for all variations.
  • Lowering is handled directly from the NVVM intrinsics to NVPTX instructions via TableGen (see the sketch after this description).
  • Lit tests are added in discard.ll.
  • The generated PTX is verified with a 12.3 ptxas executable.
  • Documentation for these intrinsics is added in NVPTXUsage.rst.

For more information, refer to the PTX ISA:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-discard
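
As a quick usage illustration (a minimal sketch, not part of the patch; the function name is hypothetical), a module calls the new intrinsics as shown below. The size operand must be the immediate constant 128, and lowering requires sm_80 with PTX 7.4 (e.g. llc -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74):

  declare void @llvm.nvvm.discard.global.L2(ptr addrspace(1), i64)
  declare void @llvm.nvvm.discard.L2(ptr, i64)

  define void @discard_both(ptr addrspace(1) %gptr, ptr %p) {
    ; Drop one 128-byte line from L2 through the global-space variant.
    call void @llvm.nvvm.discard.global.L2(ptr addrspace(1) %gptr, i64 128)
    ; Same operation through the generic variant.
    call void @llvm.nvvm.discard.L2(ptr %p, i64 128)
    ret void
  }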

@llvmbot
Member

llvmbot commented Feb 23, 2025

@llvm/pr-subscribers-llvm-ir

@llvm/pr-subscribers-backend-nvptx

Author: Abhilash Majumder (abhilash1910)

Changes

[NVPTX] Add Intrinsics for discard.*
This PR adds intrinsics for all variations of the PTX discard.* instruction.

  • These intrinsics support both generic and global address spaces for all variations.
  • Lowering is handled directly from the NVVM intrinsics to NVPTX instructions via TableGen.
  • Lit tests are added in discard.ll.
  • The generated PTX is verified with a 12.3 ptxas executable.
  • Documentation for these intrinsics is added in NVPTXUsage.rst.

For more information, refer to the PTX ISA:
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-discard


Full diff: https://github.com/llvm/llvm-project/pull/128404.diff

4 Files Affected:

  • (modified) llvm/docs/NVPTXUsage.rst (+25)
  • (modified) llvm/include/llvm/IR/IntrinsicsNVVM.td (+9)
  • (modified) llvm/lib/Target/NVPTX/NVPTXIntrinsics.td (+11)
  • (added) llvm/test/CodeGen/NVPTX/discard.ll (+35)
diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index 675b458c41e7b..d905d9b56fe29 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -630,6 +630,31 @@ uses and eviction priority which can be accessed by the '``.level::eviction_prio
 For more information, refer to the PTX ISA
 `<https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-prefetch-prefetchu>`_.
 
+'``llvm.nvvm.discard.*``'
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+  declare void  @llvm.nvvm.discard.global.L2(ptr addrspace(1) %global_ptr, i64 %size)
+  declare void  @llvm.nvvm.discard.L2(ptr %ptr, i64 %size)
+
+Overview:
+"""""""""
+
+The '``@llvm.nvvm.discard.*``' intrinsics invalidate the data at the address range
+[a .. a + (size - 1)] in the cache level specified by the ``.level`` qualifier, without
+writing back the data in the cache to memory. The operand ``size`` is an integer constant
+that specifies the amount of data, in bytes, to be discarded from the cache level
+specified by the ``.level`` qualifier. The only supported value for the ``size`` operand is 128.
+If no state space is specified then generic addressing is used. If the specified address
+does not fall within the address window of the ``.global`` state space, the behavior is undefined.
+
+For more information, refer to the PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-discard>`_.
+
 '``llvm.nvvm.cp.async.bulk.tensor.g2s.tile.[1-5]d``'
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
 
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index c32bf0318b5d6..00613eb7d2d17 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -5043,6 +5043,15 @@ def int_nvvm_prefetch_global_L2_evict_last: Intrinsic<[], [llvm_global_ptr_ty],
 def int_nvvm_prefetchu_L1 : Intrinsic<[], [llvm_ptr_ty],
   [IntrArgMemOnly, ReadOnly<ArgIndex<0>>, NoCapture<ArgIndex<0>>]>;
 
+// Intrinsics for discard
+def int_nvvm_discard_global_L2 : DefaultAttrsIntrinsic<[], 
+    [llvm_global_ptr_ty, llvm_i64_ty], [IntrArgMemOnly, ReadOnly<ArgIndex<0>>, 
+    NoCapture<ArgIndex<0>>, ImmArg<ArgIndex<1>>]>;
+
+def int_nvvm_discard_L2 : DefaultAttrsIntrinsic<[], 
+    [llvm_ptr_ty, llvm_i64_ty], [IntrArgMemOnly, ReadOnly<ArgIndex<0>>, 
+    NoCapture<ArgIndex<0>>, ImmArg<ArgIndex<1>>]>;
+
 
 // Intrinsics for Bulk Copy using TMA (non-tensor)
 // From Global to Shared Cluster
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index acb9fc9867b0f..fd93ce312c9db 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -789,6 +789,17 @@ def PREFETCH_GLOBAL_L2_EVICT_LAST   : NVPTXInst<(outs), (ins Int64Regs:$addr),
 
 def PREFETCHU_L1 : PREFETCH_INTRS<"prefetchu.L1">;
 
+// Discard Intrinsics
+class DISCARD_L2_INTRS<string Addr> :
+          NVPTXInst<(outs), (ins Int64Regs:$addr, Int64Regs:$size),
+          StrJoin<".", ["discard", Addr , "L2"]>.ret # " [$addr], $size;",
+          [(!cast<Intrinsic>(StrJoin<"_", ["int_nvvm_discard", Addr , "L2"]>.ret)
+          i64:$addr, i64:$size)]>,
+          Requires<[hasPTX<74>, hasSM<80>]>;
+
+def DISCARD_L2        : DISCARD_L2_INTRS<"">;
+def DISCARD_GLOBAL_L2 : DISCARD_L2_INTRS<"global">;
+
 //-----------------------------------
 // MBarrier Functions
 //-----------------------------------
diff --git a/llvm/test/CodeGen/NVPTX/discard.ll b/llvm/test/CodeGen/NVPTX/discard.ll
new file mode 100644
index 0000000000000..80217807765d0
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/discard.ll
@@ -0,0 +1,35 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74 | FileCheck --check-prefixes=CHECK-PTX64 %s
+; RUN: %if ptxas-11.4 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74 | %ptxas-verify -arch=sm_80 %}
+
+target triple = "nvptx64-nvidia-cuda"
+
+declare void  @llvm.nvvm.discard.global.L2(ptr addrspace(1) %global_ptr, i64 %size)
+declare void  @llvm.nvvm.discard.L2(ptr %ptr, i64 %size)
+
+define void @discard_global_L2(ptr addrspace(1) %global_ptr, i64 %size) {
+; CHECK-PTX64-LABEL: discard_global_L2(
+; CHECK-PTX64:       {
+; CHECK-PTX64-NEXT:    .reg .b64 %rd<2>;
+; CHECK-PTX64-EMPTY:
+; CHECK-PTX64-NEXT:  // %bb.0:
+; CHECK-PTX64-NEXT:    ld.param.u64 %rd1, [discard_global_L2_param_0];
+; CHECK-PTX64-NEXT:    discard.global.L2 [%rd1], 128;
+; CHECK-PTX64-NEXT:    ret;
+  tail call void @llvm.nvvm.discard.global.L2(ptr addrspace(1) %global_ptr, i64 128)
+  ret void
+}
+
+define void @discard_L2(ptr %ptr, i64 %size) {
+; CHECK-PTX64-LABEL: discard_L2(
+; CHECK-PTX64:       {
+; CHECK-PTX64-NEXT:    .reg .b64 %rd<2>;
+; CHECK-PTX64-EMPTY:
+; CHECK-PTX64-NEXT:  // %bb.0:
+; CHECK-PTX64-NEXT:    ld.param.u64 %rd1, [discard_L2_param_0];
+; CHECK-PTX64-NEXT:    discard.L2 [%rd1], 128;
+; CHECK-PTX64-NEXT:    ret;
+  tail call void @llvm.nvvm.discard.L2(ptr %ptr, i64 128)
+  ret void
+}
+
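
To illustrate the generic-addressing note in the documentation hunk above, here is a minimal sketch (the function name and the addrspacecast are illustrative, not taken from the patch): a pointer known to map into the .global window is cast to the generic address space and then discarded through the generic variant. Per the PTX ISA, the behavior is undefined if the generic address does not fall within the .global window.

  declare void @llvm.nvvm.discard.L2(ptr, i64)

  define void @discard_generic_from_global(ptr addrspace(1) %g) {
    ; Cast the global-space pointer into the generic address space first.
    %p = addrspacecast ptr addrspace(1) %g to ptr
    ; Discard one 128-byte L2 line; this lowers to a "discard.L2 [addr], 128;" instruction.
    call void @llvm.nvvm.discard.L2(ptr %p, i64 128)
    ret void
  }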

Contributor

@durga4github durga4github left a comment

LGTM

@durga4github durga4github requested a review from Artem-B February 24, 2025 07:44
Contributor

@gonzalobg gonzalobg left a comment

The documentation is still incorrect for LLVM and needs work.

Contributor

@gonzalobg gonzalobg left a comment

LGTM

@gonzalobg
Contributor

gonzalobg commented Feb 28, 2025 via email

Member

@Artem-B Artem-B left a comment

LGTM modulo documenting 128b alignment requirement.

@abhilash1910
Contributor Author

@durga4github could you please help to merge? Thanks.

@durga4github durga4github merged commit 7c58089 into llvm:main Mar 3, 2025
12 checks passed