-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[NVPTX] Add Intrinsics for discard.* #128404
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-llvm-ir @llvm/pr-subscribers-backend-nvptx Author: Abhilash Majumder (abhilash1910) Changes[NVPTX] Add Intrinsics for discard.*
For more information, refer to the PTX ISA Full diff: https://github.com/llvm/llvm-project/pull/128404.diff 4 Files Affected:
diff --git a/llvm/docs/NVPTXUsage.rst b/llvm/docs/NVPTXUsage.rst
index 675b458c41e7b..d905d9b56fe29 100644
--- a/llvm/docs/NVPTXUsage.rst
+++ b/llvm/docs/NVPTXUsage.rst
@@ -630,6 +630,31 @@ uses and eviction priority which can be accessed by the '``.level::eviction_prio
For more information, refer to the PTX ISA
`<https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-prefetch-prefetchu>`_.
+``llvm.nvvm.discard.*``'
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Syntax:
+"""""""
+
+.. code-block:: llvm
+
+ declare void @llvm.nvvm.discard.global.L2(ptr addrspace(1) %global_ptr, i64 %size)
+ declare void @llvm.nvvm.discard.L2(ptr %ptr, i64 %size)
+
+Overview:
+"""""""""
+
+The '``@llvm.nvvm.discard.*``' invalidates the data at the address range [a .. a + (size - 1)]
+in the cache level specified by the .level qualifier without writing back the data
+in the cache to the memory. The operand size is an integer constant that specifies the amount of data,
+in bytes, in the cache level specified by the .level qualifier to be discarded. The only supported value
+for the size operand is 128. If no state space is specified then Generic Addressing is used.
+If the specified address does not fall within the address window of .global state space then
+the behavior is undefined.
+
+For more information, refer to the PTX ISA
+`<https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-discard>`_.
+
'``llvm.nvvm.cp.async.bulk.tensor.g2s.tile.[1-5]d``'
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
diff --git a/llvm/include/llvm/IR/IntrinsicsNVVM.td b/llvm/include/llvm/IR/IntrinsicsNVVM.td
index c32bf0318b5d6..00613eb7d2d17 100644
--- a/llvm/include/llvm/IR/IntrinsicsNVVM.td
+++ b/llvm/include/llvm/IR/IntrinsicsNVVM.td
@@ -5043,6 +5043,15 @@ def int_nvvm_prefetch_global_L2_evict_last: Intrinsic<[], [llvm_global_ptr_ty],
def int_nvvm_prefetchu_L1 : Intrinsic<[], [llvm_ptr_ty],
[IntrArgMemOnly, ReadOnly<ArgIndex<0>>, NoCapture<ArgIndex<0>>]>;
+// Intrinsics for discard
+def int_nvvm_discard_global_L2 : DefaultAttrsIntrinsic<[],
+ [llvm_global_ptr_ty, llvm_i64_ty], [IntrArgMemOnly, ReadOnly<ArgIndex<0>>,
+ NoCapture<ArgIndex<0>>, ImmArg<ArgIndex<1>>]>;
+
+def int_nvvm_discard_L2 : DefaultAttrsIntrinsic<[],
+ [llvm_ptr_ty, llvm_i64_ty], [IntrArgMemOnly, ReadOnly<ArgIndex<0>>,
+ NoCapture<ArgIndex<0>>, ImmArg<ArgIndex<1>>]>;
+
// Intrinsics for Bulk Copy using TMA (non-tensor)
// From Global to Shared Cluster
diff --git a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
index acb9fc9867b0f..fd93ce312c9db 100644
--- a/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
+++ b/llvm/lib/Target/NVPTX/NVPTXIntrinsics.td
@@ -789,6 +789,17 @@ def PREFETCH_GLOBAL_L2_EVICT_LAST : NVPTXInst<(outs), (ins Int64Regs:$addr),
def PREFETCHU_L1 : PREFETCH_INTRS<"prefetchu.L1">;
+//Discard Intrinsics
+class DISCARD_L2_INTRS<string Addr> :
+ NVPTXInst<(outs), (ins Int64Regs:$addr, Int64Regs:$size),
+ StrJoin<".", ["discard", Addr , "L2"]>.ret # " [$addr], $size;",
+ [(!cast<Intrinsic>(StrJoin<"_", ["int_nvvm_discard", Addr , "L2"]>.ret)
+ i64:$addr, i64:$size)]>,
+ Requires<[hasPTX<74>, hasSM<80>]>;
+
+def DISCARD_L2 : DISCARD_L2_INTRS<"">;
+def DISCARD_GLOBAL_L2 : DISCARD_L2_INTRS<"global">;
+
//-----------------------------------
// MBarrier Functions
//-----------------------------------
diff --git a/llvm/test/CodeGen/NVPTX/discard.ll b/llvm/test/CodeGen/NVPTX/discard.ll
new file mode 100644
index 0000000000000..80217807765d0
--- /dev/null
+++ b/llvm/test/CodeGen/NVPTX/discard.ll
@@ -0,0 +1,35 @@
+; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py UTC_ARGS: --version 5
+; RUN: llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74| FileCheck --check-prefixes=CHECK-PTX64 %s
+; RUN: %if ptxas-11.4 %{ llc < %s -mtriple=nvptx64 -mcpu=sm_80 -mattr=+ptx74| %ptxas-verify -arch=sm_80 %}
+
+target triple = "nvptx64-nvidia-cuda"
+
+declare void @llvm.nvvm.discard.global.L2(ptr addrspace(1) %global_ptr, i64 %size)
+declare void @llvm.nvvm.discard.L2(ptr %ptr, i64 %size)
+
+define void @discard_global_L2(ptr addrspace(1) %global_ptr, i64 %size) {
+; CHECK-PTX64-LABEL: discard_global_L2(
+; CHECK-PTX64: {
+; CHECK-PTX64-NEXT: .reg .b64 %rd<2>;
+; CHECK-PTX64-EMPTY:
+; CHECK-PTX64-NEXT: // %bb.0:
+; CHECK-PTX64-NEXT: ld.param.u64 %rd1, [discard_global_L2_param_0];
+; CHECK-PTX64-NEXT: discard.global.L2 [%rd1], 128;
+; CHECK-PTX64-NEXT: ret;
+ tail call void @llvm.nvvm.discard.global.L2(ptr addrspace(1) %global_ptr, i64 128)
+ ret void
+}
+
+define void @discard_L2(ptr %ptr, i64 %size) {
+; CHECK-PTX64-LABEL: discard_L2(
+; CHECK-PTX64: {
+; CHECK-PTX64-NEXT: .reg .b64 %rd<2>;
+; CHECK-PTX64-EMPTY:
+; CHECK-PTX64-NEXT: // %bb.0:
+; CHECK-PTX64-NEXT: ld.param.u64 %rd1, [discard_L2_param_0];
+; CHECK-PTX64-NEXT: discard.L2 [%rd1], 128;
+; CHECK-PTX64-NEXT: ret;
+ tail call void @llvm.nvvm.discard.L2(ptr %ptr, i64 128)
+ ret void
+}
+
|
durga4github
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
gonzalobg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation is still incorrect for LLVM and needs work.
Co-authored-by: gonzalobg <[email protected]>
gonzalobg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
The pointer needs to be 128-byte aligned (so naturally-aligned). Thanks!
|
Artem-B
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo documenting 128b alignment requirement.
|
@durga4github could you please help to merge? Thanks. |
[NVPTX] Add Intrinsics for discard.*
This PR adds intrinsics for all variations of discard.*
For more information, refer to the PTX ISA
https://docs.nvidia.com/cuda/parallel-thread-execution/#data-movement-and-conversion-instructions-discard_.