-
Notifications
You must be signed in to change notification settings - Fork 15.4k
[AMDGPU] Make readonly noilas kernel arguments constant memory #139629
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
If a function has a kernel calling convention - that is, if it will be called externally and isn't going to be called from code that might have modified memory at some other point in the program, the local assertions guaranteed by marking a pointer argument "readonly noalias" (that is, that the memory isn't modified within the function) can be streingthened to the assumption that the memory is constant - that is, that it will remain unmodified througout the execution of the program. If this strengthening wasn't possible, it would mean that some function that the kernel calls would be monifying memory despite the only pointer to that memory being readonly, which violates the semantics of readonly. The main purpose of this is to allow setting the `invariant` metadata on loads from readonly kernel arguments during IR translation. (However, this currently doesn't work for the `amdgpu-kernel` calling convention because that calling convention does a lowering to loads from the kernel argument descriptor before translation to MIR) Fixes internal SWDEV-532314
|
@llvm/pr-subscribers-llvm-globalisel @llvm/pr-subscribers-backend-amdgpu Author: Krzysztof Drewniak (krzysz00) ChangesIf a function has a kernel calling convention - that is, if it will be called externally and isn't going to be called from code that might have modified memory at some other point in the program, the local assertions guaranteed by marking a pointer argument "readonly noalias" (that is, that the memory isn't modified within the function) can be streingthened to the assumption that the memory is constant - that is, that it will remain unmodified througout the execution of the program. If this strengthening wasn't possible, it would mean that some function that the kernel calls would be monifying memory despite the only pointer to that memory being readonly, which violates the semantics of readonly. The main purpose of this is to allow setting the Fixes internal SWDEV-532314 Full diff: https://github.com/llvm/llvm-project/pull/139629.diff 3 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
index 7bcc128cb114f..dbb488aedb81b 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
@@ -11,6 +11,7 @@
#include "AMDGPUAliasAnalysis.h"
#include "AMDGPU.h"
+#include "Utils/AMDGPUBaseInfo.h"
#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Instructions.h"
@@ -112,5 +113,14 @@ ModRefInfo AMDGPUAAResult::getModRefInfoMask(const MemoryLocation &Loc,
AS == AMDGPUAS::CONSTANT_ADDRESS_32BIT)
return ModRefInfo::NoModRef;
+ // A `readonly noalias` function argument normally only gets a `Ref` mask.
+ // However,, if the calling convention of the function is one intended for
+ // program entry points, we know that such an argument will be invariant
+ // over the life of the program.
+ if (auto* Arg = dyn_cast<Argument>(Base)) {
+ const Function *F = Arg->getParent();
+ if (AMDGPU::isKernelCC(F) && Arg->hasNoAliasAttr() && Arg->onlyReadsMemory())
+ return ModRefInfo::NoModRef;
+ }
return ModRefInfo::ModRef;
}
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-kernel-invariant-loads.ll b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-kernel-invariant-loads.ll
new file mode 100644
index 0000000000000..cfc64857040db
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/irtranslator-kernel-invariant-loads.ll
@@ -0,0 +1,40 @@
+; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5
+; RUN: llc -global-isel -mtriple=amdgcn -stop-after=irtranslator -verify-machineinstrs %s -o - | FileCheck %s
+
+define amdgpu_cs void @load_global_is_invariant(ptr addrspace(1) readonly noalias %x, ptr addrspace(1) writeonly noalias %y) {
+ ; CHECK-LABEL: name: load_global_is_invariant
+ ; CHECK: bb.1 (%ir-block.0):
+ ; CHECK-NEXT: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3
+ ; CHECK-NEXT: [[MV1:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY2]](s32), [[COPY3]](s32)
+ ; CHECK-NEXT: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[MV]](p1) :: (invariant load (s32) from %ir.x, addrspace 1)
+ ; CHECK-NEXT: G_STORE [[LOAD]](s32), [[MV1]](p1) :: (store (s32) into %ir.y, addrspace 1)
+ ; CHECK-NEXT: S_ENDPGM 0
+ %v = load float, ptr addrspace(1) %x
+ store float %v, ptr addrspace(1) %y
+ ret void
+}
+
+define void @load_global_isnt_invariant_non_kernel(ptr addrspace(1) readonly noalias %x, ptr addrspace(1) writeonly noalias %y) {
+ ; CHECK-LABEL: name: load_global_isnt_invariant_non_kernel
+ ; CHECK: bb.1 (%ir-block.0):
+ ; CHECK-NEXT: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3
+ ; CHECK-NEXT: {{ $}}
+ ; CHECK-NEXT: [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr0
+ ; CHECK-NEXT: [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
+ ; CHECK-NEXT: [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
+ ; CHECK-NEXT: [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
+ ; CHECK-NEXT: [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3
+ ; CHECK-NEXT: [[MV1:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY2]](s32), [[COPY3]](s32)
+ ; CHECK-NEXT: [[LOAD:%[0-9]+]]:_(s32) = G_LOAD [[MV]](p1) :: (load (s32) from %ir.x, addrspace 1)
+ ; CHECK-NEXT: G_STORE [[LOAD]](s32), [[MV1]](p1) :: (store (s32) into %ir.y, addrspace 1)
+ ; CHECK-NEXT: SI_RETURN
+ %v = load float, ptr addrspace(1) %x
+ store float %v, ptr addrspace(1) %y
+ ret void
+}
diff --git a/llvm/test/CodeGen/AMDGPU/aa-points-to-constant-memory.ll b/llvm/test/CodeGen/AMDGPU/aa-points-to-constant-memory.ll
index 4e945951dab62..6a3da62c6610c 100644
--- a/llvm/test/CodeGen/AMDGPU/aa-points-to-constant-memory.ll
+++ b/llvm/test/CodeGen/AMDGPU/aa-points-to-constant-memory.ll
@@ -110,3 +110,19 @@ define amdgpu_kernel void @nonconst_gv_constant_as() {
store i32 0, ptr addrspace(4) @global_nonconstant_constant_as
ret void
}
+
+define amdgpu_kernel void @constant_kernel_args(ptr addrspace(1) readonly noalias inreg %x) {
+; CHECK-LABEL: @constant_kernel_args(
+; CHECK-NEXT: ret void
+;
+ store i32 0, ptr addrspace(1) %x
+ ret void
+}
+
+define amdgpu_cs void @constant_cs_args(ptr addrspace(1) readonly noalias %x) {
+; CHECK-LABEL: @constant_cs_args(
+; CHECK-NEXT: ret void
+;
+ store i32 0, ptr addrspace(1) %x
+ ret void
+}
|
You can test this locally with the following command:git-clang-format --diff HEAD~1 HEAD --extensions cpp -- llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cppView the diff from clang-format here.diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
index dbb488aed..7d95c61f6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAliasAnalysis.cpp
@@ -117,9 +117,10 @@ ModRefInfo AMDGPUAAResult::getModRefInfoMask(const MemoryLocation &Loc,
// However,, if the calling convention of the function is one intended for
// program entry points, we know that such an argument will be invariant
// over the life of the program.
- if (auto* Arg = dyn_cast<Argument>(Base)) {
+ if (auto *Arg = dyn_cast<Argument>(Base)) {
const Function *F = Arg->getParent();
- if (AMDGPU::isKernelCC(F) && Arg->hasNoAliasAttr() && Arg->onlyReadsMemory())
+ if (AMDGPU::isKernelCC(F) && Arg->hasNoAliasAttr() &&
+ Arg->onlyReadsMemory())
return ModRefInfo::NoModRef;
}
return ModRefInfo::ModRef;
|
| return ModRefInfo::NoModRef; | ||
|
|
||
| // A `readonly noalias` function argument normally only gets a `Ref` mask. | ||
| // However,, if the calling convention of the function is one intended for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo ',,'
|
I thought we did this in the distant past but it was incorrect. Code elsewhere, in another running kernel or the host could still be modifying the memory. The current langref doesn't have any explicit wording on access from other threads |
| @@ -0,0 +1,40 @@ | |||
| ; NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 5 | |||
| ; RUN: llc -global-isel -mtriple=amdgcn -stop-after=irtranslator -verify-machineinstrs %s -o - | FileCheck %s | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ; RUN: llc -global-isel -mtriple=amdgcn -stop-after=irtranslator -verify-machineinstrs %s -o - | FileCheck %s | |
| ; RUN: llc -global-isel -mtriple=amdgcn -stop-after=irtranslator %s -o - | FileCheck %s |
Yes, const only guaranties it is not modified in this function. Arguably one should use atomics for cross-thread communication though. |
Yes, you need to use atomics. But that doesn't change the readonlyness of the pointer. Atomic load is still just a load |
|
If it helps, this is part of letting buffer_load become s_buffer_load, where IIRC there were concerns about cache conference with loads that weren't from invariant memory |
|
It does not help at all. It makes the matter worse. This really connects a high level concept of const memory POINTER to a low level concept of cache synthesization. Skipping the memory model.
|
Typo in the PR heading - s/noilas/noalias |
|
Closing per previous concern about correctness issues. However, what we probably should do is generalize https://github.com/llvm/llvm-project/blob/23a674d2ecc428a96d28c9772cc5178eaf763863/llvm/lib/Target/NVPTX/NVPTXTagInvariantLoads.cpp because we have the same behavior |
If a function has a kernel calling convention - that is, if it will be called externally and isn't going to be called from code that might have modified memory at some other point in the program, the local assertions guaranteed by marking a pointer argument "readonly noalias" (that is, that the memory isn't modified within the function) can be streingthened to the assumption that the memory is constant - that is, that it will remain unmodified througout the execution of the program.
If this strengthening wasn't possible, it would mean that some function that the kernel calls would be monifying memory despite the only pointer to that memory being readonly, which violates the semantics of readonly.
The main purpose of this is to allow setting the
invariantmetadata on loads from readonly kernel arguments during IRtranslation. (However, this currently doesn't work for the
amdgpu-kernelcalling convention because that calling convention does a lowering to loads from the kernel argument descriptor before translation to MIR)Fixes internal SWDEV-532314