[AMDGPU] Support alloca in AS0 #136584
@llvm/pr-subscribers-backend-amdgpu
Author: Shilei Tian (shiltian)
Changes
This PR lowers an alloca in AS0 to an alloca in AS5 followed by an addrspacecast back to AS0.
Full diff: https://github.com/llvm/llvm-project/pull/136584.diff
3 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
index a37128b0d745a..a0ef7d9a7a4db 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
@@ -330,6 +330,9 @@ class AMDGPUCodeGenPrepareImpl
bool visitBitreverseIntrinsicInst(IntrinsicInst &I);
bool visitMinNum(IntrinsicInst &I);
bool visitSqrt(IntrinsicInst &I);
+
+ bool visitAllocaInst(AllocaInst &I);
+
bool run();
};
@@ -2355,6 +2358,23 @@ bool AMDGPUCodeGenPrepareImpl::visitSqrt(IntrinsicInst &Sqrt) {
return true;
}
+// Rewrite alloca with AS0 to alloca with AS5 followed by an addrspace cast.
+bool AMDGPUCodeGenPrepareImpl::visitAllocaInst(AllocaInst &I) {
+ if (I.getAddressSpace() == DL.getAllocaAddrSpace())
+ return false;
+ assert(I.getAddressSpace() == 0 && "An alloca can't be in random AS");
+ IRBuilder<> Builder(&I);
+ AllocaInst *NewAI = Builder.CreateAlloca(I.getType(), DL.getAllocaAddrSpace(),
+ I.getArraySize());
+ NewAI->takeName(&I);
+ NewAI->copyMetadata(I);
+ Value *CastI = Builder.CreateAddrSpaceCast(NewAI, I.getType(),
+ NewAI->getName() + ".cast");
+ I.replaceAllUsesWith(CastI);
+ I.eraseFromParent();
+ return true;
+}
+
bool AMDGPUCodeGenPrepare::runOnFunction(Function &F) {
if (skipFunction(F))
return false;
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-alloca-as0.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-alloca-as0.ll
new file mode 100644
index 0000000000000..8a2b54c77ea5d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-alloca-as0.ll
@@ -0,0 +1,17 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -passes=amdgpu-codegenprepare %s | FileCheck %s
+
+declare void @foo(ptr)
+
+define void @bar() {
+; CHECK-LABEL: define void @bar
+; CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT: [[ALLOCA:%.*]] = alloca ptr, align 8, addrspace(5)
+; CHECK-NEXT: [[ALLOCA_CAST:%.*]] = addrspacecast ptr addrspace(5) [[ALLOCA]] to ptr
+; CHECK-NEXT: call void @foo(ptr [[ALLOCA_CAST]])
+; CHECK-NEXT: ret void
+;
+ %alloca = alloca i32, align 4
+ call void @foo(ptr %alloca)
+ ret void
+}
diff --git a/llvm/test/CodeGen/AMDGPU/assert-wrong-alloca-addrspace.ll b/llvm/test/CodeGen/AMDGPU/assert-wrong-alloca-addrspace.ll
deleted file mode 100644
index 1e72e679e83c0..0000000000000
--- a/llvm/test/CodeGen/AMDGPU/assert-wrong-alloca-addrspace.ll
+++ /dev/null
@@ -1,16 +0,0 @@
-; RUN: not --crash llc -mtriple=amdgcn -mcpu=gfx900 -filetype=null %s 2>&1 | FileCheck %s
-
-; The alloca has the wrong address space and is passed to a call. The
-; FrameIndex was created with the natural 32-bit pointer type instead
-; of the declared 64-bit. Make sure we don't assert.
-
-; CHECK: LLVM ERROR: Cannot select: {{.*}}: i64 = FrameIndex<0>
-
-declare void @func(ptr)
-
-define void @main() {
-bb:
- %alloca = alloca i32, align 4
- call void @func(ptr %alloca)
- ret void
-}
Inline review comment on `bool visitAllocaInst(AllocaInst &I);` in AMDGPUCodeGenPrepare.cpp:
We might want to run this earlier to make it more useful, although I'm a bit skeptical about how practically useful it'll be in the end. This is more like a safeguard for irregular code paths. Normal code definitely wouldn't emit this kind of alloca.
arsenm left a comment:
This is the wrong place to handle this. AMDGPUCodeGenPrepare cannot be used for lowering. The first change should be to custom lower alloca and insert the cast there.
A follow-up change should handle the alloca case in getAssumedAddrSpace to get it to fold earlier.
Then I wonder what Also, the NVPTX handling you mentioned in another PR is also an IR pass. I think lowering it in the middle end instead of in instruction selection can take advantage of existing middle-end optimizations, though AMDGPUCodeGenPrepare seems to be a little too late.
It's not lowering; it's hacking in a poor substitute for a nonnull flag on the instruction. It's not required.
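For context on the "nonnull flag" remark, below is a simplified model, based on our understanding of the AMDGPU lowering and not taken from the backend's code, of what a private-to-flat `addrspacecast` expands to. The 32-bit private offset becomes the low half of a 64-bit flat address and the high half comes from the private aperture, which is read from aperture registers on targets that have them and otherwise loaded from memory (one reason the queue pointer can be needed). Because the private null value is all ones and the flat null value is 0, a general cast also needs a compare and select; a cast known to be non-null, such as one whose source is an alloca, can skip that. `privateToFlat` and its parameters are hypothetical.

```cpp
#include <cassert>  // used by the example checks
#include <cstdint>

// On AMDGPU the private (AS5) null pointer is all ones; flat (AS0) null is 0.
constexpr std::uint32_t kPrivateNull = 0xFFFFFFFFu;
constexpr std::uint64_t kFlatNull = 0;

// Hypothetical model of expanding a private -> flat addrspacecast.
// apertureHi: high 32 bits of the private aperture base.
std::uint64_t privateToFlat(std::uint32_t privateOffset,
                            std::uint32_t apertureHi, bool knownNonNull) {
  std::uint64_t converted =
      (static_cast<std::uint64_t>(apertureHi) << 32) | privateOffset;
  if (knownNonNull)
    return converted;  // no compare/select needed
  // Map private null to flat null; every other offset gets the aperture.
  return privateOffset != kPrivateNull ? converted : kFlatNull;
}
```

The `knownNonNull` parameter stands in for the information a nonnull flag on the cast would carry; without it, the null check cannot be dropped.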
I never said what NVPTX does is good. If you implement getAssumedAddrSpace, you'll get the casting pattern, which will be cleaned up at an earlier point by InferAddressSpaces. The backend just needs direct handling of the fallback case.
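A toy model of the flow suggested here: a target hook reports that the flat pointer produced by an AS0 alloca actually points to private memory, and an address-space inference step then rewrites memory accesses to use a private pointer, leaving the flat pointer only where it genuinely escapes (for example, as a call argument). The names `Value`, `Use`, `assumedAddrSpace`, and `inferAccessAddrSpaces` are hypothetical; this is not the InferAddressSpaces algorithm, only an illustration of its effect.

```cpp
#include <cassert>  // used by the example checks
#include <optional>
#include <string>
#include <vector>

constexpr unsigned kFlatAS = 0;
constexpr unsigned kPrivateAS = 5;

struct Value {
  std::string name;
  bool isAlloca;
  unsigned addrSpace;
};

enum class UseKind { Load, Store, CallArg };

struct Use {
  UseKind kind;
  unsigned ptrAS;  // address space the access is performed in
};

// Hypothetical hook: a flat alloca pointer is assumed to point to private memory.
std::optional<unsigned> assumedAddrSpace(const Value &v) {
  if (v.isAlloca && v.addrSpace == kFlatAS)
    return kPrivateAS;
  return std::nullopt;
}

// Rewrites loads/stores through `v` to the assumed address space.
// Call arguments keep the flat pointer, since the callee expects AS0.
// Returns the number of rewritten uses.
int inferAccessAddrSpaces(const Value &v, std::vector<Use> &uses) {
  std::optional<unsigned> as = assumedAddrSpace(v);
  if (!as)
    return 0;
  int rewritten = 0;
  for (Use &u : uses) {
    if (u.kind != UseKind::CallArg && u.ptrAS != *as) {
      u.ptrAS = *as;
      ++rewritten;
    }
  }
  return rewritten;
}
```

In this model the backend would still need the fallback path for uses that stay flat, which matches the point that direct handling of the fallback case is all that is required there.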
The logic to detect private->flat casts also needs to be updated in AMDGPUAttributor.
You mean in the
No. I mean in the detection of the queue pointer uses and flat scratch init: the places that look for ADDR_SPACE_CAST_PRIVATE_TO_FLAT.
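To illustrate the attributor concern, as we read it: if AS0 allocas are accepted and later become an AS5 alloca plus a cast to AS0, every such alloca implies a private-to-flat cast, so a scan that only looks for explicit private-to-flat `addrspacecast` instructions would miss them when deciding whether a function needs flat scratch init or the queue pointer. The sketch below models that scan; `Inst`, `Kind`, and `needsPrivateToFlatCast` are hypothetical names, not AMDGPUAttributor code.

```cpp
#include <cassert>  // used by the example checks
#include <vector>

constexpr unsigned kFlatAS = 0;
constexpr unsigned kPrivateAS = 5;

enum class Kind { Alloca, AddrSpaceCast, Other };

struct Inst {
  Kind kind;
  unsigned srcAS;  // casts: source AS; allocas: the alloca's own AS
  unsigned dstAS;  // casts: destination AS; unused otherwise
};

// Returns true if the function contains a private->flat cast, either
// explicitly or implicitly via an alloca in the flat address space that
// would be lowered to an AS5 alloca followed by an addrspacecast to AS0.
bool needsPrivateToFlatCast(const std::vector<Inst> &insts) {
  for (const Inst &i : insts) {
    if (i.kind == Kind::AddrSpaceCast && i.srcAS == kPrivateAS &&
        i.dstAS == kFlatAS)
      return true;
    if (i.kind == Kind::Alloca && i.srcAS == kFlatAS)
      return true;  // implicit private->flat cast after lowering
  }
  return false;
}
```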
Done.
✅ With the latest revision this PR passed the C/C++ code formatter. |
This PR lowers an alloca in AS0 to an alloca in AS5 followed by an addrspacecast back to AS0.
Based on the feedback in #136865 and #135820, I think not supporting this is the more appropriate direction. I propose that we close this PR and instead enforce that This way, we avoid hitting backend errors like "cannot select frameindex", which can be misleading and make it sound like a backend bug when it actually isn't. If we find more practical use cases in the future that require supporting AS0, we can always revisit and reopen this PR. What do you think? @arsenm CC @nikic @jdoerfert
@shiltian Yes, I think we should do that. |
