Conversation

@shiltian (Contributor Author) commented Apr 21, 2025

This PR lowers an alloca in AS0 to an alloca in AS5 followed by an addrspacecast
back to AS0.
@llvmbot (Member) commented Apr 21, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Changes

This PR lowers an alloca in AS0 to an alloca in AS5 followed by an addrspacecast
back to AS0.
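
As a sketch of the intended transformation (the function and value names here are illustrative, not taken from the patch), an AS0 alloca would be rewritten like this:

```llvm
; Before: an alloca in the default (flat, AS0) address space.
define void @example() {
  %x = alloca i32, align 4
  store i32 0, ptr %x
  ret void
}

; After: the alloca is placed in AS5 (private) and cast back to AS0,
; so existing AS0 users keep working unchanged.
define void @example() {
  %x = alloca i32, align 4, addrspace(5)
  %x.cast = addrspacecast ptr addrspace(5) %x to ptr
  store i32 0, ptr %x.cast
  ret void
}
```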


Full diff: https://github.com/llvm/llvm-project/pull/136584.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp (+20)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-alloca-as0.ll (+17)
  • (removed) llvm/test/CodeGen/AMDGPU/assert-wrong-alloca-addrspace.ll (-16)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
index a37128b0d745a..a0ef7d9a7a4db 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
@@ -330,6 +330,9 @@ class AMDGPUCodeGenPrepareImpl
   bool visitBitreverseIntrinsicInst(IntrinsicInst &I);
   bool visitMinNum(IntrinsicInst &I);
   bool visitSqrt(IntrinsicInst &I);
+
+  bool visitAllocaInst(AllocaInst &I);
+
   bool run();
 };
 
@@ -2355,6 +2358,23 @@ bool AMDGPUCodeGenPrepareImpl::visitSqrt(IntrinsicInst &Sqrt) {
   return true;
 }
 
+// Rewrite an alloca in AS0 to an alloca in AS5 followed by an addrspacecast.
+bool AMDGPUCodeGenPrepareImpl::visitAllocaInst(AllocaInst &I) {
+  if (I.getAddressSpace() == DL.getAllocaAddrSpace())
+    return false;
+  assert(I.getAddressSpace() == 0 && "An alloca can't be in a random AS");
+  IRBuilder<> Builder(&I);
+  AllocaInst *NewAI = Builder.CreateAlloca(
+      I.getAllocatedType(), DL.getAllocaAddrSpace(), I.getArraySize());
+  NewAI->takeName(&I);
+  NewAI->copyMetadata(I);
+  Value *CastI = Builder.CreateAddrSpaceCast(NewAI, I.getType(),
+                                             NewAI->getName() + ".cast");
+  I.replaceAllUsesWith(CastI);
+  I.eraseFromParent();
+  return true;
+}
+
 bool AMDGPUCodeGenPrepare::runOnFunction(Function &F) {
   if (skipFunction(F))
     return false;
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-alloca-as0.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-alloca-as0.ll
new file mode 100644
index 0000000000000..8a2b54c77ea5d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-alloca-as0.ll
@@ -0,0 +1,17 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -passes=amdgpu-codegenprepare %s | FileCheck %s
+
+declare void @foo(ptr)
+
+define void @bar() {
+; CHECK-LABEL: define void @bar
+; CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca i32, align 4, addrspace(5)
+; CHECK-NEXT:    [[ALLOCA_CAST:%.*]] = addrspacecast ptr addrspace(5) [[ALLOCA]] to ptr
+; CHECK-NEXT:    call void @foo(ptr [[ALLOCA_CAST]])
+; CHECK-NEXT:    ret void
+;
+  %alloca = alloca i32, align 4
+  call void @foo(ptr %alloca)
+  ret void
+}
diff --git a/llvm/test/CodeGen/AMDGPU/assert-wrong-alloca-addrspace.ll b/llvm/test/CodeGen/AMDGPU/assert-wrong-alloca-addrspace.ll
deleted file mode 100644
index 1e72e679e83c0..0000000000000
--- a/llvm/test/CodeGen/AMDGPU/assert-wrong-alloca-addrspace.ll
+++ /dev/null
@@ -1,16 +0,0 @@
-; RUN: not --crash llc -mtriple=amdgcn -mcpu=gfx900 -filetype=null %s 2>&1 | FileCheck %s
-
-; The alloca has the wrong address space and is passed to a call. The
-; FrameIndex was created with the natural 32-bit pointer type instead
-; of the declared 64-bit. Make sure we don't assert.
-
-; CHECK: LLVM ERROR: Cannot select: {{.*}}: i64 = FrameIndex<0>
-
-declare void @func(ptr)
-
-define void @main() {
-bb:
-  %alloca = alloca i32, align 4
-  call void @func(ptr %alloca)
-  ret void
-}

@shiltian shiltian force-pushed the users/shiltian/amdgpu-alloca-as0 branch 2 times, most recently from c544c2e to 37fc6aa Compare April 22, 2025 04:24
bool visitMinNum(IntrinsicInst &I);
bool visitSqrt(IntrinsicInst &I);

bool visitAllocaInst(AllocaInst &I);
@shiltian (Contributor Author)
We might want to run this earlier to make it more useful, although I'm a bit skeptical about how practically useful it'll be in the end. This is more like a safeguard for irregular code paths. Normal code definitely wouldn't emit this kind of alloca.

@shiltian shiltian force-pushed the users/shiltian/amdgpu-alloca-as0 branch from 37fc6aa to 71fc11c Compare April 22, 2025 13:41
@arsenm (Contributor) left a comment
This is the wrong place to handle this. AMDGPUCodeGenPrepare cannot be used for lowering. The first change should be to custom lower alloca and insert the cast there.

A follow-up change should handle the alloca case in getAssumedAddrSpace to get it to fold earlier.

@shiltian (Contributor Author) commented Apr 22, 2025

> This is the wrong place to handle this. AMDGPUCodeGenPrepare cannot be used for lowering.

Then I wonder what AMDGPUCodeGenPrepareImpl::visitAddrSpaceCastInst is?

Also, the NVPTX handling you mentioned in another PR is also an IR pass. I think lowering it in the middle end instead of during instruction selection can take advantage of existing middle-end optimizations, though AMDGPUCodeGenPrepare seems to be a little too late.

@arsenm (Contributor) commented Apr 22, 2025

> This is the wrong place to handle this. AMDGPUCodeGenPrepare cannot be used for lowering.

> Then I wonder what AMDGPUCodeGenPrepareImpl::visitAddrSpaceCastInst is?

It's not lowering; it's hacking in a poor substitute for a nonnull flag on the instruction. It's not required.

> Also, the NVPTX handling you mentioned in another PR is also an IR pass. I think lowering it in the middle end instead of during instruction selection can take advantage of existing middle-end optimizations, though AMDGPUCodeGenPrepare seems to be a little too late.

I never said what NVPTX does is good. If you implement getAssumedAddrSpace, you'll get the casting pattern, which will be cleaned up by InferAddressSpaces at an earlier point. The backend just needs direct handling of the fallback case.
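
For illustration (a simplified, hypothetical example; not taken from the actual AMDGPU implementation), the getAssumedAddrSpace route lets InferAddressSpaces push accesses through the cast so that only uses which genuinely need a flat pointer keep it:

```llvm
; If the target assumes an AS0 alloca really lives in private memory (AS5),
; the casting pattern
;
;   %buf      = alloca i32, align 4, addrspace(5)
;   %buf.flat = addrspacecast ptr addrspace(5) %buf to ptr
;   store i32 0, ptr %buf.flat
;
; can be rewritten by InferAddressSpaces into a direct private access:
;
;   %buf = alloca i32, align 4, addrspace(5)
;   store i32 0, ptr addrspace(5) %buf
```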

@shiltian shiltian force-pushed the users/shiltian/amdgpu-alloca-as0 branch 3 times, most recently from 22cdc02 to ae09399 Compare April 23, 2025 01:49
@arsenm (Contributor) commented Apr 23, 2025

The logic to detect private->flat casts also needs to be updated in AMDGPUAttributor

@shiltian (Contributor Author)
> The logic to detect private->flat casts also needs to be updated in AMDGPUAttributor

You mean in the AAAddressSpace?

@shiltian shiltian force-pushed the users/shiltian/amdgpu-alloca-as0 branch from ae09399 to c77ed29 Compare April 23, 2025 13:17
@arsenm (Contributor) commented Apr 23, 2025

> You mean in the AAAddressSpace?

No. I mean in the detection of the queue pointer uses and flat scratch init, the places looking for ADDR_SPACE_CAST_PRIVATE_TO_FLAT.

@shiltian shiltian force-pushed the users/shiltian/amdgpu-alloca-as0 branch 2 times, most recently from d04b738 to 1f1cbe2 Compare April 23, 2025 20:48
@shiltian (Contributor Author)

> You mean in the AAAddressSpace?

> No. I mean in the detection of the queue pointer uses and flat scratch init, the places looking for ADDR_SPACE_CAST_PRIVATE_TO_FLAT.

Done.

@github-actions bot commented Apr 23, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

This PR lowers an alloca in AS0 to an alloca in AS5 followed by an addrspacecast
back to AS0.
@shiltian shiltian force-pushed the users/shiltian/amdgpu-alloca-as0 branch from 1f1cbe2 to 01e6991 Compare April 23, 2025 20:51
@shiltian (Contributor Author) commented Apr 25, 2025

Based on the feedback in #136865 and #135820, I think not supporting this is the more appropriate direction. I propose that we close this PR and instead enforce in the verifier that an alloca must be in AS5, basically reverting to what #135820 initially did.

This way, we avoid hitting backend errors like "cannot select frameindex", which can be misleading and make it sound like a backend bug when it actually isn't. If we find more practical use cases in the future that require supporting AS0, we can always revisit and reopen this PR.

What do you think? @arsenm CC @nikic @jdoerfert

@nikic (Contributor) commented Apr 25, 2025

@shiltian Yes, I think we should do that.

@shiltian shiltian closed this Apr 25, 2025
@arsenm arsenm deleted the users/shiltian/amdgpu-alloca-as0 branch April 25, 2025 15:17