Conversation

@shiltian (Contributor Author) commented Apr 21, 2025

This PR lowers an alloca in AS0 to an alloca in AS5 followed by an addrspacecast
back to AS0.
@llvmbot (Member) commented Apr 21, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Changes

This PR lowers an alloca in AS0 to an alloca in AS5 followed by an addrspacecast
back to AS0.
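
As a sketch of the intended transformation (the function and value names here are illustrative, not taken from the patch), an AS0 alloca would be rewritten like this:

```llvm
; Before: an alloca in the default (flat, AS0) address space.
define void @example() {
  %x = alloca i32, align 4
  store i32 0, ptr %x
  ret void
}

; After: the alloca is placed in AS5 (private) and cast back to AS0,
; so existing AS0 users keep working unchanged.
define void @example() {
  %x = alloca i32, align 4, addrspace(5)
  %x.cast = addrspacecast ptr addrspace(5) %x to ptr
  store i32 0, ptr %x.cast
  ret void
}
```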


Full diff: https://github.com/llvm/llvm-project/pull/136584.diff

3 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp (+20)
  • (added) llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-alloca-as0.ll (+17)
  • (removed) llvm/test/CodeGen/AMDGPU/assert-wrong-alloca-addrspace.ll (-16)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
index a37128b0d745a..a0ef7d9a7a4db 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUCodeGenPrepare.cpp
@@ -330,6 +330,9 @@ class AMDGPUCodeGenPrepareImpl
   bool visitBitreverseIntrinsicInst(IntrinsicInst &I);
   bool visitMinNum(IntrinsicInst &I);
   bool visitSqrt(IntrinsicInst &I);
+
+  bool visitAllocaInst(AllocaInst &I);
+
   bool run();
 };
 
@@ -2355,6 +2358,23 @@ bool AMDGPUCodeGenPrepareImpl::visitSqrt(IntrinsicInst &Sqrt) {
   return true;
 }
 
+// Rewrite an alloca in AS0 to an alloca in AS5 followed by an addrspacecast.
+bool AMDGPUCodeGenPrepareImpl::visitAllocaInst(AllocaInst &I) {
+  if (I.getAddressSpace() == DL.getAllocaAddrSpace())
+    return false;
+  assert(I.getAddressSpace() == 0 && "An alloca can't be in a random AS");
+  IRBuilder<> Builder(&I);
+  AllocaInst *NewAI = Builder.CreateAlloca(
+      I.getAllocatedType(), DL.getAllocaAddrSpace(), I.getArraySize());
+  NewAI->takeName(&I);
+  NewAI->copyMetadata(I);
+  Value *CastI = Builder.CreateAddrSpaceCast(NewAI, I.getType(),
+                                             NewAI->getName() + ".cast");
+  I.replaceAllUsesWith(CastI);
+  I.eraseFromParent();
+  return true;
+}
+
 bool AMDGPUCodeGenPrepare::runOnFunction(Function &F) {
   if (skipFunction(F))
     return false;
diff --git a/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-alloca-as0.ll b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-alloca-as0.ll
new file mode 100644
index 0000000000000..8a2b54c77ea5d
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/amdgpu-codegenprepare-alloca-as0.ll
@@ -0,0 +1,17 @@
+; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 2
+; RUN: opt -S -mtriple=amdgcn-amd-amdhsa -mcpu=hawaii -passes=amdgpu-codegenprepare %s | FileCheck %s
+
+declare void @foo(ptr)
+
+define void @bar() {
+; CHECK-LABEL: define void @bar
+; CHECK-SAME: () #[[ATTR0:[0-9]+]] {
+; CHECK-NEXT:    [[ALLOCA:%.*]] = alloca i32, align 4, addrspace(5)
+; CHECK-NEXT:    [[ALLOCA_CAST:%.*]] = addrspacecast ptr addrspace(5) [[ALLOCA]] to ptr
+; CHECK-NEXT:    call void @foo(ptr [[ALLOCA_CAST]])
+; CHECK-NEXT:    ret void
+;
+  %alloca = alloca i32, align 4
+  call void @foo(ptr %alloca)
+  ret void
+}
diff --git a/llvm/test/CodeGen/AMDGPU/assert-wrong-alloca-addrspace.ll b/llvm/test/CodeGen/AMDGPU/assert-wrong-alloca-addrspace.ll
deleted file mode 100644
index 1e72e679e83c0..0000000000000
--- a/llvm/test/CodeGen/AMDGPU/assert-wrong-alloca-addrspace.ll
+++ /dev/null
@@ -1,16 +0,0 @@
-; RUN: not --crash llc -mtriple=amdgcn -mcpu=gfx900 -filetype=null %s 2>&1 | FileCheck %s
-
-; The alloca has the wrong address space and is passed to a call. The
-; FrameIndex was created with the natural 32-bit pointer type instead
-; of the declared 64-bit. Make sure we don't assert.
-
-; CHECK: LLVM ERROR: Cannot select: {{.*}}: i64 = FrameIndex<0>
-
-declare void @func(ptr)
-
-define void @main() {
-bb:
-  %alloca = alloca i32, align 4
-  call void @func(ptr %alloca)
-  ret void
-}

@shiltian shiltian force-pushed the users/shiltian/amdgpu-alloca-as0 branch 2 times, most recently from c544c2e to 37fc6aa Compare April 22, 2025 04:24
bool visitMinNum(IntrinsicInst &I);
bool visitSqrt(IntrinsicInst &I);

bool visitAllocaInst(AllocaInst &I);
@shiltian (Contributor Author)
We might want to run this earlier to make it more useful, although I'm a bit skeptical about how practically useful it'll be in the end. This is more like a safeguard for irregular code paths. Normal code definitely wouldn't emit this kind of alloca.

@shiltian shiltian force-pushed the users/shiltian/amdgpu-alloca-as0 branch from 37fc6aa to 71fc11c Compare April 22, 2025 13:41
@arsenm (Contributor) left a comment
This is the wrong place to handle this. AMDGPUCodeGenPrepare cannot be used for lowering. The first change should be to custom lower alloca and insert the cast there.

A follow-up change should handle the alloca case in getAssumedAddrSpace to get it to fold earlier.

@shiltian (Contributor Author) commented Apr 22, 2025

> This is the wrong place to handle this. AMDGPUCodeGenPrepare cannot be used for lowering.

Then I wonder what AMDGPUCodeGenPrepareImpl::visitAddrSpaceCastInst is?

Also, the NVPTX handling you mentioned in another PR is also an IR pass. I think lowering it in the middle end instead of during instruction selection can take advantage of existing middle-end optimizations, though AMDGPUCodeGenPrepare seems to be a little too late.

@arsenm (Contributor) commented Apr 22, 2025

> This is the wrong place to handle this. AMDGPUCodeGenPrepare cannot be used for lowering.

> Then I wonder what AMDGPUCodeGenPrepareImpl::visitAddrSpaceCastInst is?

It's not lowering; it's hacking in a poor substitute for a nonnull flag on the instruction. It's not required.

> Also, the NVPTX handling you mentioned in another PR is also an IR pass. I think lowering it in the middle end instead of during instruction selection can take advantage of existing middle-end optimizations, though AMDGPUCodeGenPrepare seems to be a little too late.

I never said what NVPTX does is good. If you implement getAssumedAddrSpace, you'll get the casting pattern, which will be cleaned up by InferAddressSpaces at an earlier point. The backend just needs direct handling of the fallback case.
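
For illustration (a simplified, hypothetical example; not taken from the actual AMDGPU implementation), the getAssumedAddrSpace route lets InferAddressSpaces push accesses through the cast so that only uses which genuinely need a flat pointer keep it:

```llvm
; If the target assumes an AS0 alloca really lives in private memory (AS5),
; the casting pattern
;
;   %buf      = alloca i32, align 4, addrspace(5)
;   %buf.flat = addrspacecast ptr addrspace(5) %buf to ptr
;   store i32 0, ptr %buf.flat
;
; can be rewritten by InferAddressSpaces into a direct private access:
;
;   %buf = alloca i32, align 4, addrspace(5)
;   store i32 0, ptr addrspace(5) %buf
```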

@shiltian shiltian force-pushed the users/shiltian/amdgpu-alloca-as0 branch 3 times, most recently from 22cdc02 to ae09399 Compare April 23, 2025 01:49
@arsenm (Contributor) commented Apr 23, 2025

The logic to detect private->flat casts also needs to be updated in AMDGPUAttributor

@shiltian (Contributor Author)
> The logic to detect private->flat casts also needs to be updated in AMDGPUAttributor

You mean in the AAAddressSpace?

@shiltian shiltian force-pushed the users/shiltian/amdgpu-alloca-as0 branch from ae09399 to c77ed29 Compare April 23, 2025 13:17
@arsenm (Contributor) commented Apr 23, 2025

> You mean in the AAAddressSpace?

No. I mean in the detection of the queue pointer uses and flat scratch init, the places looking for ADDR_SPACE_CAST_PRIVATE_TO_FLAT.

@shiltian shiltian force-pushed the users/shiltian/amdgpu-alloca-as0 branch 2 times, most recently from d04b738 to 1f1cbe2 Compare April 23, 2025 20:48
@shiltian (Contributor Author)

> You mean in the AAAddressSpace?

> No. I mean in the detection of the queue pointer uses and flat scratch init, the places looking for ADDR_SPACE_CAST_PRIVATE_TO_FLAT.

Done.

@github-actions bot commented Apr 23, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

This PR lowers an alloca in AS0 to an alloca in AS5 followed by an addrspacecast
back to AS0.
@shiltian shiltian force-pushed the users/shiltian/amdgpu-alloca-as0 branch from 1f1cbe2 to 01e6991 Compare April 23, 2025 20:51
@shiltian (Contributor Author) commented Apr 25, 2025

Based on the feedback in #136865 and #135820, I think not supporting this is the more appropriate direction. I propose that we close this PR and instead enforce in the verifier that an alloca must be in AS5, basically reverting to what #135820 initially did.

This way, we avoid hitting backend errors like "cannot select frameindex", which can be misleading and make it sound like a backend bug when it actually isn't. If we find more practical use cases in the future that require supporting AS0, we can always revisit and reopen this PR.

What do you think? @arsenm CC @nikic @jdoerfert

@nikic (Contributor) commented Apr 25, 2025

@shiltian Yes, I think we should do that.

@shiltian shiltian closed this Apr 25, 2025
@arsenm arsenm deleted the users/shiltian/amdgpu-alloca-as0 branch April 25, 2025 15:17