[AMDGPU] Restrict promote alloca on pointers across address spaces #119762
Conversation
If a load/store of a pointer to the stack crosses address spaces, restrict the promote-alloca pass from vectorizing when the pointer storage sizes differ.
Example: in address space 0 the pointer size is 64 bits; in address space 5 it is 32 bits.
Casting the pointer across these address spaces is undefined behavior.
Assertion found through fuzzing.
@llvm/pr-subscribers-backend-amdgpu
Author: Sumanth Gundapaneni (sgundapa)
Changes: If a load/store of a pointer to the stack crosses address spaces, restrict the promote-alloca pass from vectorizing when the pointer storage sizes differ.
Full diff: https://github.com/llvm/llvm-project/pull/119762.diff (3 files affected)
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
index e27ef71c1c0883..913a601b0e0888 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
@@ -796,6 +796,16 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(AllocaInst &Alloca) {
if (!IsSimple)
return RejectUser(Inst, "not a simple load or store");
+ // If the access type is a pointer, reject the address spaces with
+ // different pointer sizes.
+ // store <2 x ptr> %arg, ptr addrspace(5) %alloca - Reject.
+ // %tmp = load <4 x ptr addrspace(3)>, ptr addrspace(5) %alloca - ok.
+ if (AccessTy->isPtrOrPtrVectorTy()) {
+ if (DL->getPointerSize(getLoadStoreAddressSpace(Inst)) !=
+ DL->getPointerSize(AccessTy->getPointerAddressSpace()))
+ return RejectUser(Inst, "pointers to incompatible address spaces");
+ }
+
Ptr = Ptr->stripPointerCasts();
// Alloca already accessed as vector.
diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-loadstores.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-loadstores.ll
index 1e49500a243e10..2a22cdda7a7e79 100644
--- a/llvm/test/CodeGen/AMDGPU/promote-alloca-loadstores.ll
+++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-loadstores.ll
@@ -93,21 +93,6 @@ end:
ret void
}
-define ptr @alloca_load_store_ptr64_full_ivec(ptr %arg) {
-; CHECK-LABEL: define ptr @alloca_load_store_ptr64_full_ivec
-; CHECK-SAME: (ptr [[ARG:%.*]]) {
-; CHECK-NEXT: entry:
-; CHECK-NEXT: [[TMP0:%.*]] = ptrtoint ptr [[ARG]] to i64
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast i64 [[TMP0]] to <8 x i8>
-; CHECK-NEXT: ret ptr [[ARG]]
-;
-entry:
- %alloca = alloca [8 x i8], align 8, addrspace(5)
- store ptr %arg, ptr addrspace(5) %alloca, align 8
- %tmp = load ptr, ptr addrspace(5) %alloca, align 8
- ret ptr %tmp
-}
-
define ptr addrspace(3) @alloca_load_store_ptr32_full_ivec(ptr addrspace(3) %arg) {
; CHECK-LABEL: define ptr addrspace(3) @alloca_load_store_ptr32_full_ivec
; CHECK-SAME: (ptr addrspace(3) [[ARG:%.*]]) {
@@ -123,22 +108,6 @@ entry:
ret ptr addrspace(3) %tmp
}
-define <4 x ptr addrspace(3)> @alloca_load_store_ptr_mixed_full_ptrvec(<2 x ptr> %arg) {
-; CHECK-LABEL: define <4 x ptr addrspace(3)> @alloca_load_store_ptr_mixed_full_ptrvec
-; CHECK-SAME: (<2 x ptr> [[ARG:%.*]]) {
-; CHECK-NEXT: entry:
-; CHECK-NEXT: [[TMP0:%.*]] = ptrtoint <2 x ptr> [[ARG]] to <2 x i64>
-; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i64> [[TMP0]] to <4 x i32>
-; CHECK-NEXT: [[TMP2:%.*]] = inttoptr <4 x i32> [[TMP1]] to <4 x ptr addrspace(3)>
-; CHECK-NEXT: ret <4 x ptr addrspace(3)> [[TMP2]]
-;
-entry:
- %alloca = alloca [4 x i32], align 8, addrspace(5)
- store <2 x ptr> %arg, ptr addrspace(5) %alloca, align 8
- %tmp = load <4 x ptr addrspace(3)>, ptr addrspace(5) %alloca, align 8
- ret <4 x ptr addrspace(3)> %tmp
-}
-
define <8 x i16> @ptralloca_load_store_ints_full(<2 x i64> %arg) {
; CHECK-LABEL: define <8 x i16> @ptralloca_load_store_ints_full
; CHECK-SAME: (<2 x i64> [[ARG:%.*]]) {
@@ -198,3 +167,39 @@ entry:
%tmp = load ptr addrspace(3), ptr addrspace(5) %alloca, align 8
ret ptr addrspace(3) %tmp
}
+
+; Will not vectorize because we are doing a load/store of a pointer across
+; address spaces of varying pointer sizes.
+define ptr @alloca_load_store_ptr64_full_ivec(ptr %arg) {
+; CHECK-LABEL: define ptr @alloca_load_store_ptr64_full_ivec
+; CHECK-SAME: (ptr [[ARG:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[ALLOCA:%.*]] = alloca [8 x i8], align 8, addrspace(5)
+; CHECK-NEXT: store ptr [[ARG]], ptr addrspace(5) [[ALLOCA]], align 8
+; CHECK-NEXT: [[TMP:%.*]] = load ptr, ptr addrspace(5) [[ALLOCA]], align 8
+; CHECK-NEXT: ret ptr [[TMP]]
+;
+entry:
+ %alloca = alloca [8 x i8], align 8, addrspace(5)
+ store ptr %arg, ptr addrspace(5) %alloca, align 8
+ %tmp = load ptr, ptr addrspace(5) %alloca, align 8
+ ret ptr %tmp
+}
+
+; Will not vectorize because we are doing a load/store of a pointer across
+; address spaces of varying pointer sizes.
+define <4 x ptr addrspace(3)> @alloca_load_store_ptr_mixed_full_ptrvec(<2 x ptr> %arg) {
+; CHECK-LABEL: define <4 x ptr addrspace(3)> @alloca_load_store_ptr_mixed_full_ptrvec
+; CHECK-SAME: (<2 x ptr> [[ARG:%.*]]) {
+; CHECK-NEXT: entry:
+; CHECK-NEXT: [[ALLOCA:%.*]] = alloca [4 x i32], align 8, addrspace(5)
+; CHECK-NEXT: store <2 x ptr> [[ARG]], ptr addrspace(5) [[ALLOCA]], align 8
+; CHECK-NEXT: [[TMP:%.*]] = load <4 x ptr addrspace(3)>, ptr addrspace(5) [[ALLOCA]], align 8
+; CHECK-NEXT: ret <4 x ptr addrspace(3)> [[TMP]]
+;
+entry:
+ %alloca = alloca [4 x i32], align 8, addrspace(5)
+ store <2 x ptr> %arg, ptr addrspace(5) %alloca, align 8
+ %tmp = load <4 x ptr addrspace(3)>, ptr addrspace(5) %alloca, align 8
+ ret <4 x ptr addrspace(3)> %tmp
+}
diff --git a/llvm/test/CodeGen/AMDGPU/promote-alloca-subvecs.ll b/llvm/test/CodeGen/AMDGPU/promote-alloca-subvecs.ll
index 7c5410004ed5b7..b583f33a7d9e63 100644
--- a/llvm/test/CodeGen/AMDGPU/promote-alloca-subvecs.ll
+++ b/llvm/test/CodeGen/AMDGPU/promote-alloca-subvecs.ll
@@ -218,38 +218,35 @@ entry:
ret void
}
-define void @test_different_type_subvector_ptrs(<2 x ptr addrspace(1)> %val.0, <4 x ptr addrspace(3)> %val.1) {
+define void @test_different_type_subvector_ptrs(<2 x ptr addrspace(3)> %val.0, <4 x ptr addrspace(3)> %val.1) {
; CHECK-LABEL: define void @test_different_type_subvector_ptrs
-; CHECK-SAME: (<2 x ptr addrspace(1)> [[VAL_0:%.*]], <4 x ptr addrspace(3)> [[VAL_1:%.*]]) {
+; CHECK-SAME: (<2 x ptr addrspace(3)> [[VAL_0:%.*]], <4 x ptr addrspace(3)> [[VAL_1:%.*]]) {
; CHECK-NEXT: entry:
-; CHECK-NEXT: [[TMP0:%.*]] = ptrtoint <2 x ptr addrspace(1)> [[VAL_0]] to <2 x i64>
-; CHECK-NEXT: [[TMP1:%.*]] = extractelement <2 x i64> [[TMP0]], i64 0
-; CHECK-NEXT: [[TMP2:%.*]] = insertelement <4 x i64> undef, i64 [[TMP1]], i32 0
-; CHECK-NEXT: [[TMP3:%.*]] = extractelement <2 x i64> [[TMP0]], i64 1
-; CHECK-NEXT: [[TMP4:%.*]] = insertelement <4 x i64> [[TMP2]], i64 [[TMP3]], i32 1
-; CHECK-NEXT: [[TMP5:%.*]] = insertelement <2 x i64> poison, i64 [[TMP1]], i64 0
-; CHECK-NEXT: [[TMP6:%.*]] = insertelement <2 x i64> [[TMP5]], i64 [[TMP3]], i64 1
-; CHECK-NEXT: [[TMP7:%.*]] = inttoptr <2 x i64> [[TMP6]] to <2 x ptr addrspace(1)>
-; CHECK-NEXT: [[DUMMYUSER:%.*]] = freeze <2 x ptr addrspace(1)> [[TMP7]]
-; CHECK-NEXT: [[TMP8:%.*]] = ptrtoint <4 x ptr addrspace(3)> [[VAL_1]] to <4 x i32>
-; CHECK-NEXT: [[TMP9:%.*]] = bitcast <4 x i32> [[TMP8]] to <2 x i64>
-; CHECK-NEXT: [[TMP10:%.*]] = extractelement <2 x i64> [[TMP9]], i64 0
-; CHECK-NEXT: [[TMP11:%.*]] = insertelement <4 x i64> [[TMP4]], i64 [[TMP10]], i32 0
-; CHECK-NEXT: [[TMP12:%.*]] = extractelement <2 x i64> [[TMP9]], i64 1
-; CHECK-NEXT: [[TMP13:%.*]] = insertelement <4 x i64> [[TMP11]], i64 [[TMP12]], i32 1
-; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x i64> poison, i64 [[TMP10]], i64 0
-; CHECK-NEXT: [[TMP15:%.*]] = insertelement <2 x i64> [[TMP14]], i64 [[TMP12]], i64 1
-; CHECK-NEXT: [[TMP16:%.*]] = bitcast <2 x i64> [[TMP15]] to <4 x i32>
-; CHECK-NEXT: [[TMP17:%.*]] = inttoptr <4 x i32> [[TMP16]] to <4 x ptr addrspace(3)>
-; CHECK-NEXT: [[DUMMYUSER_1:%.*]] = freeze <4 x ptr addrspace(3)> [[TMP17]]
-; CHECK-NEXT: ret void
-;
+; CHECK-NEXT: [[TMP0:%.*]] = ptrtoint <2 x ptr addrspace(3)> [[VAL_0]] to <2 x i32>
+; CHECK-NEXT: [[TMP1:%.*]] = bitcast <2 x i32> [[TMP0]] to <1 x i64>
+; CHECK-NEXT: [[TMP2:%.*]] = extractelement <1 x i64> [[TMP1]], i64 0
+; CHECK-NEXT: [[TMP3:%.*]] = insertelement <4 x i64> undef, i64 [[TMP2]], i32 0
+; CHECK-NEXT: [[TMP4:%.*]] = insertelement <1 x i64> poison, i64 [[TMP2]], i64 0
+; CHECK-NEXT: [[TMP5:%.*]] = bitcast <1 x i64> [[TMP4]] to <2 x i32>
+; CHECK-NEXT: [[TMP6:%.*]] = inttoptr <2 x i32> [[TMP5]] to <2 x ptr addrspace(3)>
+; CHECK-NEXT: [[DUMMYUSER:%.*]] = freeze <2 x ptr addrspace(3)> [[TMP6]]
+; CHECK-NEXT: [[TMP7:%.*]] = ptrtoint <4 x ptr addrspace(3)> [[VAL_1]] to <4 x i32>
+; CHECK-NEXT: [[TMP8:%.*]] = bitcast <4 x i32> [[TMP7]] to <2 x i64>
+; CHECK-NEXT: [[TMP9:%.*]] = extractelement <2 x i64> [[TMP8]], i64 0
+; CHECK-NEXT: [[TMP10:%.*]] = insertelement <4 x i64> [[TMP3]], i64 [[TMP9]], i32 0
+; CHECK-NEXT: [[TMP11:%.*]] = extractelement <2 x i64> [[TMP8]], i64 1
+; CHECK-NEXT: [[TMP12:%.*]] = insertelement <4 x i64> [[TMP10]], i64 [[TMP11]], i32 1
+; CHECK-NEXT: [[TMP13:%.*]] = insertelement <2 x i64> poison, i64 [[TMP9]], i64 0
+; CHECK-NEXT: [[TMP14:%.*]] = insertelement <2 x i64> [[TMP13]], i64 [[TMP11]], i64 1
+; CHECK-NEXT: [[TMP15:%.*]] = bitcast <2 x i64> [[TMP14]] to <4 x i32>
+; CHECK-NEXT: [[TMP16:%.*]] = inttoptr <4 x i32> [[TMP15]] to <4 x ptr addrspace(3)>
+; CHECK-NEXT: [[DUMMYUSER_1:%.*]] = freeze <4 x ptr addrspace(3)> [[TMP16]]
entry:
%stack = alloca [4 x i64], align 4, addrspace(5)
- store <2 x ptr addrspace(1)> %val.0, ptr addrspace(5) %stack
- %reload = load <2 x ptr addrspace(1)>, ptr addrspace(5) %stack
- %dummyuser = freeze <2 x ptr addrspace(1)> %reload
+ store <2 x ptr addrspace(3)> %val.0, ptr addrspace(5) %stack
+ %reload = load <2 x ptr addrspace(3)>, ptr addrspace(5) %stack
+ %dummyuser = freeze <2 x ptr addrspace(3)> %reload
store <4 x ptr addrspace(3)> %val.1, ptr addrspace(5) %stack
%reload.1 = load <4 x ptr addrspace(3)>, ptr addrspace(5) %stack
shiltian
left a comment
Casting the pointer across these address spaces is undefined behavior.
Are you sure?
I don't think this PR is right. It probably just treats the symptoms instead of the root cause of the assertion.
  store <2 x ptr> %arg, ptr addrspace(5) %alloca, align 8
  %tmp = load <4 x ptr addrspace(3)>, ptr addrspace(5) %alloca, align 8
  ret <4 x ptr addrspace(3)> %tmp
}
Should reproduce the same situation with one as a scalar, and some with int/fp types
Sure
Can you please be specific here about what you mean by scalar?
i32 and double
You can test this locally with the following command:

git diff -U0 --pickaxe-regex -S '([^a-zA-Z0-9#_-]undef[^a-zA-Z0-9_-]|UndefValue::get)' 988480323d5ef9bb658f13ac598d4ce2aa23c782 d52e0a7229f9ce471b0c15b3fe19772da2272fcd llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp llvm/test/CodeGen/AMDGPU/promote-alloca-loadstores.ll

The following files introduce new uses of undef:

undef is now deprecated and should only be used in the rare cases where no replacement is possible. For example, a load of uninitialized memory yields undef. In tests, avoid using undef where possible.

For example, this is considered a bad practice:

define void @fn() {
  ...
  br i1 undef, ...
}

Please use the following instead:

define void @fn(i1 %cond) {
  ...
  br i1 %cond, ...
}

Please refer to the Undefined Behavior Manual for more information.
bool canCopyAcrossAddressSpaces = true;
if (AccessTy->isPtrOrPtrVectorTy() && VecTy->isPtrOrPtrVectorTy()) {
  if (DL.getPointerSize(AccessTy->getPointerAddressSpace()) !=
      DL.getPointerSize(VecTy->getPointerAddressSpace()))
    canCopyAcrossAddressSpaces = false;
}
There is no copy across address spaces here, this check is conceptually wrong. You only need to verify the size is compatible. For the final code emission, you'll need to insert no-op casts to get the types to match
I think the existing code here is correct, and you would only need adjustment at the code transformation later
Let me explore the no-op casts.
So <2 x ptr addrspace(5)> can be stored into ptr addrspace(0)? Isn't this undefined behavior?
It may be illegal type punning but you don't have the context. It's just bytes
Thanks Matt. I will try a few ways to avoid this casting assert.
Another quick question: at which point in the optimization pipeline do we decide this is illegal type punning leading to undefined behavior?
It's not a property of the pass pipeline, it's a property of the system as a whole. The address space cast from addrspace(5) to 0 is not a no-op cast. If you reload it and use it as the wrong type, it will be an invalid pointer
define void @alloca_load_store_ptr_mixed_addrspace_ptrvec() {
entry:
  %A2 = alloca <2 x ptr addrspace(5)>, align 8, addrspace(5)
  store ptr undef, ptr addrspace(5) %A2, align 8
Should also test with a real value instead of an undef. A test with a constant leaf would be useful too
arsenm
left a comment
Check is not correct, and test isn't using the correct alloca address space