Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 9 additions & 2 deletions llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -759,6 +759,14 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(AllocaInst &Alloca) {
return false;
}

Type *VecEltTy = VectorTy->getElementType();
constexpr unsigned SIZE_OF_BYTE = 8;
unsigned ElementSizeInBits = DL->getTypeSizeInBits(VecEltTy);
// FIXME: The non-byte type like i1 can be packed and be supported, but
// currently we do not handle them.
if (ElementSizeInBits % SIZE_OF_BYTE != 0)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Best to replicate typeSizeEqualsStoreSize

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. Will do

return false;

std::map<GetElementPtrInst *, WeakTrackingVH> GEPVectorIdx;
SmallVector<Instruction *> WorkList;
SmallVector<Instruction *> UsersToRemove;
Expand All @@ -776,8 +784,7 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(AllocaInst &Alloca) {

LLVM_DEBUG(dbgs() << " Attempting promotion to: " << *VectorTy << "\n");

Type *VecEltTy = VectorTy->getElementType();
unsigned ElementSize = DL->getTypeSizeInBits(VecEltTy) / 8;
unsigned ElementSize = ElementSizeInBits / SIZE_OF_BYTE;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC SIZE_OF_BYTE is defined by the whatever compiler compiles LLVM instead of for AMDGPU.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean , to use some thing like this to derive the value from data layout "DL.getTypeSizeInBits(Type::getInt8Ty(M->getContext()))".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have defined it to be "constexpr unsigned SIZE_OF_BYTE = 8" in line 763. Probably pick a different name ?

Copy link
Contributor

@shiltian shiltian Feb 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I missed that part. Hardcoding 8 is probably fine for now and in the any near future, but the proper approach is definitely to query DL.

for (auto *U : Uses) {
Instruction *Inst = cast<Instruction>(U->getUser());

Expand Down
21 changes: 21 additions & 0 deletions llvm/test/CodeGen/AMDGPU/promote-alloca-skip-non-byte-type.ll
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5
; RUN: opt -S -mtriple=amdgcn-unknown-amdhsa -passes=amdgpu-promote-alloca < %s | FileCheck %s

; Verify that we do not crash and not promote non-byte alloca types.
define <8 x i1> @non_byte_alloca_type() {
; CHECK-LABEL: define <8 x i1> @non_byte_alloca_type() {
; CHECK-NEXT: [[ENTRY:.*:]]
; CHECK-NEXT: [[C:%.*]] = icmp ugt <16 x i1> zeroinitializer, zeroinitializer
; CHECK-NEXT: [[RP:%.*]] = alloca <8 x i1>, align 1
; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i1>, ptr [[RP]], align 1
; CHECK-NEXT: store <16 x i1> [[C]], ptr [[RP]], align 2
; CHECK-NEXT: ret <8 x i1> [[TMP0]]
;
entry:
%C = icmp ugt <16 x i1> zeroinitializer, zeroinitializer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use something that can't fold away

%RP = alloca <8 x i1>, align 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the correct alloca address space. Also this issue isn't about the UB under-alignment, so correct that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thats correct. Here is an example that might trigger an UB

@g = global <8 x float> <float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01>

define <8 x i1> @f(float %0, i32 %1, i16 %2) {
BB:
%LGV = load <8 x float>, ptr @g, align 32
%RP = alloca <8 x i1>, align 1
%L = load <8 x float>, ptr %RP, align 32
%C = fcmp olt <8 x float> %L, %LGV
ret <8 x i1> %C
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, if you do not specify the addrspace , wouldn't it default to generic addrsapce which is "0"

%0 = load <8 x i1>, ptr %RP, align 1
store <16 x i1> %C, ptr %RP, align 2
ret <8 x i1> %0
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some tests for the scalar case? Only the subvector extract was a problem?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assertion trigered here is due to subvector being <2 x i1> and the access type being <16 x i1>
The access size for < 16 x i1> is 2 and the computation to derive the subvector relies on this access size and ended with a <2xi1> that triggered the assert due to mismatch in storage size.

assert(DL.getTypeStoreSize(SubVecTy) == DL.getTypeStoreSize(AccessTy));

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the assertions I am seeing are all being trigerred while handling subvectors for loads and stores.

Loading