[AMDGPU] Skip handling of non-byte types in promote alloca. #128769
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -759,6 +759,14 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(AllocaInst &Alloca) { | |
| return false; | ||
| } | ||
|
|
||
| + Type *VecEltTy = VectorTy->getElementType(); | ||
| + constexpr unsigned SIZE_OF_BYTE = 8; | ||
| + unsigned ElementSizeInBits = DL->getTypeSizeInBits(VecEltTy); | ||
| + // FIXME: The non-byte type like i1 can be packed and be supported, but | ||
| + // currently we do not handle them. | ||
| + if (ElementSizeInBits % SIZE_OF_BYTE != 0) | ||
| + return false; | ||
|
|
||
| std::map<GetElementPtrInst *, WeakTrackingVH> GEPVectorIdx; | ||
| SmallVector<Instruction *> WorkList; | ||
| SmallVector<Instruction *> UsersToRemove; | ||
|
|
@@ -776,8 +784,7 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(AllocaInst &Alloca) { | |
|
|
||
| LLVM_DEBUG(dbgs() << " Attempting promotion to: " << *VectorTy << "\n"); | ||
|
|
||
| - Type *VecEltTy = VectorTy->getElementType(); | ||
| - unsigned ElementSize = DL->getTypeSizeInBits(VecEltTy) / 8; | ||
| + unsigned ElementSize = ElementSizeInBits / SIZE_OF_BYTE; | ||
|
Contributor
IIUC
Contributor
Author
You mean using something like this to derive the value from the data layout: "DL.getTypeSizeInBits(Type::getInt8Ty(M->getContext()))"?
Contributor
Author
I have defined it as "constexpr unsigned SIZE_OF_BYTE = 8" in line 763. Should I pick a different name?
Contributor
Oh, I missed that part. Hardcoding 8 is probably fine for now and for the near future, but the proper approach is definitely to query DL.
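The guard discussed in this thread can be illustrated without LLVM. The sketch below is a standalone model, not the pass itself: `kBitsPerByte`, `isByteSizedElement`, and `elementSizeInBytes` are hypothetical names, and plain bit counts stand in for the value `DL->getTypeSizeInBits(VecEltTy)` would return. It shows why the early return matters: the truncating division yields an element size of 0 bytes for i1.

```cpp
#include <cstdint>

// Hypothetical stand-in for the patch's SIZE_OF_BYTE constant.
constexpr std::uint64_t kBitsPerByte = 8;

// Models the new guard: promotion is only attempted when the element
// size in bits is a whole number of bytes.
constexpr bool isByteSizedElement(std::uint64_t ElementSizeInBits) {
  return ElementSizeInBits % kBitsPerByte == 0;
}

// Models the existing `ElementSizeInBits / SIZE_OF_BYTE` computation.
// Integer division truncates, so sub-byte element types yield 0.
constexpr std::uint64_t elementSizeInBytes(std::uint64_t ElementSizeInBits) {
  return ElementSizeInBits / kBitsPerByte;
}

// Compile-time checks on a 1-bit and a 32-bit element.
static_assert(!isByteSizedElement(1), "a 1-bit element is rejected");
static_assert(elementSizeInBytes(1) == 0, "a 1-bit element truncates to 0 bytes");
static_assert(isByteSizedElement(32), "a 32-bit element is accepted");
static_assert(elementSizeInBytes(32) == 4, "a 32-bit element is 4 bytes");
```

Deriving the byte width from the data layout, as suggested above, would replace only the constant; the modulo test and the division stay the same.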
||
| for (auto *U : Uses) { | ||
| Instruction *Inst = cast<Instruction>(U->getUser()); | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| ; NOTE: Assertions have been autogenerated by utils/update_test_checks.py UTC_ARGS: --version 5 | ||
| ; RUN: opt -S -mtriple=amdgcn-unknown-amdhsa -passes=amdgpu-promote-alloca < %s | FileCheck %s | ||
|
|
||
| ; Verify that we do not crash and do not promote non-byte alloca types. | ||
| define <8 x i1> @non_byte_alloca_type() { | ||
| ; CHECK-LABEL: define <8 x i1> @non_byte_alloca_type() { | ||
| ; CHECK-NEXT: [[ENTRY:.*:]] | ||
| ; CHECK-NEXT: [[C:%.*]] = icmp ugt <16 x i1> zeroinitializer, zeroinitializer | ||
| ; CHECK-NEXT: [[RP:%.*]] = alloca <8 x i1>, align 1 | ||
| ; CHECK-NEXT: [[TMP0:%.*]] = load <8 x i1>, ptr [[RP]], align 1 | ||
| ; CHECK-NEXT: store <16 x i1> [[C]], ptr [[RP]], align 2 | ||
| ; CHECK-NEXT: ret <8 x i1> [[TMP0]] | ||
| ; | ||
| entry: | ||
| %C = icmp ugt <16 x i1> zeroinitializer, zeroinitializer | ||
|
Contributor
Use something that can't fold away
||
| %RP = alloca <8 x i1>, align 1 | ||
|
Contributor
Use the correct alloca address space. Also, this issue isn't about the UB under-alignment, so correct that.
Contributor
Author
That's correct. Here is an example that might trigger UB:
@g = global <8 x float> <float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01, float 4.200000e+01>
define <8 x i1> @f(float %0, i32 %1, i16 %2) {
Contributor
Author
Also, if you do not specify the addrspace, wouldn't it default to the generic addrspace, which is "0"?
||
| %0 = load <8 x i1>, ptr %RP, align 1 | ||
| store <16 x i1> %C, ptr %RP, align 2 | ||
| ret <8 x i1> %0 | ||
| } | ||
|
|
||
|
Contributor
Can you add some tests for the scalar case? Only the subvector extract was a problem?
Contributor
Author
The assertion triggered here is due to the subvector being <2 x i1> and the access type being <16 x i1>: assert(DL.getTypeStoreSize(SubVecTy) == DL.getTypeStoreSize(AccessTy));
Contributor
Author
Yes, the assertions I am seeing are all triggered while handling subvectors for loads and stores.
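The mismatch behind that assertion can be worked through with plain arithmetic. This is a standalone sketch, not code from the pass: `storeSizeInBytes` and `vectorSizeInBits` are hypothetical helpers, the store size is modelled as the bit size rounded up to whole bytes, and the sketch assumes the subvector element count is obtained by dividing the access store size by the element store size, which is one way the <2 x i1> figure quoted above would arise.

```cpp
#include <cstdint>

// Hypothetical model of a store size: bits rounded up to whole bytes.
constexpr std::uint64_t storeSizeInBytes(std::uint64_t SizeInBits) {
  return (SizeInBits + 7) / 8;
}

// Bit size of a fixed vector of NumElts elements, each EltBits wide.
// Vectors of i1 are bit-packed, so <16 x i1> occupies 16 bits.
constexpr std::uint64_t vectorSizeInBits(std::uint64_t NumElts,
                                         std::uint64_t EltBits) {
  return NumElts * EltBits;
}

// Access type <16 x i1>: 16 bits, stored in 2 bytes.
constexpr std::uint64_t AccessStoreSize =
    storeSizeInBytes(vectorSizeInBits(16, 1));
// Element type i1: 1 bit, but still stored in 1 byte.
constexpr std::uint64_t EltStoreSize = storeSizeInBytes(1);
// Assumed derivation of the subvector length: 2 bytes / 1 byte = 2.
constexpr std::uint64_t NumSubVecElts = AccessStoreSize / EltStoreSize;
// Subvector <2 x i1>: 2 bits, stored in 1 byte.
constexpr std::uint64_t SubVecStoreSize =
    storeSizeInBytes(vectorSizeInBits(NumSubVecElts, 1));

// The two store sizes differ, which is the condition the assert rejects.
static_assert(SubVecStoreSize != AccessStoreSize,
              "store sizes of <2 x i1> and <16 x i1> differ");
```

With a byte-sized element such as i8 the same arithmetic gives matching store sizes, so the problem is specific to sub-byte elements.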
||
Best to replicate typeSizeEqualsStoreSize
Thanks. Will do.
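For reference, `DataLayout::typeSizeEqualsStoreSize` is true when a type's size in bits equals its store size in bits, that is, when storing the type needs no padding bits. The sketch below models that predicate on raw bit widths. It is standalone, `storeSizeInBits` and `typeSizeEqualsStoreSizeModel` are hypothetical names, and this thread does not say whether the final patch applies the check to the element type or to the whole vector type.

```cpp
#include <cstdint>

// Hypothetical model: the store size is the bit size rounded up to a
// whole number of 8-bit bytes, expressed again in bits.
constexpr std::uint64_t storeSizeInBits(std::uint64_t SizeInBits) {
  return ((SizeInBits + 7) / 8) * 8;
}

// Models the predicate: true only when no padding bits are needed.
constexpr bool typeSizeEqualsStoreSizeModel(std::uint64_t SizeInBits) {
  return SizeInBits == storeSizeInBits(SizeInBits);
}

static_assert(!typeSizeEqualsStoreSizeModel(1), "1 bit pads to 8 bits");
static_assert(typeSizeEqualsStoreSizeModel(8), "8 bits need no padding");
static_assert(!typeSizeEqualsStoreSizeModel(19), "19 bits pad to 24 bits");
static_assert(typeSizeEqualsStoreSizeModel(32), "32 bits need no padding");
```

Under this model the predicate agrees with the modulo-8 test in the patch for every bit width, while keeping the byte width out of the pass.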