Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
26 changes: 20 additions & 6 deletions llvm/lib/Target/AMDGPU/AMDGPUPromoteAlloca.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -676,7 +676,17 @@ static bool isSupportedAccessType(FixedVectorType *VecTy, Type *AccessTy,
//
// We could handle more complicated cases, but it'd make things a lot more
// complicated.
if (isa<FixedVectorType>(AccessTy)) {

// If both are pointer types, verify if they are compatible to copy across
// address spaces.
bool canCopyAcrossAddressSpaces = true;
if (AccessTy->isPtrOrPtrVectorTy() && VecTy->isPtrOrPtrVectorTy()) {
if (DL.getPointerSize(AccessTy->getPointerAddressSpace()) !=
DL.getPointerSize(VecTy->getPointerAddressSpace()))
canCopyAcrossAddressSpaces = false;
}
Comment on lines +682 to +687
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no copy across address spaces here, this check is conceptually wrong. You only need to verify the size is compatible. For the final code emission, you'll need to insert no-op casts to get the types to match

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the existing code here is correct, and you would only need adjustment at the code transformation later

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me explore the no-op casts.
So <2 x ptr addrspace(5)> can be stored in to ptr addrspace (0) ? Isn't this the undefined behavior ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be illegal type punning but you don't have the context. It's just bytes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Matt. I will try a few ways to avoid this casting assert.
Another quick question, at which point in optimization pipeline, we decide this is an illegal type punning leading to undefined behavior ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not a property of the pass pipeline, it's a property of the system as a whole. The address space cast from addrspace(5) to 0 is not a no-op cast. If you reload it and use it as the wrong type, it will be an invalid pointer


if (isa<FixedVectorType>(AccessTy) && canCopyAcrossAddressSpaces) {
TypeSize AccTS = DL.getTypeStoreSize(AccessTy);
TypeSize VecTS = DL.getTypeStoreSize(VecTy->getElementType());
return AccTS.isKnownMultipleOf(VecTS);
Expand Down Expand Up @@ -800,11 +810,15 @@ bool AMDGPUPromoteAllocaImpl::tryPromoteAllocaToVector(AllocaInst &Alloca) {

Ptr = Ptr->stripPointerCasts();

// Alloca already accessed as vector.
if (Ptr == &Alloca && DL->getTypeStoreSize(Alloca.getAllocatedType()) ==
DL->getTypeStoreSize(AccessTy)) {
WorkList.push_back(Inst);
continue;
// Since the size of pointer is different across address spaces, let the
// isSupportedAccessType() handle the pointer load and store accesses.
if (!AccessTy->isPtrOrPtrVectorTy() || !VectorTy->isPtrOrPtrVectorTy()) {
// Alloca already accessed as vector.
if (Ptr == &Alloca &&
DL->getTypeStoreSize(AllocaTy) == DL->getTypeStoreSize(AccessTy)) {
WorkList.push_back(Inst);
continue;
}
}

if (!isSupportedAccessType(VectorTy, AccessTy, *DL))
Expand Down
30 changes: 30 additions & 0 deletions llvm/test/CodeGen/AMDGPU/promote-alloca-loadstores.ll
Original file line number Diff line number Diff line change
Expand Up @@ -198,3 +198,33 @@ entry:
%tmp = load ptr addrspace(3), ptr addrspace(5) %alloca, align 8
ret ptr addrspace(3) %tmp
}

; Will not vectorize because we're saving a 64-bit pointer from addrspace 0
; in to two 32 bits pointers of addrspace 5.
; CHECK-LABEL: define void @alloca_load_store_ptr_mixed_addrspace_ptrvec
; CHECK-NEXT: entry:
; CHECK-NEXT: [[ALLOCA:%.*]] = alloca <2 x ptr addrspace(5)>, align 8, addrspace(5)
; CHECK-NEXT: store ptr undef, ptr addrspace(5) [[ALLOCA]], align 8
; CHECK-NEXT: ret void
;
define void @alloca_load_store_ptr_mixed_addrspace_ptrvec() {
entry:
%A2 = alloca <2 x ptr addrspace(5)>, align 8, addrspace(5)
store ptr undef, ptr addrspace(5) %A2, align 8
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should also test with a real value instead of an undef. A test with a constant leaf would be useful too

ret void
}

; Will not vectorize because we're saving a 32-bit pointers from addrspace 5
; in to two 64 bits pointers of addrspace 0, even though the size in memory
; is same.
; CHECK-LABEL: define void @alloca_load_store_ptr_mixed_addrspace_ptrvec2
; CHECK-NEXT: entry:
; CHECK-NEXT: [[ALLOCA:%.*]] = alloca <2 x ptr>, align 8
; CHECK-NEXT: store <4 x ptr addrspace(5)> undef, ptr [[ALLOCA]], align 8
; CHECK-NEXT: ret void
define void @alloca_load_store_ptr_mixed_addrspace_ptrvec2() {
entry:
%A2 = alloca <2 x ptr>, align 8
store <4 x ptr addrspace(5)> undef, ptr %A2, align 8
ret void
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should reproduce the same situation with one as a scalar, and some with int/fp types

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please be specific here on what you mean by scalar ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i32 and double

Loading