
Commit 70332f2

AMDGPU: Report unaligned scratch access as fast if supported by tgt
This enables more consecutive load folding during aggressive-instcombine.
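As a rough illustration of what "consecutive load folding" means, here is a C++ analogue. It is not LLVM IR, and the real pass rewrites IR rather than source. On a little-endian target such as AMDGPU, two adjacent byte loads combined with shift and or give the same value as a single 16-bit load from the same address, even when that address is not 2-byte aligned. The function names below are hypothetical. Whether the compiler may actually emit the single wide load for an under-aligned address is what the target hook changed in this commit reports.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: two separate byte loads combined with shift/or,
// the shape of code that consecutive-load folding targets.
uint16_t loadTwoBytesSeparately(const uint8_t *P) {
  assert(P != nullptr && "pointer must be valid");
  return static_cast<uint16_t>(P[0] | (P[1] << 8));
}

// Hypothetical helper: the folded form, a single 16-bit load from an
// address that need not be 2-byte aligned. std::memcpy is the portable
// C++ way to express such an unaligned load.
uint16_t loadTwoBytesFolded(const uint8_t *P) {
  assert(P != nullptr && "pointer must be valid");
  uint16_t V;
  std::memcpy(&V, P, sizeof(V));
  return V;
}

// The two forms agree only on a little-endian host (AMDGPU is little-endian).
bool hostIsLittleEndian() {
  const uint16_t One = 1;
  uint8_t FirstByte;
  std::memcpy(&FirstByte, &One, 1);
  return FirstByte == 1;
}
```

For example, with `const uint8_t Buf[3] = {0x00, 0x34, 0x12};`, both functions return `0x1234` for `Buf + 1` on a little-endian host.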
1 parent 50bcf68 commit 70332f2


6 files changed, +815 -440 lines


llvm/lib/Target/AMDGPU/SIISelLowering.cpp

Lines changed: 7 additions & 1 deletion
```diff
@@ -2098,10 +2098,16 @@ bool SITargetLowering::allowsMisalignedMemoryAccessesImpl(
   if (AddrSpace == AMDGPUAS::PRIVATE_ADDRESS ||
       AddrSpace == AMDGPUAS::FLAT_ADDRESS) {
     bool AlignedBy4 = Alignment >= Align(4);
+    if (Subtarget->hasUnalignedScratchAccessEnabled()) {
+      if (IsFast)
+        *IsFast = AlignedBy4 ? Size : 1;
+      return true;
+    }
+
     if (IsFast)
       *IsFast = AlignedBy4;
 
-    return AlignedBy4 || Subtarget->hasUnalignedScratchAccessEnabled();
+    return AlignedBy4;
   }
 
   // So long as they are correct, wide global memory operations perform better
```
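The effect of this change on the private (scratch) and flat path can be modeled in isolation. The sketch below is a hypothetical, simplified model, not LLVM code: `ScratchAccessQuery`, `MisalignedResult`, `hookBefore`, `hookAfter` and `wouldFold` are illustrative names, `UnalignedScratchEnabled` stands in for `Subtarget->hasUnalignedScratchAccessEnabled()`, and `SizeInBits` is assumed to correspond to the hook's `Size` argument in bits. It also assumes that a consumer such as aggressive-instcombine folds only when the hook both returns true and reports a nonzero `IsFast`. Under those assumptions, an under-aligned scratch access on a subtarget with unaligned scratch access enabled was previously allowed but reported as slow (0), so it was not folded; after the change it reports 1 and becomes eligible.

```cpp
#include <cassert>

// Hypothetical, simplified inputs to the private/flat path of the hook.
struct ScratchAccessQuery {
  unsigned SizeInBits;          // assumed: the hook's Size argument, in bits
  unsigned AlignInBytes;        // known alignment of the access
  bool UnalignedScratchEnabled; // models hasUnalignedScratchAccessEnabled()
};

// What the hook reports: its return value and the value stored to *IsFast.
struct MisalignedResult {
  bool Allowed;
  unsigned Fast; // 0 means "slow"; assumed: larger values mean relatively faster
};

static bool isAlignedBy4(const ScratchAccessQuery &Q) {
  assert(Q.AlignInBytes >= 1 && "alignment is at least one byte");
  return Q.AlignInBytes >= 4;
}

// Model of the logic before this commit.
MisalignedResult hookBefore(const ScratchAccessQuery &Q) {
  bool AlignedBy4 = isAlignedBy4(Q);
  return {AlignedBy4 || Q.UnalignedScratchEnabled, AlignedBy4 ? 1u : 0u};
}

// Model of the logic after this commit.
MisalignedResult hookAfter(const ScratchAccessQuery &Q) {
  bool AlignedBy4 = isAlignedBy4(Q);
  if (Q.UnalignedScratchEnabled)
    return {true, AlignedBy4 ? Q.SizeInBits : 1u};
  return {AlignedBy4, AlignedBy4 ? 1u : 0u};
}

// Assumed consumer policy: fold only if the access is allowed AND fast.
bool wouldFold(const MisalignedResult &R) { return R.Allowed && R.Fast != 0; }
```

With `UnalignedScratchEnabled` false, both models return the same result for every input, so subtargets without the feature are unaffected. With it enabled, 4-byte-aligned accesses now report the access size rather than 1, which ranks them above under-aligned accesses (reported as 1) while still marking both as fast.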
