Commit 132a448
[DAGCombiner] Refactor and improve ReduceLoadOpStoreWidth
This patch makes a couple of improvements to ReduceLoadOpStoreWidth.

When determining the minimum size of "NewBW" we now take byte boundaries into account. If we for example touch bits 6-10, we should not accept NewBW=8, because we would fail later when detecting that we cannot access bits from two different bytes in memory using a single load. Instead we align LSB/MSB to byte-size boundaries up front, before searching for a viable "NewBW".

In the past we only tried to find a "ShAmt" that was a multiple of "NewBW"; now we use a sliding-window technique to scan for a viable "ShAmt" that is a multiple of the byte size. This can uncover more optimization opportunities (especially if the original type is not byte sized, and for big-endian targets when the original load/store is aligned on the most significant bit).

This is a follow-up to a previous commit that made sure the emitted narrow load/store instructions do not access memory outside the region touched by the original load/store. In some situations that patch could lead to minor regressions; the new sliding-window approach for finding "ShAmt" is intended to avoid them.
1 parent 3a13d83 commit 132a448

File tree

6 files changed: +82 −98 lines changed
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Lines changed: 66 additions & 61 deletions
@@ -20339,17 +20339,21 @@ SDValue DAGCombiner::ReduceLoadOpStoreWidth(SDNode *N) {
             ST->getPointerInfo().getAddrSpace())
       return SDValue();
 
-    // Find the type to narrow it the load / op / store to.
+    // Find the type NewVT to narrow the load / op / store to.
     SDValue N1 = Value.getOperand(1);
     unsigned BitWidth = N1.getValueSizeInBits();
     APInt Imm = N1->getAsAPIntVal();
     if (Opc == ISD::AND)
       Imm ^= APInt::getAllOnes(BitWidth);
     if (Imm == 0 || Imm.isAllOnes())
       return SDValue();
-    unsigned ShAmt = Imm.countr_zero();
-    unsigned MSB = BitWidth - Imm.countl_zero() - 1;
-    unsigned NewBW = NextPowerOf2(MSB - ShAmt);
+    // Find least/most significant bit that need to be part of the narrowed
+    // operation. We assume target will need to address/access full bytes, so
+    // we make sure to align LSB and MSB at byte boundaries.
+    unsigned BitsPerByteMask = 7u;
+    unsigned LSB = Imm.countr_zero() & ~BitsPerByteMask;
+    unsigned MSB = (Imm.getActiveBits() - 1) | BitsPerByteMask;
+    unsigned NewBW = NextPowerOf2(MSB - LSB);
     EVT NewVT = EVT::getIntegerVT(*DAG.getContext(), NewBW);
     // The narrowing should be profitable, the load/store operation should be
     // legal (or custom) and the store size should be equal to the NewVT width.
@@ -20364,68 +20368,69 @@ SDValue DAGCombiner::ReduceLoadOpStoreWidth(SDNode *N) {
     if (NewBW >= BitWidth)
       return SDValue();
 
-    // TODO: For big-endian we probably want to align given the most significant
-    // bit being modified instead of adjusting ShAmt based on least significant
-    // bits. This to reduce the risk of failing on the alignment check below. If
-    // for example VT.getStoreSize()==5 and Imm is 0x0000ffff00, then we want to
-    // find NewBW=16, and we want to load/store with a PtrOff set to 2. But then
-    // ShAmt should be set to 8, which isn't a multiple of NewBW. But given
-    // that isNarrowingProfitable doesn't seem to be overridden for any in-tree
-    // big-endian target, then the support for big-endian here isn't covered by
-    // any in-tree lit tests, so it is unfortunately not highly optimized
-    // either. It should be possible to improve that by using
-    // ReduceLoadOpStoreWidthForceNarrowingProfitable.
-
-    // If the lsb that is modified does not start at the type bitwidth boundary,
-    // align to start at the previous boundary.
-    ShAmt = ShAmt - (ShAmt % NewBW);
-
-    // Make sure we do not access memory outside the memory touched by the
-    // original load/store.
-    if (ShAmt + NewBW > VT.getStoreSizeInBits())
-      return SDValue();
+    // If we come this far NewVT/NewBW reflect a power-of-2 sized type that is
+    // large enough to cover all bits that should be modified. This type might
+    // however be larger than really needed (such as i32 while we actually only
+    // need to modify one byte). Now we need to find our how to align the memory
+    // accesses to satisfy preferred alignments as well as avoiding to access
+    // memory outside the store size of the orignal access.
+
+    unsigned VTStoreSize = VT.getStoreSizeInBits().getFixedValue();
+
+    // Let ShAmt denote amount of bits to skip, counted from the least
+    // significant bits of Imm. And let PtrOff how much the pointer needs to be
+    // offsetted (in bytes) for the new access.
+    unsigned ShAmt = 0;
+    uint64_t PtrOff = 0;
+    for (; ShAmt + NewBW <= VTStoreSize; ShAmt += 8) {
+      // Make sure the range [ShAmt, ShAmt+NewBW) cover both LSB and MSB.
+      if (ShAmt > LSB)
+        return SDValue();
+      if (ShAmt + NewBW < MSB)
+        continue;
 
-    APInt Mask = APInt::getBitsSet(BitWidth, ShAmt,
-                                   std::min(BitWidth, ShAmt + NewBW));
-    if ((Imm & Mask) == Imm) {
-      APInt NewImm = (Imm & Mask).lshr(ShAmt).trunc(NewBW);
-      if (Opc == ISD::AND)
-        NewImm ^= APInt::getAllOnes(NewBW);
-      uint64_t PtrOff = ShAmt / 8;
-      // For big endian targets, we need to adjust the offset to the pointer to
-      // load the correct bytes.
-      if (DAG.getDataLayout().isBigEndian())
-        PtrOff = (BitWidth + 7 - NewBW) / 8 - PtrOff;
+      // Calculate PtrOff.
+      unsigned PtrAdjustmentInBits = DAG.getDataLayout().isBigEndian()
+                                         ? VTStoreSize - NewBW - ShAmt
+                                         : ShAmt;
+      PtrOff = PtrAdjustmentInBits / 8;
 
+      // Now check if narrow access is allowed and fast, considering alignments.
       unsigned IsFast = 0;
       Align NewAlign = commonAlignment(LD->getAlign(), PtrOff);
-      if (!TLI.allowsMemoryAccess(*DAG.getContext(), DAG.getDataLayout(), NewVT,
-                                  LD->getAddressSpace(), NewAlign,
-                                  LD->getMemOperand()->getFlags(), &IsFast) ||
-          !IsFast)
-        return SDValue();
-
-      SDValue NewPtr =
-          DAG.getMemBasePlusOffset(Ptr, TypeSize::getFixed(PtrOff), SDLoc(LD));
-      SDValue NewLD =
-          DAG.getLoad(NewVT, SDLoc(N0), LD->getChain(), NewPtr,
-                      LD->getPointerInfo().getWithOffset(PtrOff), NewAlign,
-                      LD->getMemOperand()->getFlags(), LD->getAAInfo());
-      SDValue NewVal = DAG.getNode(Opc, SDLoc(Value), NewVT, NewLD,
-                                   DAG.getConstant(NewImm, SDLoc(Value),
-                                                   NewVT));
-      SDValue NewST =
-          DAG.getStore(Chain, SDLoc(N), NewVal, NewPtr,
-                       ST->getPointerInfo().getWithOffset(PtrOff), NewAlign);
-
-      AddToWorklist(NewPtr.getNode());
-      AddToWorklist(NewLD.getNode());
-      AddToWorklist(NewVal.getNode());
-      WorklistRemover DeadNodes(*this);
-      DAG.ReplaceAllUsesOfValueWith(N0.getValue(1), NewLD.getValue(1));
-      ++OpsNarrowed;
-      return NewST;
+      if (TLI.allowsMemoryAccess(*DAG.getContext(), DAG.getDataLayout(), NewVT,
+                                 LD->getAddressSpace(), NewAlign,
+                                 LD->getMemOperand()->getFlags(), &IsFast) &&
+          IsFast)
+        break;
     }
+    // If loop above did not find any accepted ShAmt we need to exit here.
+    if (ShAmt + NewBW > VTStoreSize)
+      return SDValue();
+
+    APInt NewImm = Imm.lshr(ShAmt).trunc(NewBW);
+    if (Opc == ISD::AND)
+      NewImm ^= APInt::getAllOnes(NewBW);
+    Align NewAlign = commonAlignment(LD->getAlign(), PtrOff);
+    SDValue NewPtr =
+        DAG.getMemBasePlusOffset(Ptr, TypeSize::getFixed(PtrOff), SDLoc(LD));
+    SDValue NewLD =
+        DAG.getLoad(NewVT, SDLoc(N0), LD->getChain(), NewPtr,
+                    LD->getPointerInfo().getWithOffset(PtrOff), NewAlign,
+                    LD->getMemOperand()->getFlags(), LD->getAAInfo());
+    SDValue NewVal = DAG.getNode(Opc, SDLoc(Value), NewVT, NewLD,
+                                 DAG.getConstant(NewImm, SDLoc(Value), NewVT));
+    SDValue NewST =
+        DAG.getStore(Chain, SDLoc(N), NewVal, NewPtr,
+                     ST->getPointerInfo().getWithOffset(PtrOff), NewAlign);
+
+    AddToWorklist(NewPtr.getNode());
+    AddToWorklist(NewLD.getNode());
+    AddToWorklist(NewVal.getNode());
+    WorklistRemover DeadNodes(*this);
+    DAG.ReplaceAllUsesOfValueWith(N0.getValue(1), NewLD.getValue(1));
+    ++OpsNarrowed;
+    return NewST;
   }
 
   return SDValue();

llvm/test/CodeGen/ARM/dagcombine-ld-op-st.ll

Lines changed: 8 additions & 8 deletions
@@ -32,10 +32,10 @@ define i16 @test(ptr %p1) {
 ;
 ; CHECK-LE-NARROW-LABEL: test:
 ; CHECK-LE-NARROW:       @ %bb.0: @ %entry
-; CHECK-LE-NARROW-NEXT:    ldrh r1, [r0, #8]
-; CHECK-LE-NARROW-NEXT:    movw r2, #65534
-; CHECK-LE-NARROW-NEXT:    orr r1, r1, r2
-; CHECK-LE-NARROW-NEXT:    strh r1, [r0, #8]
+; CHECK-LE-NARROW-NEXT:    ldr r1, [r0, #6]
+; CHECK-LE-NARROW-NEXT:    orr r1, r1, #16646144
+; CHECK-LE-NARROW-NEXT:    orr r1, r1, #-16777216
+; CHECK-LE-NARROW-NEXT:    str r1, [r0, #6]
 ; CHECK-LE-NARROW-NEXT:    mov r0, #0
 ; CHECK-LE-NARROW-NEXT:    bx lr
 ;
@@ -50,10 +50,10 @@ define i16 @test(ptr %p1) {
 ;
 ; CHECK-BE-NARROW-LABEL: test:
 ; CHECK-BE-NARROW:       @ %bb.0: @ %entry
-; CHECK-BE-NARROW-NEXT:    ldrh r1, [r0]
-; CHECK-BE-NARROW-NEXT:    movw r2, #65534
-; CHECK-BE-NARROW-NEXT:    orr r1, r1, r2
-; CHECK-BE-NARROW-NEXT:    strh r1, [r0]
+; CHECK-BE-NARROW-NEXT:    ldr r1, [r0]
+; CHECK-BE-NARROW-NEXT:    orr r1, r1, #16646144
+; CHECK-BE-NARROW-NEXT:    orr r1, r1, #-16777216
+; CHECK-BE-NARROW-NEXT:    str r1, [r0]
 ; CHECK-BE-NARROW-NEXT:    mov r0, #0
 ; CHECK-BE-NARROW-NEXT:    bx lr
 entry:

llvm/test/CodeGen/X86/apx/or.ll

Lines changed: 2 additions & 2 deletions
@@ -906,13 +906,13 @@ entry:
 define void @or64mi_legacy(ptr %a) {
 ; CHECK-LABEL: or64mi_legacy:
 ; CHECK:       # %bb.0: # %entry
-; CHECK-NEXT:    orq $123456, (%rdi) # encoding: [0x48,0x81,0x0f,0x40,0xe2,0x01,0x00]
+; CHECK-NEXT:    orl $123456, (%rdi) # encoding: [0x81,0x0f,0x40,0xe2,0x01,0x00]
 ; CHECK-NEXT:    # imm = 0x1E240
 ; CHECK-NEXT:    retq # encoding: [0xc3]
 ;
 ; NF-LABEL: or64mi_legacy:
 ; NF:       # %bb.0: # %entry
-; NF-NEXT:    orq $123456, (%rdi) # encoding: [0x48,0x81,0x0f,0x40,0xe2,0x01,0x00]
+; NF-NEXT:    orl $123456, (%rdi) # encoding: [0x81,0x0f,0x40,0xe2,0x01,0x00]
 ; NF-NEXT:    # imm = 0x1E240
 ; NF-NEXT:    retq # encoding: [0xc3]
 entry:

llvm/test/CodeGen/X86/illegal-bitfield-loadstore.ll

Lines changed: 4 additions & 14 deletions
@@ -6,22 +6,12 @@ define void @i24_or(ptr %a) {
 ; X86-LABEL: i24_or:
 ; X86:       # %bb.0:
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    movzwl (%eax), %ecx
-; X86-NEXT:    movzbl 2(%eax), %edx
-; X86-NEXT:    shll $16, %edx
-; X86-NEXT:    orl %ecx, %edx
-; X86-NEXT:    orl $384, %edx # imm = 0x180
-; X86-NEXT:    movw %dx, (%eax)
+; X86-NEXT:    orw $384, (%eax) # imm = 0x180
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: i24_or:
 ; X64:       # %bb.0:
-; X64-NEXT:    movzwl (%rdi), %eax
-; X64-NEXT:    movzbl 2(%rdi), %ecx
-; X64-NEXT:    shll $16, %ecx
-; X64-NEXT:    orl %eax, %ecx
-; X64-NEXT:    orl $384, %ecx # imm = 0x180
-; X64-NEXT:    movw %cx, (%rdi)
+; X64-NEXT:    orw $384, (%rdi) # imm = 0x180
 ; X64-NEXT:    retq
   %aa = load i24, ptr %a, align 1
   %b = or i24 %aa, 384
@@ -103,12 +93,12 @@ define void @i56_or(ptr %a) {
 ; X86-LABEL: i56_or:
 ; X86:       # %bb.0:
 ; X86-NEXT:    movl {{[0-9]+}}(%esp), %eax
-; X86-NEXT:    orl $384, (%eax) # imm = 0x180
+; X86-NEXT:    orw $384, (%eax) # imm = 0x180
 ; X86-NEXT:    retl
 ;
 ; X64-LABEL: i56_or:
 ; X64:       # %bb.0:
-; X64-NEXT:    orl $384, (%rdi) # imm = 0x180
+; X64-NEXT:    orw $384, (%rdi) # imm = 0x180
 ; X64-NEXT:    retq
   %aa = load i56, ptr %a, align 1
   %b = or i56 %aa, 384

llvm/test/CodeGen/X86/pr35763.ll

Lines changed: 1 addition & 9 deletions
@@ -14,15 +14,7 @@ define dso_local void @PR35763() {
 ; CHECK-NEXT:    movzwl z+2(%rip), %ecx
 ; CHECK-NEXT:    orl %eax, %ecx
 ; CHECK-NEXT:    movq %rcx, tf_3_var_136(%rip)
-; CHECK-NEXT:    movl z+6(%rip), %eax
-; CHECK-NEXT:    movzbl z+10(%rip), %ecx
-; CHECK-NEXT:    shlq $32, %rcx
-; CHECK-NEXT:    orq %rax, %rcx
-; CHECK-NEXT:    movabsq $1090921758719, %rax # imm = 0xFE0000FFFF
-; CHECK-NEXT:    andq %rcx, %rax
-; CHECK-NEXT:    movl %eax, z+6(%rip)
-; CHECK-NEXT:    shrq $32, %rax
-; CHECK-NEXT:    movb %al, z+10(%rip)
+; CHECK-NEXT:    andl $-33554177, z+7(%rip) # imm = 0xFE0000FF
 ; CHECK-NEXT:    retq
 entry:
   %0 = load i16, ptr @z, align 8

llvm/test/CodeGen/X86/store_op_load_fold.ll

Lines changed: 1 addition & 4 deletions
@@ -23,10 +23,7 @@ define void @test2() nounwind uwtable ssp {
 ; CHECK-LABEL: test2:
 ; CHECK:       ## %bb.0:
 ; CHECK-NEXT:    movl L_s2$non_lazy_ptr, %eax
-; CHECK-NEXT:    movzbl 22(%eax), %ecx
-; CHECK-NEXT:    andl $-4, %ecx
-; CHECK-NEXT:    movb %cl, 22(%eax)
-; CHECK-NEXT:    movw $0, 20(%eax)
+; CHECK-NEXT:    andl $-67108609, 19(%eax) ## imm = 0xFC0000FF
 ; CHECK-NEXT:    retl
   %bf.load35 = load i56, ptr getelementptr inbounds (%struct.S2, ptr @s2, i32 0, i32 5), align 16
   %bf.clear36 = and i56 %bf.load35, -1125895611875329
