Commit 9907b97

[DAGCombiner] Fix check for extending loads
Fix a check for extending loads in DAGCombiner: if the result type has more bits than the loaded type, the load should count as an extending load. All backends apart from AArch64 ignore the ExtTy argument to shouldReduceLoadWidth, so this change currently only affects AArch64.
1 parent c5f82f7 commit 9907b97
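
To make the inverted condition concrete, here is a minimal standalone sketch (toy names, not LLVM's actual ISD::LoadExtType machinery) of the classification the corrected line performs: a narrowed load is an extending load exactly when the result needs more bits than are read from memory.

// Toy sketch only -- the enum and function names are invented for
// illustration and are not part of LLVM's API.
#include <cassert>

enum LoadExtType { NON_EXTLOAD, EXTLOAD };

// A load is "extending" when the produced value is wider than the type
// actually read from memory, mirroring the corrected ternary:
//   ResultVT.bitsGT(VecEltVT) ? ISD::EXTLOAD : ISD::NON_EXTLOAD
LoadExtType classifyNarrowedLoad(unsigned ResultBits, unsigned LoadedBits) {
  return ResultBits > LoadedBits ? EXTLOAD : NON_EXTLOAD;
}

int main() {
  // e.g. extracting an i8 element but producing an i32 value: extending load.
  assert(classifyNarrowedLoad(32, 8) == EXTLOAD);
  // Result width equal to the loaded width: a plain (non-extending) load.
  assert(classifyNarrowedLoad(32, 32) == NON_EXTLOAD);
  return 0;
}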

2 files changed: +30 −1 lines
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp

Lines changed: 1 addition & 1 deletion
@@ -22566,7 +22566,7 @@ SDValue DAGCombiner::scalarizeExtractedVectorLoad(SDNode *EVE, EVT InVecVT,
     return SDValue();

   ISD::LoadExtType ExtTy =
-      ResultVT.bitsGT(VecEltVT) ? ISD::NON_EXTLOAD : ISD::EXTLOAD;
+      ResultVT.bitsGT(VecEltVT) ? ISD::EXTLOAD : ISD::NON_EXTLOAD;
   if (!TLI.isOperationLegalOrCustom(ISD::LOAD, VecEltVT) ||
       !TLI.shouldReduceLoadWidth(OriginalLoad, ExtTy, VecEltVT))
     return SDValue();
Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+; RUN: llc -mtriple=aarch64-unknown-linux-gnu < %s | FileCheck %s
+
+; FIXME: Currently, we avoid narrowing this v4i32 load, in the
+; hopes of being able to fold the shift, despite it requiring stack
+; storage + loads. Ideally, we should narrow here and load the i32
+; directly from the variable offset e.g:
+;
+; add x8, x0, x1, lsl #4
+; and x9, x2, #0x3
+; ldr w0, [x8, x9, lsl #2]
+;
+; The AArch64TargetLowering::shouldReduceLoadWidth heuristic should
+; probably be updated to choose load-narrowing instead of folding the
+; lsl in larger vector cases.
+;
+; CHECK-LABEL: narrow_load_v4_i32_single_ele_variable_idx:
+; CHECK: sub sp, sp, #16
+; CHECK: ldr q[[REG0:[0-9]+]], [x0, x1, lsl #4]
+; CHECK: bfi x[[REG1:[0-9]+]], x2, #2, #2
+; CHECK: str q[[REG0]], [sp]
+; CHECK: ldr w0, [x[[REG1]]]
+; CHECK: add sp, sp, #16
+define i32 @narrow_load_v4_i32_single_ele_variable_idx(ptr %ptr, i64 %off, i32 %ele) {
+entry:
+  %idx = getelementptr inbounds <4 x i32>, ptr %ptr, i64 %off
+  %x = load <4 x i32>, ptr %idx, align 8
+  %res = extractelement <4 x i32> %x, i32 %ele
+  ret i32 %res
+}
