Skip to content

Commit c161775

Browse files
committed
[FastISel] Flush local value map on every instruction
Local values are constants or addresses that can't be folded into the instruction that uses them. FastISel materializes these in a "local value" area that always dominates the current insertion point, to try to avoid materializing these values more than once (per block). https://reviews.llvm.org/D43093 added code to sink these local value instructions to their first use, which has two beneficial effects. One, it is likely to avoid some unnecessary spills and reloads; two, it allows us to attach the debug location of the user to the local value instruction. The latter effect can improve the debugging experience for debuggers with a "set next statement" feature, such as the Visual Studio debugger and PS4 debugger, because instructions to set up constants for a given statement will be associated with the appropriate source line. There are also some constants (primarily addresses) that could be produced by no-op casts or GEP instructions; the main difference from "local value" instructions is that these are values from separate IR instructions, and therefore could have multiple users across multiple basic blocks. D43093 avoided sinking these, even though they were emitted to the same "local value" area as the other instructions. The patch comment for D43093 states: Local values may also be used by no-op casts, which adds the register to the RegFixups table. Without reversing the RegFixups map direction, we don't have enough information to sink these instructions. This patch undoes most of D43093, and instead flushes the local value map after(*) every IR instruction, using that instruction's debug location. This avoids sometimes incorrect locations used previously, and emits instructions in a more natural order. In addition, constants materialized due to PHI instructions are not assigned a debug location immediately; instead, when the local value map is flushed, if the first local value instruction has no debug location, it is given the same location as the first non-local-value-map instruction. This prevents PHIs from introducing unattributed instructions, which would either be implicitly attributed to the location for the preceding IR instruction, or given line 0 if they are at the beginning of a machine basic block. Neither of those consequences is good for debugging. This does mean materialized values are not re-used across IR instruction boundaries; however, only about 5% of those values were reused in an experimental self-build of clang. (*) Actually, just prior to the next instruction. It seems like it would be cleaner the other way, but I was having trouble getting that to work. This reapplies commits cf1c774 and dc35368, and adds the modification to PHI handling, which should avoid problems with debugging under gdb. Differential Revision: https://reviews.llvm.org/D91734
1 parent e5eb5c8 commit c161775

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

57 files changed

+487
-479
lines changed

lld/test/wasm/debug-removed-fn.ll

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
; RUN: llvm-dwarfdump -debug-line -debug-ranges %t.wasm | FileCheck %s
44

55
; CHECK: Address
6-
; CHECK: 0x0000000000000005
6+
; CHECK: 0x0000000000000003
77
; CHECK-NEXT: 0x0000000000000006
88
; CHECK-EMPTY:
99

lldb/test/Shell/SymbolFile/NativePDB/disassembly.cpp

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -28,9 +28,9 @@ int main(int argc, char **argv) {
2828
// CHECK-NEXT: disassembly.cpp.tmp.exe[{{.*}}] <+17>: mov dword ptr [rsp + 0x24], ecx
2929
// CHECK: ** 15 foo();
3030
// CHECK: disassembly.cpp.tmp.exe[{{.*}}] <+21>: call {{.*}} ; foo at disassembly.cpp:12
31-
// CHECK-NEXT: disassembly.cpp.tmp.exe[{{.*}}] <+26>: xor eax, eax
3231
// CHECK: ** 16 return 0;
3332
// CHECK-NEXT: 17 }
3433
// CHECK-NEXT: 18
35-
// CHECK: disassembly.cpp.tmp.exe[{{.*}}] <+28>: add rsp, 0x38
34+
// CHECK: disassembly.cpp.tmp.exe[{{.*}}] <+26>: xor eax, eax
35+
// CHECK-NEXT: disassembly.cpp.tmp.exe[{{.*}}] <+28>: add rsp, 0x38
3636
// CHECK-NEXT: disassembly.cpp.tmp.exe[{{.*}}] <+32>: ret

lldb/test/Shell/SymbolFile/NativePDB/load-pdb.cpp

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -22,7 +22,7 @@ int main(int argc, char** argv) {
2222
// CHECK: (lldb) target symbols add {{.*}}bar.pdb
2323
// CHECK: symbol file '{{.*}}bar.pdb' has been added to '{{.*}}foo.exe'
2424
// CHECK: (lldb) b main
25-
// CHECK: Breakpoint 1: where = foo.exe`main + 23 at load-pdb.cpp:19, address = 0x0000000140001017
25+
// CHECK: Breakpoint 1: where = foo.exe`main + 21 at load-pdb.cpp:19, address = 0x0000000140001015
2626
// CHECK: (lldb) image dump symfile
2727
// CHECK: Types:
2828
// CHECK: {{.*}}: Type{0x00010024} , size = 0, compiler_type = {{.*}} int (int, char **)

llvm/include/llvm/CodeGen/FastISel.h

Lines changed: 2 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -246,7 +246,7 @@ class FastISel {
246246
/// be appended.
247247
void startNewBlock();
248248

249-
/// Flush the local value map and sink local values if possible.
249+
/// Flush the local value map.
250250
void finishBasicBlock();
251251

252252
/// Return current debug location information.
@@ -313,10 +313,7 @@ class FastISel {
313313
void removeDeadCode(MachineBasicBlock::iterator I,
314314
MachineBasicBlock::iterator E);
315315

316-
struct SavePoint {
317-
MachineBasicBlock::iterator InsertPt;
318-
DebugLoc DL;
319-
};
316+
using SavePoint = MachineBasicBlock::iterator;
320317

321318
/// Prepare InsertPt to begin inserting instructions into the local
322319
/// value area and return the old insert position.
@@ -559,20 +556,6 @@ class FastISel {
559556
/// Removes dead local value instructions after SavedLastLocalvalue.
560557
void removeDeadLocalValueCode(MachineInstr *SavedLastLocalValue);
561558

562-
struct InstOrderMap {
563-
DenseMap<MachineInstr *, unsigned> Orders;
564-
MachineInstr *FirstTerminator = nullptr;
565-
unsigned FirstTerminatorOrder = std::numeric_limits<unsigned>::max();
566-
567-
void initialize(MachineBasicBlock *MBB,
568-
MachineBasicBlock::iterator LastFlushPoint);
569-
};
570-
571-
/// Sinks the local value materialization instruction LocalMI to its first use
572-
/// in the basic block, or deletes it if it is not used.
573-
void sinkLocalValueMaterialization(MachineInstr &LocalMI, Register DefReg,
574-
InstOrderMap &OrderMap);
575-
576559
/// Insertion point before trying to select the current instruction.
577560
MachineBasicBlock::iterator SavedInsertPt;
578561

llvm/lib/CodeGen/SelectionDAG/FastISel.cpp

Lines changed: 57 additions & 153 deletions
Original file line numberDiff line numberDiff line change
@@ -139,7 +139,6 @@ void FastISel::startNewBlock() {
139139
LastLocalValue = EmitStartPt;
140140
}
141141

142-
/// Flush the local CSE map and sink anything we can.
143142
void FastISel::finishBasicBlock() { flushLocalValueMap(); }
144143

145144
bool FastISel::lowerArguments() {
@@ -164,48 +163,77 @@ bool FastISel::lowerArguments() {
164163

165164
/// Return the defined register if this instruction defines exactly one
166165
/// virtual register and uses no other virtual registers. Otherwise return 0.
167-
static Register findSinkableLocalRegDef(MachineInstr &MI) {
166+
static Register findLocalRegDef(MachineInstr &MI) {
168167
Register RegDef;
169168
for (const MachineOperand &MO : MI.operands()) {
170169
if (!MO.isReg())
171170
continue;
172171
if (MO.isDef()) {
173172
if (RegDef)
174-
return 0;
173+
return Register();
175174
RegDef = MO.getReg();
176175
} else if (MO.getReg().isVirtual()) {
177-
// This is another use of a vreg. Don't try to sink it.
176+
// This is another use of a vreg. Don't delete it.
178177
return Register();
179178
}
180179
}
181180
return RegDef;
182181
}
183182

183+
static bool isRegUsedByPhiNodes(Register DefReg,
184+
FunctionLoweringInfo &FuncInfo) {
185+
for (auto &P : FuncInfo.PHINodesToUpdate)
186+
if (P.second == DefReg)
187+
return true;
188+
return false;
189+
}
190+
184191
void FastISel::flushLocalValueMap() {
185-
// Try to sink local values down to their first use so that we can give them a
186-
// better debug location. This has the side effect of shrinking local value
187-
// live ranges, which helps out fast regalloc.
188-
if (SinkLocalValues && LastLocalValue != EmitStartPt) {
189-
// Sink local value materialization instructions between EmitStartPt and
190-
// LastLocalValue. Visit them bottom-up, starting from LastLocalValue, to
191-
// avoid inserting into the range that we're iterating over.
192+
// If FastISel bails out, it could leave local value instructions behind
193+
// that aren't used for anything. Detect and erase those.
194+
if (LastLocalValue != EmitStartPt) {
195+
// Save the first instruction after local values, for later.
196+
MachineBasicBlock::iterator FirstNonValue(LastLocalValue);
197+
++FirstNonValue;
198+
192199
MachineBasicBlock::reverse_iterator RE =
193200
EmitStartPt ? MachineBasicBlock::reverse_iterator(EmitStartPt)
194201
: FuncInfo.MBB->rend();
195202
MachineBasicBlock::reverse_iterator RI(LastLocalValue);
196-
197-
InstOrderMap OrderMap;
198203
for (; RI != RE;) {
199204
MachineInstr &LocalMI = *RI;
205+
// Increment before erasing what it points to.
200206
++RI;
201-
bool Store = true;
202-
if (!LocalMI.isSafeToMove(nullptr, Store))
207+
Register DefReg = findLocalRegDef(LocalMI);
208+
if (!DefReg)
203209
continue;
204-
Register DefReg = findSinkableLocalRegDef(LocalMI);
205-
if (DefReg == 0)
210+
if (FuncInfo.RegsWithFixups.count(DefReg))
206211
continue;
212+
bool UsedByPHI = isRegUsedByPhiNodes(DefReg, FuncInfo);
213+
if (!UsedByPHI && MRI.use_nodbg_empty(DefReg)) {
214+
if (EmitStartPt == &LocalMI)
215+
EmitStartPt = EmitStartPt->getPrevNode();
216+
LLVM_DEBUG(dbgs() << "removing dead local value materialization"
217+
<< LocalMI);
218+
LocalMI.eraseFromParent();
219+
}
220+
}
207221

208-
sinkLocalValueMaterialization(LocalMI, DefReg, OrderMap);
222+
if (FirstNonValue != FuncInfo.MBB->end()) {
223+
// See if there are any local value instructions left. If so, we want to
224+
// make sure the first one has a debug location; if it doesn't, use the
225+
// first non-value instruction's debug location.
226+
227+
// If EmitStartPt is non-null, this block had copies at the top before
228+
// FastISel started doing anything; it points to the last one, so the
229+
// first local value instruction is the one after EmitStartPt.
230+
// If EmitStartPt is null, the first local value instruction is at the
231+
// top of the block.
232+
MachineBasicBlock::iterator FirstLocalValue =
233+
EmitStartPt ? ++MachineBasicBlock::iterator(EmitStartPt)
234+
: FuncInfo.MBB->begin();
235+
if (FirstLocalValue != FirstNonValue && !FirstLocalValue->getDebugLoc())
236+
FirstLocalValue->setDebugLoc(FirstNonValue->getDebugLoc());
209237
}
210238
}
211239

@@ -216,131 +244,6 @@ void FastISel::flushLocalValueMap() {
216244
LastFlushPoint = FuncInfo.InsertPt;
217245
}
218246

219-
static bool isRegUsedByPhiNodes(Register DefReg,
220-
FunctionLoweringInfo &FuncInfo) {
221-
for (auto &P : FuncInfo.PHINodesToUpdate)
222-
if (P.second == DefReg)
223-
return true;
224-
return false;
225-
}
226-
227-
static bool isTerminatingEHLabel(MachineBasicBlock *MBB, MachineInstr &MI) {
228-
// Ignore non-EH labels.
229-
if (!MI.isEHLabel())
230-
return false;
231-
232-
// Any EH label outside a landing pad must be for an invoke. Consider it a
233-
// terminator.
234-
if (!MBB->isEHPad())
235-
return true;
236-
237-
// If this is a landingpad, the first non-phi instruction will be an EH_LABEL.
238-
// Don't consider that label to be a terminator.
239-
return MI.getIterator() != MBB->getFirstNonPHI();
240-
}
241-
242-
/// Build a map of instruction orders. Return the first terminator and its
243-
/// order. Consider EH_LABEL instructions to be terminators as well, since local
244-
/// values for phis after invokes must be materialized before the call.
245-
void FastISel::InstOrderMap::initialize(
246-
MachineBasicBlock *MBB, MachineBasicBlock::iterator LastFlushPoint) {
247-
unsigned Order = 0;
248-
for (MachineInstr &I : *MBB) {
249-
if (!FirstTerminator &&
250-
(I.isTerminator() || isTerminatingEHLabel(MBB, I))) {
251-
FirstTerminator = &I;
252-
FirstTerminatorOrder = Order;
253-
}
254-
Orders[&I] = Order++;
255-
256-
// We don't need to order instructions past the last flush point.
257-
if (I.getIterator() == LastFlushPoint)
258-
break;
259-
}
260-
}
261-
262-
void FastISel::sinkLocalValueMaterialization(MachineInstr &LocalMI,
263-
Register DefReg,
264-
InstOrderMap &OrderMap) {
265-
// If this register is used by a register fixup, MRI will not contain all
266-
// the uses until after register fixups, so don't attempt to sink or DCE
267-
// this instruction. Register fixups typically come from no-op cast
268-
// instructions, which replace the cast instruction vreg with the local
269-
// value vreg.
270-
if (FuncInfo.RegsWithFixups.count(DefReg))
271-
return;
272-
273-
// We can DCE this instruction if there are no uses and it wasn't a
274-
// materialized for a successor PHI node.
275-
bool UsedByPHI = isRegUsedByPhiNodes(DefReg, FuncInfo);
276-
if (!UsedByPHI && MRI.use_nodbg_empty(DefReg)) {
277-
if (EmitStartPt == &LocalMI)
278-
EmitStartPt = EmitStartPt->getPrevNode();
279-
LLVM_DEBUG(dbgs() << "removing dead local value materialization "
280-
<< LocalMI);
281-
OrderMap.Orders.erase(&LocalMI);
282-
LocalMI.eraseFromParent();
283-
return;
284-
}
285-
286-
// Number the instructions if we haven't yet so we can efficiently find the
287-
// earliest use.
288-
if (OrderMap.Orders.empty())
289-
OrderMap.initialize(FuncInfo.MBB, LastFlushPoint);
290-
291-
// Find the first user in the BB.
292-
MachineInstr *FirstUser = nullptr;
293-
unsigned FirstOrder = std::numeric_limits<unsigned>::max();
294-
for (MachineInstr &UseInst : MRI.use_nodbg_instructions(DefReg)) {
295-
auto I = OrderMap.Orders.find(&UseInst);
296-
assert(I != OrderMap.Orders.end() &&
297-
"local value used by instruction outside local region");
298-
unsigned UseOrder = I->second;
299-
if (UseOrder < FirstOrder) {
300-
FirstOrder = UseOrder;
301-
FirstUser = &UseInst;
302-
}
303-
}
304-
305-
// The insertion point will be the first terminator or the first user,
306-
// whichever came first. If there was no terminator, this must be a
307-
// fallthrough block and the insertion point is the end of the block.
308-
MachineBasicBlock::instr_iterator SinkPos;
309-
if (UsedByPHI && OrderMap.FirstTerminatorOrder < FirstOrder) {
310-
FirstOrder = OrderMap.FirstTerminatorOrder;
311-
SinkPos = OrderMap.FirstTerminator->getIterator();
312-
} else if (FirstUser) {
313-
SinkPos = FirstUser->getIterator();
314-
} else {
315-
assert(UsedByPHI && "must be users if not used by a phi");
316-
SinkPos = FuncInfo.MBB->instr_end();
317-
}
318-
319-
// Collect all DBG_VALUEs before the new insertion position so that we can
320-
// sink them.
321-
SmallVector<MachineInstr *, 1> DbgValues;
322-
for (MachineInstr &DbgVal : MRI.use_instructions(DefReg)) {
323-
if (!DbgVal.isDebugValue())
324-
continue;
325-
unsigned UseOrder = OrderMap.Orders[&DbgVal];
326-
if (UseOrder < FirstOrder)
327-
DbgValues.push_back(&DbgVal);
328-
}
329-
330-
// Sink LocalMI before SinkPos and assign it the same DebugLoc.
331-
LLVM_DEBUG(dbgs() << "sinking local value to first use " << LocalMI);
332-
FuncInfo.MBB->remove(&LocalMI);
333-
FuncInfo.MBB->insert(SinkPos, &LocalMI);
334-
if (SinkPos != FuncInfo.MBB->end())
335-
LocalMI.setDebugLoc(SinkPos->getDebugLoc());
336-
337-
// Sink any debug values that we've collected.
338-
for (MachineInstr *DI : DbgValues) {
339-
FuncInfo.MBB->remove(DI);
340-
FuncInfo.MBB->insert(SinkPos, DI);
341-
}
342-
}
343-
344247
bool FastISel::hasTrivialKill(const Value *V) {
345248
// Don't consider constants or arguments to have trivial kills.
346249
const Instruction *I = dyn_cast<Instruction>(V);
@@ -578,21 +481,17 @@ void FastISel::removeDeadCode(MachineBasicBlock::iterator I,
578481
}
579482

580483
FastISel::SavePoint FastISel::enterLocalValueArea() {
581-
MachineBasicBlock::iterator OldInsertPt = FuncInfo.InsertPt;
582-
DebugLoc OldDL = DbgLoc;
484+
SavePoint OldInsertPt = FuncInfo.InsertPt;
583485
recomputeInsertPt();
584-
DbgLoc = DebugLoc();
585-
SavePoint SP = {OldInsertPt, OldDL};
586-
return SP;
486+
return OldInsertPt;
587487
}
588488

589489
void FastISel::leaveLocalValueArea(SavePoint OldInsertPt) {
590490
if (FuncInfo.InsertPt != FuncInfo.MBB->begin())
591491
LastLocalValue = &*std::prev(FuncInfo.InsertPt);
592492

593493
// Restore the previous insert position.
594-
FuncInfo.InsertPt = OldInsertPt.InsertPt;
595-
DbgLoc = OldInsertPt.DL;
494+
FuncInfo.InsertPt = OldInsertPt;
596495
}
597496

598497
bool FastISel::selectBinaryOp(const User *I, unsigned ISDOpcode) {
@@ -1657,6 +1556,11 @@ void FastISel::removeDeadLocalValueCode(MachineInstr *SavedLastLocalValue)
16571556
}
16581557

16591558
bool FastISel::selectInstruction(const Instruction *I) {
1559+
// Flush the local value map before starting each instruction.
1560+
// This improves locality and debugging, and can reduce spills.
1561+
// Reuse of values across IR instructions is relatively uncommon.
1562+
flushLocalValueMap();
1563+
16601564
MachineInstr *SavedLastLocalValue = getLastLocalValue();
16611565
// Just before the terminator instruction, insert instructions to
16621566
// feed PHI nodes in successor blocks.
@@ -2362,9 +2266,9 @@ bool FastISel::handlePHINodesInSuccessorBlocks(const BasicBlock *LLVMBB) {
23622266

23632267
const Value *PHIOp = PN.getIncomingValueForBlock(LLVMBB);
23642268

2365-
// Set the DebugLoc for the copy. Prefer the location of the operand
2366-
// if there is one; use the location of the PHI otherwise.
2367-
DbgLoc = PN.getDebugLoc();
2269+
// Set the DebugLoc for the copy. Use the location of the operand if
2270+
// there is one; otherwise no location, flushLocalValueMap will fix it.
2271+
DbgLoc = DebugLoc();
23682272
if (const auto *Inst = dyn_cast<Instruction>(PHIOp))
23692273
DbgLoc = Inst->getDebugLoc();
23702274

llvm/test/CodeGen/AArch64/arm64-abi_align.ll

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -456,11 +456,11 @@ entry:
456456
; FAST: str {{x[0-9]+}}, [sp, #40]
457457
; FAST: str {{x[0-9]+}}, [sp, #48]
458458
; FAST: str {{x[0-9]+}}, [sp, #56]
459-
; FAST: str {{w[0-9]+}}, [sp]
460459
; Address of s1 is passed on stack at sp+8
461460
; FAST: sub x[[A:[0-9]+]], x29, #32
462-
; FAST: str x[[A]], [sp, #8]
463461
; FAST: add x[[B:[0-9]+]], sp, #32
462+
; FAST: str {{w[0-9]+}}, [sp]
463+
; FAST: str x[[A]], [sp, #8]
464464
; FAST: str x[[B]], [sp, #16]
465465
%tmp = alloca %struct.s43, align 16
466466
%tmp1 = alloca %struct.s43, align 16

llvm/test/CodeGen/AArch64/arm64-fast-isel-call.ll

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -82,12 +82,12 @@ entry:
8282
; CHECK: mov x0, xzr
8383
; CHECK: mov w1, #-8
8484
; CHECK: mov [[REG2:w[0-9]+]], #1023
85-
; CHECK: uxth w2, [[REG2]]
8685
; CHECK: mov [[REG3:w[0-9]+]], #2
87-
; CHECK: sxtb w3, [[REG3]]
8886
; CHECK: mov [[REG4:w[0-9]+]], wzr
89-
; CHECK: and w4, [[REG4]], #0x1
9087
; CHECK: mov [[REG5:w[0-9]+]], #1
88+
; CHECK: uxth w2, [[REG2]]
89+
; CHECK: sxtb w3, [[REG3]]
90+
; CHECK: and w4, [[REG4]], #0x1
9191
; CHECK: and w5, [[REG5]], #0x1
9292
; CHECK: bl _func2
9393
%call = call i32 @func2(i64 zeroext 0, i32 signext -8, i16 zeroext 1023, i8 signext -254, i1 zeroext 0, i1 zeroext 1)

llvm/test/CodeGen/AArch64/arm64-fast-isel-gv.ll

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,11 @@ entry:
2424
; CHECK: mov [[REG3:x[0-9]+]], #13849
2525
; CHECK: add [[REG7:x[0-9]+]], [[REG6]], [[REG3]]
2626
; CHECK: and [[REG8:x[0-9]+]], [[REG7]], #0xffff
27+
; CHECK: adrp [[REG1:x[0-9]+]], _seed@GOTPAGE
28+
; CHECK: ldr [[REG1]], {{\[}}[[REG1]], _seed@GOTPAGEOFF{{\]}}
2729
; CHECK: str [[REG8]], {{\[}}[[REG1]]{{\]}}
30+
; CHECK: adrp [[REG1:x[0-9]+]], _seed@GOTPAGE
31+
; CHECK: ldr [[REG1]], {{\[}}[[REG1]], _seed@GOTPAGEOFF{{\]}}
2832
; CHECK: ldr {{x[0-9]+}}, {{\[}}[[REG1]]{{\]}}
2933
%0 = load i64, i64* @seed, align 8
3034
%mul = mul nsw i64 %0, 1309

0 commit comments

Comments
 (0)