-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[DebugInfo] Don't set prologue_end behind line-zero call insts #156850
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
@llvm/pr-subscribers-debuginfo Author: Jeremy Morse (jmorse) ChangesIn functions that have been seriously deformed during optimisation, there can be call instructions with line-zero immediately after frame setup (see C reproducer in the test added). Our previous algorithms for prologue_end ignored these, meaning someone entering a function at prologue_end would break-in after a function call had completed. Prefer instead to not emit prologue_end at all: there is no good place to put it. Full diff: https://github.com/llvm/llvm-project/pull/156850.diff 3 Files Affected:
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index c27f100775625..0208167fac6f7 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -2291,6 +2291,15 @@ findPrologueEndLoc(const MachineFunction *MF) {
return *FoundInst;
}
+ // We choose to ignore line-zero locations when setting the prologue as they
+ // can't be stepped on anyway; however in very rare scenarios function calls
+ // can have line zero, and we shouldn't step over those. In these
+ // extraordinary conditions, just bail out and refuse to set a prologue_end.
+ if (CurInst->isCall())
+ if (const DILocation *Loc = CurInst->getDebugLoc().get())
+ if (Loc->getLine() == 0)
+ return std::make_pair(nullptr, true);
+
// Try to continue searching, but use a backup-location if substantive
// computation is happening.
auto NextInst = std::next(CurInst);
diff --git a/llvm/test/DebugInfo/MIR/X86/debug-loc-0.mir b/llvm/test/DebugInfo/MIR/X86/debug-loc-0.mir
index 01862f5905f9c..71489d5a5e485 100644
--- a/llvm/test/DebugInfo/MIR/X86/debug-loc-0.mir
+++ b/llvm/test/DebugInfo/MIR/X86/debug-loc-0.mir
@@ -5,7 +5,7 @@
# CHECK: Ltmp0:
# CHECK: .loc 1 0 0
# CHECK-NOT: .loc 1 0 0
-# CHECK: .loc 1 37 1 prologue_end
+# CHECK: .loc 1 37 1
--- |
; ModuleID = '<stdin>'
diff --git a/llvm/test/DebugInfo/X86/no-prologue-end-after-line0-calls.mir b/llvm/test/DebugInfo/X86/no-prologue-end-after-line0-calls.mir
new file mode 100644
index 0000000000000..604b9d1d2eafb
--- /dev/null
+++ b/llvm/test/DebugInfo/X86/no-prologue-end-after-line0-calls.mir
@@ -0,0 +1,129 @@
+# RUN: llc %s -start-after=livedebugvalues -o - -filetype=obj | llvm-dwarfdump - --debug-line | FileCheck %s --implicit-check-not=prologue_end
+#
+## Original code, compiled clang -O2 -g -c
+##
+## void ext();
+## int main(int argc, char **argv) {
+## if (argc == 1)
+## ext();
+## else
+## ext();
+## return 0;
+## }
+##
+## In the code sequence above, the call to ext is given line zero during
+## optimisation, because the code is duplicated down all function paths thus
+## gets merged. We get something like this as the output:
+##
+## 0: 50 push %rax
+## 1: 31 c0 xor %eax,%eax
+## 3: e8 00 00 00 00 call 8 <main+0x8>
+## 4: R_X86_64_PLT32 ext-0x4
+## 8: 31 c0 xor %eax,%eax
+## a: 59 pop %rcx
+## b: c3 ret
+##
+## Where prologue_end is placed on address 8, the clearing of the return
+## register, because it's the first "real" instruction that isn't line zero.
+## This then causes debuggers to skip over the call instruction when entering
+## the function, which is catastrophic.
+##
+## Instead: we shouldn't put a prologue_end on this function at all. It's too
+## deformed from the original code to truly have a position (with a line number)
+## that is both true, and after frame setup. This gives comsumers the
+## opportunity to recognise "this is a crazy function" and act accordingly.
+##
+## Check lines ensure that there's something meaningful in the line table
+## involving line 2, the implicit-check-not is making sure there isn't a
+## prologue_end flag on any line entry.
+#
+# CHECK: standard_opcode_lengths[DW_LNS_set_prologue_end] = 0
+#
+# CHECK: Address Line Column File
+# CHECK: 2 0 0
+# CHECK: end_sequence
+--- |
+ ; ModuleID = '/tmp/test.c'
+ source_filename = "/tmp/test.c"
+ target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128"
+ target triple = "x86_64-unknown-linux-gnu"
+
+ ; Function Attrs: nounwind uwtable
+ define dso_local noundef i32 @main(i32 noundef %argc, ptr noundef readnone captures(none) %argv) local_unnamed_addr !dbg !10 {
+ entry:
+ #dbg_value(i32 %argc, !19, !DIExpression(), !21)
+ #dbg_value(ptr %argv, !20, !DIExpression(), !21)
+ tail call void (...) @ext(), !dbg !22
+ ret i32 0, !dbg !24
+ }
+
+ declare !dbg !25 void @ext(...) local_unnamed_addr
+
+ !llvm.dbg.cu = !{!0}
+ !llvm.module.flags = !{!2, !3, !4, !5, !6, !7, !8}
+ !llvm.ident = !{!9}
+
+ !0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 22.0.0git (/fast/fs/llvm4 8989ec5439dc2df2aeb7e5ea3e6c255ce8e9634d)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None)
+ !1 = !DIFile(filename: "/tmp/test.c", directory: "/fast/fs/llvm-stage/debug", checksumkind: CSK_MD5, checksum: "9862df54ae1fdd9354308eae69de364a")
+ !2 = !{i32 7, !"Dwarf Version", i32 5}
+ !3 = !{i32 2, !"Debug Info Version", i32 3}
+ !4 = !{i32 1, !"wchar_size", i32 4}
+ !5 = !{i32 8, !"PIC Level", i32 2}
+ !6 = !{i32 7, !"PIE Level", i32 2}
+ !7 = !{i32 7, !"uwtable", i32 2}
+ !8 = !{i32 7, !"debug-info-assignment-tracking", i1 true}
+ !9 = !{!"clang version 22.0.0git (/fast/fs/llvm4 8989ec5439dc2df2aeb7e5ea3e6c255ce8e9634d)"}
+ !10 = distinct !DISubprogram(name: "main", scope: !11, file: !11, line: 2, type: !12, scopeLine: 2, flags: DIFlagPrototyped | DIFlagAllCallsDescribed, spFlags: DISPFlagDefinition | DISPFlagOptimized, unit: !0, retainedNodes: !18, keyInstructions: true)
+ !11 = !DIFile(filename: "/tmp/test.c", directory: "", checksumkind: CSK_MD5, checksum: "9862df54ae1fdd9354308eae69de364a")
+ !12 = !DISubroutineType(types: !13)
+ !13 = !{!14, !14, !15}
+ !14 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
+ !15 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !16, size: 64)
+ !16 = !DIDerivedType(tag: DW_TAG_pointer_type, baseType: !17, size: 64)
+ !17 = !DIBasicType(name: "char", size: 8, encoding: DW_ATE_signed_char)
+ !18 = !{!19, !20}
+ !19 = !DILocalVariable(name: "argc", arg: 1, scope: !10, file: !11, line: 2, type: !14)
+ !20 = !DILocalVariable(name: "argv", arg: 2, scope: !10, file: !11, line: 2, type: !15)
+ !21 = !DILocation(line: 0, scope: !10)
+ !22 = !DILocation(line: 0, scope: !23)
+ !23 = distinct !DILexicalBlock(scope: !10, file: !11, line: 3, column: 7)
+ !24 = !DILocation(line: 7, column: 4, scope: !10, atomGroup: 2, atomRank: 1)
+ !25 = !DISubprogram(name: "ext", scope: !11, file: !11, line: 1, type: !26, spFlags: DISPFlagOptimized)
+ !26 = !DISubroutineType(types: !27)
+ !27 = !{null}
+...
+---
+name: main
+alignment: 16
+tracksRegLiveness: true
+noPhis: true
+isSSA: false
+noVRegs: true
+hasFakeUses: false
+debugInstrRef: true
+tracksDebugUserValues: true
+frameInfo:
+ stackSize: 8
+ offsetAdjustment: -8
+ maxAlignment: 1
+ adjustsStack: true
+ hasCalls: true
+ maxCallFrameSize: 0
+ isCalleeSavedInfoValid: true
+machineFunctionInfo:
+ amxProgModel: None
+body: |
+ bb.0.entry:
+ DBG_VALUE $edi, $noreg, !19, !DIExpression(), debug-location !21
+ DBG_VALUE $rsi, $noreg, !20, !DIExpression(), debug-location !21
+ frame-setup PUSH64r undef $rax, implicit-def $rsp, implicit $rsp
+ frame-setup CFI_INSTRUCTION def_cfa_offset 16
+ dead $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, implicit-def $al, debug-location !22
+ CALL64pcrel32 target-flags(x86-plt) @ext, csr_64, implicit $rsp, implicit $ssp, implicit killed $al, implicit-def $rsp, implicit-def $ssp, debug-location !22
+ DBG_VALUE $rsi, $noreg, !20, !DIExpression(DW_OP_LLVM_entry_value, 1), debug-location !21
+ DBG_VALUE $edi, $noreg, !19, !DIExpression(DW_OP_LLVM_entry_value, 1), debug-location !21
+ $eax = XOR32rr undef $eax, undef $eax, implicit-def dead $eflags, debug-location !24
+ $rcx = frame-destroy POP64r implicit-def $rsp, implicit $rsp, debug-location !24
+ frame-destroy CFI_INSTRUCTION def_cfa_offset 8, debug-location !24
+ RET64 $eax, debug-location !24
+...
|
SLTozer
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In principle SGTM, some minor nits and otherwise suggest a brief wait to see if anyone else has comments.
| ## 0: 50 push %rax | ||
| ## 1: 31 c0 xor %eax,%eax | ||
| ## 3: e8 00 00 00 00 call 8 <main+0x8> | ||
| ## 4: R_X86_64_PLT32 ext-0x4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ## 4: R_X86_64_PLT32 ext-0x4 | |
| ## 4: R_X86_64_PLT32 ext-0x4 |
| ## | ||
| ## Instead: we shouldn't put a prologue_end on this function at all. It's too | ||
| ## deformed from the original code to truly have a position (with a line number) | ||
| ## that is both true, and after frame setup. This gives comsumers the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ## that is both true, and after frame setup. This gives comsumers the | |
| ## that is both true, and after frame setup. This gives consumers the |
| !llvm.module.flags = !{!2, !3, !4, !5, !6, !7, !8} | ||
| !llvm.ident = !{!9} | ||
|
|
||
| !0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 22.0.0git (/fast/fs/llvm4 8989ec5439dc2df2aeb7e5ea3e6c255ce8e9634d)", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, splitDebugInlining: false, nameTableKind: None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can clean up the test strings a bit, removing git hashes and checksums.
|
The issue with putting prologue_end before such a call is that the frame might not be fully set up, so our variable locations might not be valid yet? Or some other issue? (& if our stack-frame based variable locations aren't valid yet - having no prologue_end might be as-bad-or-worse, right? Since you'll break before the very first instruction - so they'll definitely not be valid then?) & might be wroth knowing/discussing a bit a specific example of this problem to see what the tradeoffs are? Do you have a particular optimization/sequence of optimizations that produces an especially bad case? |
|
(Still working through my email backlog sorry),
Mostly that the frame usually won't be fully set up. In the added test case we can choose between:
Where the push instruction is actually an OK choice in this specific case. However, I think for a majority of functions the last frame-setup instruction will be
Indeed there are only poor choices in this situation -- I figure not setting prologue_end signals to the consumer that there's no good position at all (although I don't think this is a documented convention), which gives the consumer an opportunity to make decisions about it. For the sce debugger, with no prologue_end I believe we break on the first instruction in the function, and don't show any variable locations until stepping begins (85% confidence).
Happily it's an extremely rare scenario -- some QA on a mid-to-large-sized C++ codebase found this problem in a single function out of thousands, and it was broadly what's in the test case code: Where the branch-folder pass, running extremely late (post-regalloc), identifies the code sequence on either side of the branch as being identical and hoists the calls, merging the source location into line zero. The original codebase had a much complicated tail. At entry it produced this assembly: Where with todays clang/llvm we place prologue_end after the call instruction, meaning a breakpoint on the function enters after the call completes. Placing it on I suppose we can determine whether there are any stack variables at risk by examining the Most of my motivation for fixing this (even though it's rare) is that, IMHO, stopping at the wrong point in the program is massively worse that displaying some uninitialized variables. Developers can usually spot the latter, but will be dismayed if they can't break into the right place. |
|
Thanks for all the details - given the rarity, if you just wan to move forward with this as-is, I'm OK with it. But for argument's sake: What about if we forced the location after the prologue, if it's line zero, to instead have the function's scope start location. This would mean if you broke inside the function call - you'd see the caller frame as being at the start of the calling function, not line zero, which might be misleading - since that line doesn't necessarily have a function call on it. But maybe isn't /too/ wrong? (it's not bonus coverage, at least - the start of the caller is certainly reached/that is where you're at) - and in many cases, the start of the function scope isn't some other line of code - it's the opening brace, so it'd be maybe a bit clear that we're not trying to describe a specific bit of code... It's not super principled, by any means... |
|
I suppose the major risk would be attaching a non-zero source location to an instruction that then traps and gets reported in a backtrace... but it's already a disaster if that happens I suppose. If the new source-location inserted was applied to the call, a stack-trace would also present the scope-line in a backtrace... which again is a bit more information than just line zero. I'll see what our debugger folks think. |
In functions that have been seriously deformed during optimisation, there can be call instructions with line-zero immediately after frame setup (see C reproducer in the test added). Our previous algorithms for prologue_end ignored these, meaning someone entering a function at prologue_end would break-in after a function call had completed. Prefer instead to not emit prologue_end at all: there is no good place to put it.
fa849a7 to
fe4f635
Compare
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
|
Several decades later, I've revised this to take @dwblaikie 's recommended approach. We now place prologue_end on the instruction before a zero-line-num call, and force it to have the scope line. This shouldn't override any prior source locations that do real computation (and can load/store to the heap) because those would have been picked for prologue_end instead. |
OCHyams
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several decades later, I've revised this to take @dwblaikie 's recommended approach.
This approach is different to what I read @dwblaikie 's suggestion as.
He said:
What about if we forced the location after the prologue, if it's line zero, to instead have the function's scope start location.
I think he was saying the call should get scope-line instruction? I could be misinterpreting this conversation though, it would be good to get confirmation.
I think I prefer this current approach, and it seems like @dwblaikie was ok to move forward previously, so this is a soft LGTM pending additional signal
| // have the scope line and put prologue_end there. This will be suboptimal, | ||
| // and might still be in setup code, but is less catastrophic than missing | ||
| // a call. | ||
| if (CurInst->isCall()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this covers tail calls too (which is good, just noting my thoughts for the review -- no action here)
| } | ||
|
|
||
| static std::pair<const MachineInstr *, bool> | ||
| findPrologueEndLoc(const MachineFunction *MF) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could we take this opportunity to add a comment explaining the return pair. I think "true" meaning "empty prologue" for the second field isn't obvious
| # | ||
| # CHECK: 2 0 0 0 0 0 is_stmt | ||
| # CHECK-NEXT: 2 0 0 0 0 0 is_stmt prologue_end | ||
| # CHECK-NEXT: 0 0 0 0 0 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it could possibly be worth objdumping and cross referencing the addresses? I have done that in other similar tests. I worry that this could become rotten-green. If that fear seems unfounded to you, feel free to ignore this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've switched to examining the textual output, which has the upside of not printing hex, but still having the meaning of the .loc lines and their flags in the context of the instructions.
| entry: | ||
| #dbg_value(i32 %argc, !19, !DIExpression(), !21) | ||
| #dbg_value(ptr %argv, !20, !DIExpression(), !21) | ||
| tail call void (...) @ext(), !dbg !22 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need the variable location metadata (in the IR, metadata blob, or MIR)?
Yep, I think that's what I was saying.
Perhaps someone could summarize in a comment (with examples of assembly/instructions - what data we have, what we did before, what we're doing now) what we're doing/why we're doing it? |
|
Thanks for the catch on which instruction position, this was a very sophisticated off-by-one,
Illustrating the test example, in very rare circumstances call instructions can have line-zero, when they get branch-folded together for example. In contrived code such as the below: The calls to Where the first few instructions have empty debug-locs because they're all compiler generated, and the call has line zero because its source locations have been merged. The only "legitimate" line number in the function is then the return statement, which is where todays LLVM places prologue_end. This then means that a debugger honouring prologue_end as the correct place to put a function-breakpoint, will break into a function already having executed the function call. Which is catastrophic for interactive debugging. This patch is forcing the call instruction to have the function-scope line number, and putting prologue_end on that instruction too. It means we're bending the truth about the source location of the call instruction so that we can put the prologue_end somewhere meaningful. However, the alternative is no prologue_end at all, which is also poor. |
Oh, I think I'd misunderstood some more fundamental difference here - yeah, whether it's the start of the function line, or the start of the scope line, if the call happens to be within some narrower scope (like if the call was inlined from another function, that'd be especially relevant - whether it's part of the inline or not). Yeah, I guess scope sounds marginally better on that basis. Though we do produce file/line on DILexicalScopes and the subprgoram scopes of inlined instructions have that information to work with - we don't actually emit the file/line in a DW_TAG_lexical_scope (so we could consider dropping them from the metadata too). So maybe this distinction would only matter for inlined/subprogram scopes. |
|
Hmmm, so you're saying we could potentially use the scope-line of the line-zero call instructions DILocation? I suppose that would then produce a legitimate line number that's more localised to where the call came from, which is valuable. The only risk to my mind is that we might be making this decision long after the pc_hi/pc_low or scope ranges of any deeper scopes are computed, and those might skip over line zero locations. (We can test this). The downside would be that putting a breakpoint on a function would not stop in that function; you would always stop at a source-location that's inlined into that function. This doesn't sound too bad given that a) these are already highly deformed functions, and b) it can happen anyway with just normal inlining. I'll switch to inline-scope-line shortly and see what the output looks like. |
Yep, worth a test - I'd be mildly surprised if they skip line zero for scoping.
Presumably that already happens in the non-line-zero case (if you have f1 calls f2 calls f3, and f2 is inlined into f1, and you break on f1 you probably stop on the call to f3 which is in the inlined f2)
I don't feel /super/ strongly about this. |
Aha... I uh, don't either, and I reckon that what we have right now is sufficiently good. Without objection, I'll just land it as is (using the containing functions scope-line, rather than the inline scope). |
…lvm#156850) In functions that have been seriously deformed during optimisation, there can be call instructions with line-zero immediately after frame setup (see C reproducer in the test added). Our previous algorithms for prologue_end ignored these, meaning someone entering a function at prologue_end would break-in after a function call had completed. Prefer instead to place prologue_end and the function scope-line on the line zero call: this isn't false (it's the first meaningful instruction of the function) and is approximately true. Given a less than ideal function, this is an OK solution.
…lvm#156850) In functions that have been seriously deformed during optimisation, there can be call instructions with line-zero immediately after frame setup (see C reproducer in the test added). Our previous algorithms for prologue_end ignored these, meaning someone entering a function at prologue_end would break-in after a function call had completed. Prefer instead to place prologue_end and the function scope-line on the line zero call: this isn't false (it's the first meaningful instruction of the function) and is approximately true. Given a less than ideal function, this is an OK solution.
In functions that have been seriously deformed during optimisation, there can be call instructions with line-zero immediately after frame setup (see C reproducer in the test added). Our previous algorithms for prologue_end ignored these, meaning someone entering a function at prologue_end would break-in after a function call had completed. Prefer instead to not emit prologue_end at all: there is no good place to put it.