-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[BOLT][DWARF] Skip processing DWARF CUs with a DWO ID but no DWO name #154749
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BOLT][DWARF] Skip processing DWARF CUs with a DWO ID but no DWO name #154749
Conversation
@llvm/pr-subscribers-bolt Author: Jinjie Huang (Jinjie-Huang) ChangesDWOs with a DWOId of 0 are considered invalid. Currently, createRangeLocListAddressWriters() is called before these invalid DWOs are skipped, which seems unnecessary. Perhaps we can check the DWOId beforehand? BTW, this may lead to an assertion failure inside createRangeLocListAddressWriters("RangeLists writer for DWO unit already exists.), when a duplicate DWO ID of 0 is encountered. I hit this case when using the compiler built prior to this commit with -fsplit-dwarf-inlining enabled, which generates a large number of Compile Units with a DWO ID == 0. I'm not sure if other scenarios could also trigger this issue. Full diff: https://github.com/llvm/llvm-project/pull/154749.diff 1 Files Affected:
diff --git a/bolt/lib/Rewrite/DWARFRewriter.cpp b/bolt/lib/Rewrite/DWARFRewriter.cpp
index 0c1a1bac6c72e..e8f6189eb5a69 100644
--- a/bolt/lib/Rewrite/DWARFRewriter.cpp
+++ b/bolt/lib/Rewrite/DWARFRewriter.cpp
@@ -729,13 +729,13 @@ void DWARFRewriter::updateDebugInfo() {
ThreadPoolInterface &ThreadPool =
ParallelUtilities::getThreadPool(ThreadCount);
for (DWARFUnit *CU : DIEBlder.getProcessedCUs()) {
- createRangeLocListAddressWriters(*CU);
std::optional<DWARFUnit *> SplitCU;
std::optional<uint64_t> DWOId = CU->getDWOId();
if (DWOId)
SplitCU = BC.getDWOCU(*DWOId);
if (!SplitCU)
continue;
+ createRangeLocListAddressWriters(*CU);
DebugAddrWriter &AddressWriter =
*AddressWritersByCU[CU->getOffset()].get();
DebugRangesSectionWriter &TempRangesSectionWriter =
|
7e2e637
to
b6c3f3c
Compare
That sounds like a broken DWARF. Maybe better approach is to detect it and not to update debug information at all. |
bb17368
to
8851f2e
Compare
39145c2
to
1acbf66
Compare
✅ With the latest revision this PR passed the C/C++ code formatter. |
1acbf66
to
5a9460a
Compare
@ayermolo Thanks for the review. This kind of dwarf can reproduced via thinlto + split-dwarf + "-fsplit-dwarf-inlining" with clang prior to this commit.
The compiler generates DWARF for the cross-CU callee source file, and there can be multiple CUs with a dwoid of 0. This DWARF seems structurally normal, except that the dwoid is left at its initial value of '0'. I’m not sure whether this counts as a 'broken DWARF', or whether it can be effectively detected. The current implementation of this patch skips processing duplicates with a dwoid of 0, is this reasonable? |
Before BOLT it has multiple DWO IDs that are the same, zero. Which should make llvm-dwp abort. After BOLT it will definitely will be broken because those dwo cus will be referring to the structure of the binary before BOLT. If we do want to support partial update maybe a better way is not to include those CUs in final output? |
Yes, in llvm-dwp for a DW_TAG_skeleton_unit having a DW_AT_dwo_name DIE will trigger a "duplicate DWO ID" error if duplicates exist. Are you suggesting that we can determine whether to process this CU by checking for the existence of DW_AT_dwo_name in the DIE, instead of checking if the dwoid is 0?
I might be mistaken, the current changes won’t generate DWOs for these CUs, and the CUs in these binaries should also be discarded, right? Since the latest llvm has already fixed this issue with DW_TAG_skeleton_units that lack DW_AT_dwo_name, so we probably don’t need to specifically handle this case? |
I guess your case combines two issues: 1) duplicate DWO ID and 2) malformed DWARF (no DWO_names). Fundamental question, I think, remains. Should BOLT just reject this kind of DWARF and not update at all and remove debug -sections, or try to "fix it". I can see arguments in either direction. If we go with "fix it" then: |
5a9460a
to
a46877b
Compare
|
||
# CHECK: BOLT-ERROR: broken DWARF found in CU at offset 0x47 (DWOId=0x0, missing DW_AT_dwo_name / DW_AT_GNU_dwo_name). | ||
# CHECK: BOLT-ERROR: broken DWARF found in CU at offset 0x70 (DWOId=0x0, missing DW_AT_dwo_name / DW_AT_GNU_dwo_name). | ||
# CHECK-NOT: Assertion `RangeListsWritersByCU.count(*DWOId) == 0 && "RangeLists writer for DWO unit already exists."' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you separate DWO_name missing and DWO ID is the same into two different diffs?
The latter can happen without the first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@ayermolo Test case has been modified to only check the detection of this broken DWARF that is missing a DWO_name.
For such broken CUs, should our strategy be to only detect it, or should we also skip its processing and exclude it from the post-BOLT binary? The latter one also happens to solve the "DWO ID is the same 0" issue I've been facing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. You can have duplicate DWO ID (which is zero, or some other value) with valid DWO_name.
If you want to handle that, I would say best way would be in another PR that specifically handles that case.
So
case1) DWO_name missing
case2) DWO ID duplicates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this acceptable?
Case 1 (DWO_name missing): Exclude such broken DW_TAG_compile_unit/DW_TAG_skeleton_unit from the post-BOLT binary that is missing the DW_AT_dwo_name attribute.
Case 2 (Duplicate DWO IDs): Handle only the first dwo CU , and ensure BOLT doesn't crash.
9fbeab6
to
67d941e
Compare
Looks almost there. Left some minor nits. |
Got it, really appreciate you taking the time to review this on your last day. Wishing you all the best for what's next! :) |
5ea623a
to
744c824
Compare
744c824
to
f6367f8
Compare
@grigorypas @maksfb PTAL, thanks! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paschalis-mpeis Hmm, the test case has already been reduced ( unnecessary parts of the .text section are removed), leaving only debug-info-related content in the assembly. Should I consider reducing the debug info as well? |
Hey @Jinjie-Huang, If you can trim some debug info without loosing coverage, please do. From a quick scan, other debug tests are typically ~200-few hundred lines, and this one is in that range. Not a specialist in DWARF. Keep in mind that there may be a delay in feedback as, TMU, there's a transition in reviewing this area. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix. Please check a couple of minor comments inlined.
5daa7ab
to
10bb124
Compare
Thanks everyone for the review and the approval. |
…DWO name (#154749)" This reverts commit e6540d2. It breaks AArch64 Buildbot builders, see: https://lab.llvm.org/staging/#/builders/219/builds/5476
Hey @Jinjie-Huang, Unfortunately this patch has broken some of the AArch64 buildbot builders, so I suggest we revert it for now: |
@paschalis-mpeis Sorry for breaking the builders. The error seems to be related to the machine target incompatibility. It’s likely because I didn’t add the appropriate %cflags in the test to control the triple. May I fix this patch accordingly? |
Fix the test in dwarf5-dwoid-no-dwoname.s, add %cflags in the test to control the triple. [detail](#154749 (comment))
Fix the test in dwarf5-dwoid-no-dwoname.s, add %cflags in the test to control the triple. [detail](llvm/llvm-project#154749 (comment))
Fix the test in dwarf5-dwoid-no-dwoname.s, add %cflags in the test to control the triple. [detail](llvm#154749 (comment))
Perhaps we can skip processing those CUs with DWOId but lacking a dwo_name, since they can be considered invalid?
Btw in the case i met, the dwo id of these invalid CUs are all zero, and multiple DWOIds of 0 may lead to an assertion failure inside createRangeLocListAddressWriters("RangeLists writer for DWO unit already exists"). I hit this case when using the compiler built prior to this commit with -fsplit-dwarf-inlining enabled, which generates a large number of Compile Units with a DWO ID == 0. I'm not sure if there are other cases could also trigger this issue.
This patch tries to skip processing DWARF CUs with a DWO ID but no DWO name, and ensure them not included in the final binary.