Skip to content

Conversation

Jinjie-Huang
Copy link
Contributor

@Jinjie-Huang Jinjie-Huang commented Aug 21, 2025

Perhaps we can skip processing those CUs with DWOId but lacking a dwo_name, since they can be considered invalid?

Btw in the case i met, the dwo id of these invalid CUs are all zero, and multiple DWOIds of 0 may lead to an assertion failure inside createRangeLocListAddressWriters("RangeLists writer for DWO unit already exists"). I hit this case when using the compiler built prior to this commit with -fsplit-dwarf-inlining enabled, which generates a large number of Compile Units with a DWO ID == 0. I'm not sure if there are other cases could also trigger this issue.

This patch tries to skip processing DWARF CUs with a DWO ID but no DWO name, and ensure them not included in the final binary.

@llvmbot
Copy link
Member

llvmbot commented Aug 21, 2025

@llvm/pr-subscribers-bolt

Author: Jinjie Huang (Jinjie-Huang)

Changes

DWOs with a DWOId of 0 are considered invalid. Currently, createRangeLocListAddressWriters() is called before these invalid DWOs are skipped, which seems unnecessary. Perhaps we can check the DWOId beforehand?

BTW, this may lead to an assertion failure inside createRangeLocListAddressWriters("RangeLists writer for DWO unit already exists.), when a duplicate DWO ID of 0 is encountered. I hit this case when using the compiler built prior to this commit with -fsplit-dwarf-inlining enabled, which generates a large number of Compile Units with a DWO ID == 0. I'm not sure if other scenarios could also trigger this issue.


Full diff: https://github.com/llvm/llvm-project/pull/154749.diff

1 Files Affected:

  • (modified) bolt/lib/Rewrite/DWARFRewriter.cpp (+1-1)
diff --git a/bolt/lib/Rewrite/DWARFRewriter.cpp b/bolt/lib/Rewrite/DWARFRewriter.cpp
index 0c1a1bac6c72e..e8f6189eb5a69 100644
--- a/bolt/lib/Rewrite/DWARFRewriter.cpp
+++ b/bolt/lib/Rewrite/DWARFRewriter.cpp
@@ -729,13 +729,13 @@ void DWARFRewriter::updateDebugInfo() {
     ThreadPoolInterface &ThreadPool =
         ParallelUtilities::getThreadPool(ThreadCount);
     for (DWARFUnit *CU : DIEBlder.getProcessedCUs()) {
-      createRangeLocListAddressWriters(*CU);
       std::optional<DWARFUnit *> SplitCU;
       std::optional<uint64_t> DWOId = CU->getDWOId();
       if (DWOId)
         SplitCU = BC.getDWOCU(*DWOId);
       if (!SplitCU)
         continue;
+      createRangeLocListAddressWriters(*CU);
       DebugAddrWriter &AddressWriter =
           *AddressWritersByCU[CU->getOffset()].get();
       DebugRangesSectionWriter &TempRangesSectionWriter =

@Jinjie-Huang Jinjie-Huang force-pushed the users/huangjinjie/bolt_dwoid_zero branch 2 times, most recently from 7e2e637 to b6c3f3c Compare August 21, 2025 13:29
@Jinjie-Huang Jinjie-Huang changed the title [BOLT][DWARF] Avoid unnecessary work if DWO id is zero [BOLT][DWARF] Avoid work if DWO id is zero Aug 21, 2025
@Jinjie-Huang Jinjie-Huang changed the title [BOLT][DWARF] Avoid work if DWO id is zero [BOLT][DWARF] Avoid invalid work if DWO id is zero Aug 27, 2025
@ayermolo ayermolo self-assigned this Aug 27, 2025
@ayermolo
Copy link
Contributor

That sounds like a broken DWARF. Maybe better approach is to detect it and not to update debug information at all.

@Jinjie-Huang Jinjie-Huang force-pushed the users/huangjinjie/bolt_dwoid_zero branch from bb17368 to 8851f2e Compare September 2, 2025 09:18
@Jinjie-Huang Jinjie-Huang changed the title [BOLT][DWARF] Avoid invalid work if DWO id is zero [BOLT][DWARF] Skip processing DWO files with ID 0 Sep 2, 2025
@Jinjie-Huang Jinjie-Huang force-pushed the users/huangjinjie/bolt_dwoid_zero branch from 39145c2 to 1acbf66 Compare September 2, 2025 10:09
@github-actions
Copy link

github-actions bot commented Sep 2, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@Jinjie-Huang Jinjie-Huang force-pushed the users/huangjinjie/bolt_dwoid_zero branch from 1acbf66 to 5a9460a Compare September 2, 2025 10:18
@Jinjie-Huang
Copy link
Contributor Author

Jinjie-Huang commented Sep 2, 2025

@ayermolo Thanks for the review.

This kind of dwarf can reproduced via thinlto + split-dwarf + "-fsplit-dwarf-inlining" with clang prior to this commit.
And we can get dwarf like this(btw, a test has also been added):

0x000000d7: Compile Unit: length = 0x00000025, format = DWARF32, version = 0x0005, unit_type = DW_UT_skeleton, abbr_offset = 0x005b, addr_size = 0x08, DWO_id = 0x0000000000000000 (next unit at 0x00000100)

0x000000eb: DW_TAG_skeleton_unit
              DW_AT_stmt_list [DW_FORM_sec_offset]	(0x000001f3)
              DW_AT_str_offsets_base [DW_FORM_sec_offset]	(0x00000008)
              DW_AT_comp_dir [DW_FORM_strx1]	(".")
              DW_AT_GNU_pubnames [DW_FORM_flag_present]	(true)
              DW_AT_producer [DW_FORM_strx1]	("clang version 16.0.6)")
              DW_AT_language [DW_FORM_data2]	(DW_LANG_C_plus_plus_14)
              DW_AT_name [DW_FORM_strx1]	("callee.cpp")
              DW_AT_addr_base [DW_FORM_sec_offset]	(0x00000008)

0x000000fd:   DW_TAG_subprogram
                DW_AT_name [DW_FORM_strx1]	("hotFunction")
                DW_AT_inline [DW_FORM_implicit_const]	(DW_INL_inlined)

The compiler generates DWARF for the cross-CU callee source file, and there can be multiple CUs with a dwoid of 0. This DWARF seems structurally normal, except that the dwoid is left at its initial value of '0'. I’m not sure whether this counts as a 'broken DWARF', or whether it can be effectively detected.

The current implementation of this patch skips processing duplicates with a dwoid of 0, is this reasonable?

@ayermolo
Copy link
Contributor

ayermolo commented Sep 2, 2025

Before BOLT it has multiple DWO IDs that are the same, zero. Which should make llvm-dwp abort.
Although if there are still .o/.dwo files it should be debuggable I believe. Not sure how lldb uses DWO ID internally.

After BOLT it will definitely will be broken because those dwo cus will be referring to the structure of the binary before BOLT.

If we do want to support partial update maybe a better way is not to include those CUs in final output?

@Jinjie-Huang
Copy link
Contributor Author

Jinjie-Huang commented Sep 4, 2025

Before BOLT it has multiple DWO IDs that are the same, zero. Which should make llvm-dwp abort.

Yes, in llvm-dwp for a DW_TAG_skeleton_unit having a DW_AT_dwo_name DIE will trigger a "duplicate DWO ID" error if duplicates exist.
However, in the case I met(as shown above), this DIE does not have a DW_AT_dwo_name, and the actual dwo file doesn't exist either, since this DIE was generated for a .cpp file (which seems not a expected behavior, with DW_AT_name=xxx.cpp). In this situation, since there is no duplicate dwo file, the dwp error is not triggered. In llvm-bolt, on the other hand, it will try to search for the "DW_AT_comp_dir" and give a " DWO debug information for was not retrieved..." warning and then crash because of the multiple ids of 0.

Are you suggesting that we can determine whether to process this CU by checking for the existence of DW_AT_dwo_name in the DIE, instead of checking if the dwoid is 0?

If we do want to support partial update maybe a better way is not to include those CUs in final output?

I might be mistaken, the current changes won’t generate DWOs for these CUs, and the CUs in these binaries should also be discarded, right?

Since the latest llvm has already fixed this issue with DW_TAG_skeleton_units that lack DW_AT_dwo_name, so we probably don’t need to specifically handle this case?

@ayermolo
Copy link
Contributor

ayermolo commented Sep 6, 2025

I guess your case combines two issues: 1) duplicate DWO ID and 2) malformed DWARF (no DWO_names).

Fundamental question, I think, remains. Should BOLT just reject this kind of DWARF and not update at all and remove debug -sections, or try to "fix it". I can see arguments in either direction.

If we go with "fix it" then:
If you want to handle the duplicate DWO IDs and still have some debug information then excluding Skelton CUs from final output and print a BOLT DWARF ERROR. This way users will have some debug information and it will be valid. Instead of pointing to incorrect DWARF in dwo file.
For missing dwo_name making sure that BOLT doesn't crash, and printing BOLT DWARF error, is probably good enough.

@Jinjie-Huang Jinjie-Huang force-pushed the users/huangjinjie/bolt_dwoid_zero branch from 5a9460a to a46877b Compare September 8, 2025 13:43

# CHECK: BOLT-ERROR: broken DWARF found in CU at offset 0x47 (DWOId=0x0, missing DW_AT_dwo_name / DW_AT_GNU_dwo_name).
# CHECK: BOLT-ERROR: broken DWARF found in CU at offset 0x70 (DWOId=0x0, missing DW_AT_dwo_name / DW_AT_GNU_dwo_name).
# CHECK-NOT: Assertion `RangeListsWritersByCU.count(*DWOId) == 0 && "RangeLists writer for DWO unit already exists."'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you separate DWO_name missing and DWO ID is the same into two different diffs?
The latter can happen without the first.

Copy link
Contributor Author

@Jinjie-Huang Jinjie-Huang Sep 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ayermolo Test case has been modified to only check the detection of this broken DWARF that is missing a DWO_name.

For such broken CUs, should our strategy be to only detect it, or should we also skip its processing and exclude it from the post-BOLT binary? The latter one also happens to solve the "DWO ID is the same 0" issue I've been facing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. You can have duplicate DWO ID (which is zero, or some other value) with valid DWO_name.
If you want to handle that, I would say best way would be in another PR that specifically handles that case.

So
case1) DWO_name missing
case2) DWO ID duplicates.

Copy link
Contributor Author

@Jinjie-Huang Jinjie-Huang Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this acceptable?

Case 1 (DWO_name missing): Exclude such broken DW_TAG_compile_unit/DW_TAG_skeleton_unit from the post-BOLT binary that is missing the DW_AT_dwo_name attribute.
Case 2 (Duplicate DWO IDs): Handle only the first dwo CU , and ensure BOLT doesn't crash.

@Jinjie-Huang Jinjie-Huang force-pushed the users/huangjinjie/bolt_dwoid_zero branch 2 times, most recently from 9fbeab6 to 67d941e Compare September 9, 2025 11:01
@ayermolo
Copy link
Contributor

Looks almost there. Left some minor nits.
Today is my last day at a current company, hopefully will be able to re-setup BOLT environment soon to continue other reviews.
Please work with @grigorypas and @maksfb to finish this.

@Jinjie-Huang
Copy link
Contributor Author

Got it, really appreciate you taking the time to review this on your last day. Wishing you all the best for what's next! :)

@Jinjie-Huang Jinjie-Huang force-pushed the users/huangjinjie/bolt_dwoid_zero branch from 5ea623a to 744c824 Compare September 15, 2025 12:12
@Jinjie-Huang Jinjie-Huang force-pushed the users/huangjinjie/bolt_dwoid_zero branch from 744c824 to f6367f8 Compare September 15, 2025 12:13
@Jinjie-Huang
Copy link
Contributor Author

@grigorypas @maksfb PTAL, thanks!

Copy link
Member

@paschalis-mpeis paschalis-mpeis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks reasonable and it seems @ayermolo comments were mostly addressed.
Could you also minimize the test? Similarly for the other PR (see comment here).

@Jinjie-Huang
Copy link
Contributor Author

Could you also minimize the test? Similarly for the other PR (see comment here).

@paschalis-mpeis Hmm, the test case has already been reduced ( unnecessary parts of the .text section are removed), leaving only debug-info-related content in the assembly. Should I consider reducing the debug info as well?

@paschalis-mpeis
Copy link
Member

Hey @Jinjie-Huang,

If you can trim some debug info without loosing coverage, please do. From a quick scan, other debug tests are typically ~200-few hundred lines, and this one is in that range.

Not a specialist in DWARF. Keep in mind that there may be a delay in feedback as, TMU, there's a transition in reviewing this area.

Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. Please check a couple of minor comments inlined.

@Jinjie-Huang Jinjie-Huang force-pushed the users/huangjinjie/bolt_dwoid_zero branch from 5daa7ab to 10bb124 Compare September 23, 2025 03:43
@maksfb maksfb changed the title [BOLT][DWARF] Skip processing DWARF CUs with a DWO ID but no DWO name. [BOLT][DWARF] Skip processing DWARF CUs with a DWO ID but no DWO name Sep 23, 2025
@Jinjie-Huang
Copy link
Contributor Author

Thanks everyone for the review and the approval.

@Jinjie-Huang Jinjie-Huang merged commit e6540d2 into llvm:main Sep 23, 2025
9 checks passed
paschalis-mpeis added a commit that referenced this pull request Sep 25, 2025
@paschalis-mpeis
Copy link
Member

Hey @Jinjie-Huang,

Unfortunately this patch has broken some of the AArch64 buildbot builders, so I suggest we revert it for now:

Please have a look at: #5476, all builders are here

@Jinjie-Huang
Copy link
Contributor Author

@paschalis-mpeis Sorry for breaking the builders. The error seems to be related to the machine target incompatibility. It’s likely because I didn’t add the appropriate %cflags in the test to control the triple. May I fix this patch accordingly?

Jinjie-Huang added a commit that referenced this pull request Sep 25, 2025
Fix the test in dwarf5-dwoid-no-dwoname.s, add %cflags in the test to
control the triple.
[detail](#154749 (comment))
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 25, 2025
Fix the test in dwarf5-dwoid-no-dwoname.s, add %cflags in the test to
control the triple.
[detail](llvm/llvm-project#154749 (comment))
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
Fix the test in dwarf5-dwoid-no-dwoname.s, add %cflags in the test to
control the triple.
[detail](llvm#154749 (comment))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants