[BOLT][DWARF] DWO files size bloating when BOLT updates DWOs via DWP

Sorry for the interruption. We are currently trying to adopt BOLT for internal production use, so we may reach out for BOLT-related discussions more frequently in the near future...

This issue mainly aims to illustrate a case we encountered regarding the strategy BOLT uses when updating debuginfo based on DWP. Currently, it seems that the .debug_str.dwo section inside each DWO is directly copied from the DWP ([code](https://github.com/llvm/llvm-project/blob/main/bolt/lib/Rewrite/DWARFRewriter.cpp#L1772)), unlike other sections that are handled via "getSliceData / getOverridenSection". When there are many DWO files, this can cause significant bloat in their sizes, and re-generating the DWP using llvm-dwp also becomes much more time-consuming.

A case we encountered:
 - The project contains ~1000 source files, and the final DWP file is 718 MB (mainly **229 MB of ".debug_str.dwo"** + 355 MB of ".debug_info.dwo" + 134 MB of other sections).
 - After BOLT updates the debuginfo via this DWP file, we end up with ~1000 ".dwo.dwo" files of almost the same size, each about 229 MB (**229 MB of ".debug_str.dwo"** + several KB of ".debug_info.dwo" and other sections). In total, this adds up to over 200 GB of additional disk usage (~1000 × 229 MB, roughly 318x the size of the original DWP).
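
For reference, the estimate above can be reproduced with a few lines of arithmetic. This is only a sanity check using the rounded figures quoted in this issue (1000 DWO files, 229 MB of string data each, 718 MB DWP); the variable names are ours.

```python
# Rounded figures from the case above (all sizes in MB).
dwp_total_mb = 718   # size of the original DWP
debug_str_mb = 229   # size of .debug_str.dwo inside the DWP
num_dwo = 1000       # approximate number of DWO files BOLT writes

# Each rewritten DWO carries a full copy of the shared string section.
total_mb = num_dwo * debug_str_mb
ratio = total_mb / dwp_total_mb

print(f"total: {total_mb / 1000:.0f} GB, about {ratio:.0f}x the DWP size")
```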

I believe the bloat will become even more severe for projects with more source files. Could we consider the following options?
1. Ideally, could we use .debug_str_offsets to slice .debug_str per CU and emit only the strings each CU references? Is this technically feasible, or are there blockers?
2. Another idea might be to run llvm-dwp in memory, though there seem to be implementation challenges: as far as I know, llvm-dwp currently doesn’t provide an in-memory serialization interface.
3. Perhaps we could emit .debug_str.dwo only in the first DWO, skip the copies in the subsequent DWOs, and rely on llvm-dwp to merge them back into the DWP?
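
To make option 1 more concrete, here is a minimal sketch of the idea in Python rather than BOLT's C++. It is not BOLT code and does not use any LLVM API. It assumes DWARF32 (4-byte little-endian offsets), that the caller has already located the CU's offset entries and stripped any .debug_str_offsets header, and that every offset points at a NUL-terminated string. The function names `read_cstr` and `slice_strings` are hypothetical.

```python
import struct


def read_cstr(str_section: bytes, offset: int) -> bytes:
    """Return the NUL-terminated string starting at offset (without the NUL)."""
    end = str_section.index(b"\x00", offset)
    return str_section[offset:end]


def slice_strings(str_section: bytes, offsets_slice: bytes):
    """Build a per-CU string table holding only the referenced strings.

    str_section   : the shared .debug_str.dwo contents from the DWP
    offsets_slice : this CU's 4-byte offset entries (assumption: header removed)
    Returns (new_str_section, new_offsets_slice).
    """
    count = len(offsets_slice) // 4
    old_offsets = struct.unpack(f"<{count}I", offsets_slice)

    new_str = bytearray()
    remap = {}          # old offset -> new offset, so repeated strings are emitted once
    new_offsets = []
    for old in old_offsets:
        if old not in remap:
            remap[old] = len(new_str)
            new_str += read_cstr(str_section, old) + b"\x00"
        new_offsets.append(remap[old])

    return bytes(new_str), struct.pack(f"<{count}I", *new_offsets)


# Toy example: a shared table with four strings, of which the CU uses two.
shared = b"alpha\x00beta\x00gamma\x00delta\x00"
cu_offsets = struct.pack("<3I", 11, 0, 11)   # gamma, alpha, gamma
new_str, new_offs = slice_strings(shared, cu_offsets)
```

In this toy case the per-CU table shrinks to just the two referenced strings, and the index order of the offset entries is preserved, so string-index forms in .debug_info.dwo would not need to change. Whether the same holds for all forms BOLT has to handle is exactly the question raised in option 1.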

@ayermolo @rafaelauler @dwblaikie Do you have any comments on this? Thank you.
