
[BOLT][DWARF] DWO files size bloating when BOLT updates DWOs via DWP #155766

@Jinjie-Huang


Sorry for the interruption. We are currently trying to adopt BOLT for internal production use, so we may reach out for BOLT-related discussions more frequently in the near future...

This issue describes a case we hit with the strategy BOLT uses to update debug info when the input uses a DWP. Currently, the .debug_str.dwo section written into each DWO appears to be copied in full from the DWP (code), unlike the other sections, which are sliced per unit via "getSliceData / getOverridenSection". With many DWO files, this significantly inflates their size, and regenerating the DWP with llvm-dwp also becomes much slower, since it has to read and deduplicate every copy of the string table.
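One likely reason .debug_str.dwo is the odd one out: in a DWARF 5 DWP, .debug_cu_index records a per-unit (offset, size) contribution for each indexed section, but there is no DW_SECT column for the string section, because llvm-dwp merges and deduplicates strings across all units. The simplified model below illustrates that difference. The DW_SECT values are the ones DWARF 5 defines; the function name and the pre-parsed `index_row` shape are hypothetical and are not BOLT's actual getSliceData.

```python
from enum import IntEnum


class DWSect(IntEnum):
    """DWARF 5 DW_SECT_* column identifiers used in .debug_cu_index.

    There is deliberately no member for .debug_str.dwo: the spec defines none.
    """
    INFO = 1
    ABBREV = 3
    LINE = 4
    LOCLISTS = 5
    STR_OFFSETS = 6
    MACRO = 7
    RNGLISTS = 8


# Which DWP section each index column describes.
SECTION_TO_SECT = {
    ".debug_info.dwo": DWSect.INFO,
    ".debug_abbrev.dwo": DWSect.ABBREV,
    ".debug_line.dwo": DWSect.LINE,
    ".debug_loclists.dwo": DWSect.LOCLISTS,
    ".debug_str_offsets.dwo": DWSect.STR_OFFSETS,
    ".debug_macro.dwo": DWSect.MACRO,
    ".debug_rnglists.dwo": DWSect.RNGLISTS,
}


def unit_contribution(name: str,
                      dwp_sections: dict[str, bytes],
                      index_row: dict[DWSect, tuple[int, int]]) -> bytes:
    """Return the part of DWP section `name` that belongs to one unit.

    dwp_sections: section name -> full section bytes from the DWP.
    index_row:    DWSect -> (offset, size) for this unit (hypothetical,
                  already parsed out of .debug_cu_index).
    """
    data = dwp_sections[name]
    sect = SECTION_TO_SECT.get(name)
    if sect is None:
        # Not an indexed section (e.g. .debug_str.dwo): there is no per-unit
        # slice to take, so the naive fallback copies the whole shared section.
        return data
    if sect not in index_row:
        # Indexed section, but this unit contributes nothing to it.
        return b""
    offset, size = index_row[sect]
    return data[offset:offset + size]
```

A per-CU .debug_str.dwo would therefore need a different source of truth, such as the CU's .debug_str_offsets.dwo contribution, which is what option 1 in the list below proposes.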

A case we met:

  • The project contains ~1000 source files, and the final DWP file is 718 MB (229 MB of ".debug_str.dwo" + 355 MB of ".debug_info.dwo" + 134 MB of other sections).
  • After BOLT updates the debug info using this DWP, we end up with ~1000 ".dwo.dwo" files of nearly identical size, each about 229 MB (229 MB of ".debug_str.dwo" plus a few KB of ".debug_info.dwo" and other sections). In total, that is over 200 GB of additional disk usage (1000 × 229 MB, about 318× the size of the original DWP).
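The totals above can be reproduced with a few lines of arithmetic; the ~318× figure is the added DWO bytes divided by the original 718 MB DWP. The `projected_extra_mb` function is a hypothetical extension resting on an unmeasured assumption (the merged string table growing linearly with the number of CUs), included only to show how the overhead could scale.

```python
# Figures reported in the case above (MB, decimal units).
DEBUG_STR_MB = 229
DEBUG_INFO_MB = 355
OTHER_MB = 134
NUM_DWOS = 1000

dwp_mb = DEBUG_STR_MB + DEBUG_INFO_MB + OTHER_MB  # original DWP size
extra_mb = NUM_DWOS * DEBUG_STR_MB                # one full string copy per DWO
ratio = extra_mb / dwp_mb                         # growth relative to the DWP


def projected_extra_mb(num_dwos: int) -> float:
    """Hypothetical projection for a project with `num_dwos` CUs.

    ASSUMPTION (not measured): .debug_str.dwo grows linearly with the number
    of CUs, scaled from the 1000-CU / 229 MB case. Under that assumption the
    copied-string overhead grows quadratically with the CU count.
    """
    str_mb = DEBUG_STR_MB * num_dwos / NUM_DWOS
    return num_dwos * str_mb


print(f"DWP: {dwp_mb} MB, extra: {extra_mb / 1000:.0f} GB, ratio: {ratio:.1f}x")
```

Real projects share many strings (type names, paths) across CUs, so the actual growth may be slower than the linear assumption used here.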

I believe the bloat would become even more severe in projects with more source files, since every DWO carries a full copy of a string table that itself tends to grow with the project. Could we consider the following options:

  1. Ideally, could we use .debug_str_offsets to slice .debug_str per CU and emit only the strings each CU references? Is this technically feasible, or are there blockers?
  2. Another option is an in-memory llvm-dwp, though that looks harder to implement: as far as I know, llvm-dwp currently doesn't provide an in-memory serialization interface.
  3. Perhaps we could emit .debug_str.dwo only in the first DWO, skip the copies in the subsequent DWOs, and rely on llvm-dwp to merge them back into the DWP?
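Option 1 might look roughly like the following sketch. It assumes DWARF 5, the 32-bit DWARF format, that the CU's .debug_str_offsets.dwo contribution has already been located (for example via the DWP's .debug_cu_index), and that every string reference in the DWO goes through the offsets table. Because DW_FORM_strx values are indices into that table, keeping the index order and rewriting only the offset values would leave .debug_info.dwo unchanged. All function names here are hypothetical, not BOLT or llvm-dwp APIs.

```python
import struct


def parse_str_offsets(contribution: bytes) -> list[int]:
    """Parse one DWARF 5, 32-bit .debug_str_offsets.dwo contribution.

    Header: unit_length (u32), version (u16), padding (u16), followed by
    (unit_length - 4) / 4 little-endian u32 offsets into .debug_str.dwo.
    """
    unit_length, version, _padding = struct.unpack_from("<IHH", contribution, 0)
    if unit_length >= 0xFFFFFFF0:
        raise NotImplementedError("64-bit DWARF is not handled in this sketch")
    if version != 5:
        raise ValueError(f"unexpected .debug_str_offsets version {version}")
    count = (unit_length - 4) // 4
    return list(struct.unpack_from(f"<{count}I", contribution, 8))


def build_str_offsets(offsets: list[int]) -> bytes:
    """Serialize offsets back into a DWARF 5, 32-bit contribution."""
    unit_length = 4 + 4 * len(offsets)  # version + padding + offset array
    header = struct.pack("<IHH", unit_length, 5, 0)
    return header + struct.pack(f"<{len(offsets)}I", *offsets)


def slice_debug_str(debug_str: bytes,
                    offsets: list[int]) -> tuple[bytes, list[int]]:
    """Copy only the strings one CU references into a new string table.

    Returns (new_debug_str, new_offsets). new_offsets keeps the original
    order, so index i (a DW_FORM_strx value) still names the same string.
    """
    new_table = bytearray()
    remap: dict[int, int] = {}  # old offset -> offset in new_table
    new_offsets = []
    for old in offsets:
        if old not in remap:
            end = debug_str.index(b"\0", old)  # strings are NUL-terminated
            remap[old] = len(new_table)
            new_table += debug_str[old:end + 1]
        new_offsets.append(remap[old])
    return bytes(new_table), new_offsets


def read_str(table: bytes, offset: int) -> bytes:
    """Read the NUL-terminated string starting at `offset`."""
    return table[offset:table.index(b"\0", offset)]
```

Copying from each referenced offset up to the next NUL also handles tail-merged strings (offsets pointing into the middle of another string), at the cost of possibly duplicating a shared suffix in the per-CU table.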

@ayermolo @rafaelauler @dwblaikie Do you have any comments on this? Thank you.
