Description
Sorry for the interruption. We are currently trying to adopt BOLT for internal production use, so we may reach out for BOLT-related discussions more frequently in the near future...
This issue describes a problem we ran into with how BOLT updates debug info when the input is a DWP. Currently, it seems that the .debug_str.dwo section in each output DWO is copied wholesale from the DWP (code), unlike other sections, which are sliced per CU via "getSliceData / getOverridenSection". When there are many DWO files, this significantly bloats their sizes, and regenerating the DWP with llvm-dwp also becomes much more time-consuming.
A case we met:
- The project contains ~1000 source files, and the final DWP file is 718 MB (229 MB of ".debug_str.dwo" + 355 MB of ".debug_info.dwo" + 134 MB of other sections).
- After BOLT updates the debug info from this DWP file, we end up with ~1000 ".dwo.dwo" files of almost the same size, each about 229 MB (229 MB of ".debug_str.dwo" + several KB of ".debug_info.dwo" and other sections). In total, this adds up to over 200 GB of additional disk space (1000 × 229 MB), roughly 318x the size of the original DWP.
I believe that if the project has more source files, the bloating could become even more severe. So can we consider the following options:
- Ideally, can we use .debug_str_offsets directly to slice .debug_str per CU and emit only the strings each CU references? Is this technically feasible, or are there blockers?
- Another idea might be to use an in-memory llvm-dwp, though there seem to be implementation challenges: as far as I know, llvm-dwp currently doesn’t provide an in-memory serialization interface.
- Perhaps we could try emitting .debug_str.dwo only in the first DWO, skipping the copies in the subsequent DWOs, and finally rely on llvm-dwp to merge them back into the DWP?
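To make the first option concrete, here is a minimal sketch of the slicing idea in Python. It is not BOLT's code and not an existing LLVM API: `slice_debug_str` is a hypothetical helper, and it assumes DWARF32 with little-endian 4-byte offsets, that any DWARF v5 header has already been stripped from the CU's .debug_str_offsets.dwo contribution, and that every string the CU uses is reached through that table.

```python
import struct


def slice_debug_str(debug_str: bytes, offsets_table: bytes) -> tuple[bytes, bytes]:
    """Hypothetical helper: keep only the strings one CU references.

    debug_str     -- the shared .debug_str.dwo contents from the DWP
    offsets_table -- this CU's slice of .debug_str_offsets.dwo
                     (assumed: 4-byte little-endian entries, no header)
    Returns (per-CU string table, rewritten offsets table).
    """
    count = len(offsets_table) // 4
    old_offsets = struct.unpack(f"<{count}I", offsets_table[: count * 4])

    new_str = bytearray()
    remap = {}  # old offset in the shared table -> offset in the per-CU table
    new_offsets = []
    for old in old_offsets:
        if old not in remap:
            # Strings are NUL-terminated; copy from the offset up to and
            # including the terminator.
            end = debug_str.index(b"\0", old)
            remap[old] = len(new_str)
            new_str += debug_str[old : end + 1]
        new_offsets.append(remap[old])

    return bytes(new_str), struct.pack(f"<{count}I", *new_offsets)
```

Because the string indices used in .debug_info.dwo stay the same and only the offsets table is rewritten, the per-CU output would be proportional to the strings that CU actually uses rather than to the whole 229 MB table.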
@ayermolo @rafaelauler @dwblaikie Do you have any comments on this? Thank you.