Skip to content

Conversation

mpark
Copy link
Member

@mpark mpark commented Sep 9, 2025

This PR adds a --exec-dwo-path-remapping-file flag to llvm-dwp which allows llvm-dwp to remap the DWO paths embedded in executables and libraries. The debug information within the final executable contains references to the original paths of the .dwo files. If you move or rename these .dwo files during the build process, the references in the final binary will be broken. This remapping option allows llvm-dwp to remap the original paths in the executable to the new paths.

A new test llvm/test/tools/llvm-dwp/X86/dwos_list_from_exec_remap.test has been added to demonstrate and test the functionality.

@llvmbot
Copy link
Member

llvmbot commented Sep 9, 2025

@llvm/pr-subscribers-debuginfo

Author: Michael Park (mpark)

Changes

This PR adds a --exec-dwo-path-remapping-file flag to llvm-dwp which allows llvm-dwp to remap the DWO paths embedded in executables and libraries. The debug information within the final executable contains references to the original paths of the .dwo files. If you move or rename these .dwo files during the build process, the references in the final binary will be broken. This remapping option allows llvm-dwp to remap the original paths in the executable to the new paths.


Full diff: https://github.com/llvm/llvm-project/pull/157587.diff

3 Files Affected:

  • (added) llvm/test/tools/llvm-dwp/X86/dwos_list_from_exec_remap.test (+116)
  • (modified) llvm/tools/llvm-dwp/Opts.td (+2)
  • (modified) llvm/tools/llvm-dwp/llvm-dwp.cpp (+43-2)
diff --git a/llvm/test/tools/llvm-dwp/X86/dwos_list_from_exec_remap.test b/llvm/test/tools/llvm-dwp/X86/dwos_list_from_exec_remap.test
new file mode 100644
index 0000000000000..254f353ab715f
--- /dev/null
+++ b/llvm/test/tools/llvm-dwp/X86/dwos_list_from_exec_remap.test
@@ -0,0 +1,116 @@
+RUN: rm -rf %t
+RUN: mkdir %t
+RUN: cd %t
+
+RUN: cp %p/../Inputs/dwos_list_from_exec/main main
+RUN: cp %p/../Inputs/dwos_list_from_exec/libd.so libd.so
+
+RUN: touch remapping.txt
+RUN: echo "./a.dwo %p/../Inputs/dwos_list_from_exec/a.dwo" >> remapping.txt
+RUN: echo "./b.dwo %p/../Inputs/dwos_list_from_exec/b.dwo" >> remapping.txt
+RUN: echo "./d.dwo %p/../Inputs/dwos_list_from_exec/d.dwo" >> remapping.txt
+RUN: llvm-dwp %p/../Inputs/dwos_list_from_exec/c.dwo %p/../Inputs/dwos_list_from_exec/e.dwo \
+RUN:   -e main -e libd.so --exec-dwo-path-remapping-file=remapping.txt -o - | llvm-dwarfdump -v - | FileCheck %s
+
+RUN: touch remapping_main.txt
+RUN: echo "./a.dwo %p/../Inputs/dwos_list_from_exec/a.dwo" >> remapping_main.txt
+RUN: echo "./b.dwo %p/../Inputs/dwos_list_from_exec/b.dwo" >> remapping_main.txt
+RUN: echo "./d.dwo %p/../Inputs/dwos_list_from_exec/d.dwo" > remapping_libd.txt
+RUN: llvm-dwp %p/../Inputs/dwos_list_from_exec/c.dwo %p/../Inputs/dwos_list_from_exec/e.dwo \
+RUN:   -e main -e libd.so \
+RUN:   --exec-dwo-path-remapping-file=remapping_main.txt \
+RUN:   --exec-dwo-path-remapping-file=remapping_libd.txt \
+RUN:   -o - | llvm-dwarfdump -v - | FileCheck %s
+
+RUN: mkdir foo && cp %p/../Inputs/dwos_list_from_exec/a.dwo foo/a.dwo
+RUN: mkdir bar && cp %p/../Inputs/dwos_list_from_exec/b.dwo bar/b.dwo
+RUN: mkdir qux && cp %p/../Inputs/dwos_list_from_exec/d.dwo qux/d.dwo
+RUN: touch remapping_relative.txt
+RUN: echo "a.dwo foo/a.dwo" >> remapping_relative.txt
+RUN: echo "b.dwo bar/b.dwo" >> remapping_relative.txt
+RUN: echo "d.dwo qux/d.dwo" >> remapping_relative.txt
+RUN: llvm-dwp %p/../Inputs/dwos_list_from_exec/c.dwo %p/../Inputs/dwos_list_from_exec/e.dwo \
+RUN:   -e main -e libd.so --exec-dwo-path-remapping-file=remapping_relative.txt -o - | llvm-dwarfdump -v - | FileCheck %s
+
+Build commands for the test binaries:
+
+clang++ -Xclang -fdebug-compilation-dir -Xclang "./" -g -O0 -gsplit-dwarf a.cpp b.cpp -o main
+clang++ -g -O0 -gsplit-dwarf -c c.cpp -o c.o
+clang++ -Xclang -fdebug-compilation-dir -Xclang "./" -g -O0 -gsplit-dwarf -fPIC -shared d.cpp -o libd.so
+clang++ -g -O0 -gsplit-dwarf -c e.cpp -o e.o
+
+sources:
+a.cpp:
+  void a() {}
+
+b.cpp:
+  void b() {}
+  int main() {
+     return 0;
+  }
+
+c.cpp:
+  void c() {}
+
+d.cpp:
+  void d() {}
+
+e.cpp:
+  void e() {}
+
+CHECK-LABEL: .debug_abbrev.dwo contents:
+
+CHECK-LABEL: Abbrev table for offset:
+CHECK: DW_TAG_compile_unit
+CHECK: DW_TAG_subprogram
+
+CHECK-LABEL: Abbrev table for offset:
+CHECK: DW_TAG_compile_unit
+CHECK: DW_TAG_subprogram
+
+CHECK-LABEL: Abbrev table for offset:
+CHECK: DW_TAG_compile_unit
+CHECK: DW_TAG_subprogram
+
+CHECK-LABEL: Abbrev table for offset:
+CHECK: DW_TAG_compile_unit
+CHECK: DW_TAG_subprogram
+
+CHECK-LABEL: Abbrev table for offset:
+CHECK: DW_TAG_compile_unit
+CHECK: DW_TAG_subprogram
+
+CHECK: .debug_info.dwo contents:
+CHECK: [[AOFF:0x[0-9a-f]*]]:
+
+CHECK-LABEL: Compile Unit: length = {{.*}}, version = 0x0004
+CHECK: DW_TAG_compile_unit
+CHECK:   DW_AT_name {{.*}} "c.cpp"
+CHECK:   DW_TAG_subprogram
+CHECK:     DW_AT_name {{.*}} "c"
+
+CHECK-LABEL: Compile Unit: length = {{.*}}, version = 0x0004
+CHECK: DW_TAG_compile_unit
+CHECK:   DW_AT_name {{.*}} "e.cpp"
+CHECK:   DW_TAG_subprogram
+CHECK:     DW_AT_name {{.*}} "e"
+
+CHECK-LABEL: Compile Unit: length = {{.*}}, version = 0x0004
+CHECK: DW_TAG_compile_unit
+CHECK:   DW_AT_name {{.*}} "a.cpp"
+CHECK:   DW_TAG_subprogram
+CHECK:     DW_AT_name {{.*}} "a"
+
+CHECK-LABEL: Compile Unit: length = {{.*}}, version = 0x0004
+CHECK: DW_TAG_compile_unit
+CHECK:   DW_AT_name {{.*}} "b.cpp"
+CHECK:   DW_TAG_subprogram
+CHECK:     DW_AT_name {{.*}} "b"
+CHECK:   DW_TAG_subprogram
+CHECK:     DW_AT_name {{.*}} "main"
+
+CHECK-LABEL: Compile Unit: length = {{.*}}, version = 0x0004
+CHECK: DW_TAG_compile_unit
+CHECK:   DW_AT_name {{.*}} "d.cpp"
+CHECK:   DW_TAG_subprogram
+CHECK:     DW_AT_name {{.*}} "d"
diff --git a/llvm/tools/llvm-dwp/Opts.td b/llvm/tools/llvm-dwp/Opts.td
index 46593bc40ebae..ecc1c25814c50 100644
--- a/llvm/tools/llvm-dwp/Opts.td
+++ b/llvm/tools/llvm-dwp/Opts.td
@@ -16,3 +16,5 @@ def continueOnCuIndexOverflow_EQ : Joined<["-", "--"], "continue-on-cu-index-ove
     "\t\ttruncated but valid DWP file, discarding any DWO files that would not fit within \n"
     "\t\tthe 32 bit/4GB limits of the format.">,
   Values<"continue,soft-stop">;
+def execDwoPathRemappingFile : Joined<["--"], "exec-dwo-path-remapping-file=">, MetaVarName<"<filename>">,
+  HelpText<"Use the old and new paths described in <filename> to map old dwo paths in executables/libraries to the new paths.">;
diff --git a/llvm/tools/llvm-dwp/llvm-dwp.cpp b/llvm/tools/llvm-dwp/llvm-dwp.cpp
index 61ba82d0634ac..04db3317f163f 100644
--- a/llvm/tools/llvm-dwp/llvm-dwp.cpp
+++ b/llvm/tools/llvm-dwp/llvm-dwp.cpp
@@ -10,6 +10,7 @@
 // package files).
 //
 //===----------------------------------------------------------------------===//
+#include "llvm/ADT/StringMap.h"
 #include "llvm/DWP/DWP.h"
 #include "llvm/DWP/DWPError.h"
 #include "llvm/DWP/DWPStringPool.h"
@@ -75,7 +76,7 @@ static std::string OutputFilename;
 static std::string ContinueOption;
 
 static Expected<SmallVector<std::string, 16>>
-getDWOFilenames(StringRef ExecFilename) {
+getDWOFilenames(StringRef ExecFilename, const StringMap<StringRef> &DWOPathMap) {
   auto ErrOrObj = object::ObjectFile::createObjectFile(ExecFilename);
   if (!ErrOrObj)
     return ErrOrObj.takeError();
@@ -95,11 +96,17 @@ getDWOFilenames(StringRef ExecFilename) {
     if (!DWOCompDir.empty()) {
       SmallString<16> DWOPath(DWOName);
       sys::fs::make_absolute(DWOCompDir, DWOPath);
+      if (auto I = DWOPathMap.find(DWOPath); I != DWOPathMap.end())
+        DWOPath = I->getValue();
+      if (auto I = DWOPathMap.find(DWOName); I != DWOPathMap.end())
+        DWOName = I->getValue();
       if (!sys::fs::exists(DWOPath) && sys::fs::exists(DWOName))
         DWOPaths.push_back(std::move(DWOName));
       else
         DWOPaths.emplace_back(DWOPath.data(), DWOPath.size());
     } else {
+      if (auto I = DWOPathMap.find(DWOName); I != DWOPathMap.end())
+        DWOName = I->getValue();
       DWOPaths.push_back(std::move(DWOName));
     }
   }
@@ -120,6 +127,32 @@ static Expected<Triple> readTargetTriple(StringRef FileName) {
   return ErrOrObj->getBinary()->makeTriple();
 }
 
+static Error addPathsToRemapFromFile(StringMap<StringRef> &DWOPathMap,
+                                     BumpPtrAllocator &Alloc,
+                                     StringRef Filename) {
+  StringSaver Saver(Alloc);
+  SmallVector<StringRef, 16> Lines;
+  auto BufOrErr = MemoryBuffer::getFile(Filename);
+  if (!BufOrErr)
+    return createFileError(Filename, BufOrErr.getError());
+
+  BufOrErr.get()->getBuffer().split(Lines, '\n');
+  for (size_t LineNo = 0, NumLines = Lines.size(); LineNo < NumLines; ++LineNo) {
+    StringRef TrimmedLine = Lines[LineNo].split('#').first.trim();
+    if (TrimmedLine.empty())
+      continue;
+
+    std::pair<StringRef, StringRef> Pair = Saver.save(TrimmedLine).split(' ');
+    StringRef NewName = Pair.second.trim();
+    if (NewName.empty())
+      return createStringError(errc::invalid_argument,
+                               "%s:%zu: missing new DWO path",
+                               Filename.str().c_str(), LineNo + 1);
+    DWOPathMap.insert({Pair.first, NewName});
+  }
+  return Error::success();
+}
+
 int llvm_dwp_main(int argc, char **argv, const llvm::ToolContext &) {
   DwpOptTable Tbl;
   llvm::BumpPtrAllocator A;
@@ -173,8 +206,16 @@ int llvm_dwp_main(int argc, char **argv, const llvm::ToolContext &) {
   llvm::InitializeAllTargets();
   llvm::InitializeAllAsmPrinters();
 
+  llvm::StringMap<StringRef> DWOPathMap;
+  for (auto *Arg : Args.filtered(OPT_execDwoPathRemappingFile)) {
+    if (Error E = addPathsToRemapFromFile(DWOPathMap, A, Arg->getValue())) {
+      logAllUnhandledErrors(std::move(E), WithColor::error());
+      return 1;
+    }
+  }
+
   for (const auto &ExecFilename : ExecFilenames) {
-    auto DWOs = getDWOFilenames(ExecFilename);
+    auto DWOs = getDWOFilenames(ExecFilename, DWOPathMap);
     if (!DWOs) {
       logAllUnhandledErrors(
           handleErrors(DWOs.takeError(),

@mpark mpark force-pushed the llvm-dwp-remapping branch from 88d334b to 7bd9051 Compare September 9, 2025 01:24
@llvm llvm deleted a comment from github-actions bot Sep 9, 2025
@mpark mpark requested review from ayermolo and dwblaikie September 9, 2025 01:24
@dwblaikie
Copy link
Collaborator

Having to read a file seems a bit unfortunate - could we do something that matches existing debug info features like -fdebug-prefix-map?

@ayermolo
Copy link
Contributor

ayermolo commented Sep 9, 2025

Michael can you add some more context about how this is an artifact of the build system feature, etc.
My understanding from your internal post, that each path can be different?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you specify dwarf version explicitly, in case default changes in the future.
Kind of surprised it defaults to 4. I thought 5 was the default now.

Copy link
Member Author

@mpark mpark Sep 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm basically willing to do whatever here, but I'm not sure just adding -gdwarf-5 here makes sense. This test is largely copied from the existing dwos_list_from_exec_simple.test, which has these binaries checked into the repo under Inputs/dwos_list_from_exec. The binaries were definitely built to DWARF-4 spec. I could specify -gdwarf-5 in both and update the checked in binaries or, alternatively I could specify -gdwarf-4 in these tests to pin the version. Do you have a preferred approach?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh I missed this operates on binary level. I thought we converted all tests to be text version. Never mind.

@mpark
Copy link
Member Author

mpark commented Sep 9, 2025

Having to read a file seems a bit unfortunate - could we do something that matches existing debug info features like -fdebug-prefix-map?

Hi @dwblaikie, yes, good question. For our use case, the remapping is not from one directory to another but rather on a per-file basis, which is why this takes a file of mappings per file. This has to do with content-based hashes in output paths in Buck2. Specifically, under such a model the output path of an artifact is expected to include the hash of its contents, which of course can be different for every file within a directory.

It sounds like Bazel has this feature as well, maybe? Based on Experimental Content-Based Output Paths. Do you know how this situation is handled there?

… that allows remapping the dwo paths embedded in executables and libraries.
@mpark mpark force-pushed the llvm-dwp-remapping branch from 7bd9051 to da34ce4 Compare September 10, 2025 07:17
@dwblaikie
Copy link
Collaborator

Having to read a file seems a bit unfortunate - could we do something that matches existing debug info features like -fdebug-prefix-map?

Hi @dwblaikie, yes, good question. For our use case, the remapping is not from one directory to another but rather on a per-file basis, which is why this takes a file of mappings per file. This has to do with content-based hashes in output paths in Buck2. Specifically, under such a model the output path of an artifact is expected to include the hash of its contents, which of course can be different for every file within a directory.

OK, thanks for the context.

Hmm - one thing we don't do at Google/my use case - is build a DWP from an executable. We build a DWP from the same list of object files that were sent to the linker. (this means the DWP can be generated at the same time as the linked executable, rather than waiting for one to do the other - it does mean that linker-omitted object files still end up in the DWP, which is probably pretty unfortunate for DWP file size)

Is that an option for you? If you have a list of object files (in your mapping file) you could pass those to the dwp tool directly instead?

It sounds like Bazel has this feature as well, maybe? Based on Experimental Content-Based Output Paths. Do you know how this situation is handled there?

No idea, unfortunately.

@mpark
Copy link
Member Author

mpark commented Sep 13, 2025

Having to read a file seems a bit unfortunate - could we do something that matches existing debug info features like -fdebug-prefix-map?

Hi @dwblaikie, yes, good question. For our use case, the remapping is not from one directory to another but rather on a per-file basis, which is why this takes a file of mappings per file. This has to do with content-based hashes in output paths in Buck2. Specifically, under such a model the output path of an artifact is expected to include the hash of its contents, which of course can be different for every file within a directory.

OK, thanks for the context.

Hmm - one thing we don't do at Google/my use case - is build a DWP from an executable. We build a DWP from the same list of object files that were sent to the linker. (this means the DWP can be generated at the same time as the linked executable, rather than waiting for one to do the other - it does mean that linker-omitted object files still end up in the DWP, which is probably pretty unfortunate for DWP file size)

Is that an option for you? If you have a list of object files (in your mapping file) you could pass those to the dwp tool directly instead?

Right, we did try this approach but the resulting DWP file size was indeed too big for us. There's significant size difference from the linker dropping a bunch of DWOs in the executable.

It sounds like Bazel has this feature as well, maybe? Based on Experimental Content-Based Output Paths. Do you know how this situation is handled there?

No idea, unfortunately.

Gotcha, thanks!

@mpark
Copy link
Member Author

mpark commented Sep 15, 2025

Okay, I'm closing this since we'll be moving forward with an approach to rewrite the paths in the executable instead. Thanks for the reviews!

@mpark mpark closed this Sep 15, 2025
@dwblaikie
Copy link
Collaborator

Okay, I'm closing this since we'll be moving forward with an approach to rewrite the paths in the executable instead. Thanks for the reviews!

Thanks for sharing in any case - good to know what folks are up to/challenges they're running into/use cases they have for Split DWARF. It's a bit niche, so good to have friends :)

@mpark mpark deleted the llvm-dwp-remapping branch September 17, 2025 21:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants