Skip to content

Conversation

@Jinjie-Huang
Copy link
Contributor

@Jinjie-Huang Jinjie-Huang commented Dec 27, 2024

The current behavior of binutils dwp when encountering duplicate DWO inputs is to issue a warning and continue generating the DWP. like this:

dwp a.dwo a.dwo -o x.dwp
dwp: warning: xx.dwp: duplicate entry for CU (dwo_id xxx)

However, llvm-dwp uses a map structure, which cannot store duplicate entries, so it throws an error and exits when faced with such scenarios.

llvm-dwp a.dwo a.dwo -o x.dwp
error: duplicate DWO ID (xx) in 'a.c' and 'a.c'

Therefore, this PR attempts to skip the duplicate DWO inputs in such case, and generate the dwp.

@llvmbot
Copy link
Member

llvmbot commented Dec 27, 2024

@llvm/pr-subscribers-debuginfo

Author: Jinjie Huang (Labman-001)

Changes

The current behavior of binutils dwp when encountering duplicate DWO inputs is to issue a warning and continue generating the DWP. like this:

dwp a.dwo a.dwo -o x.dwp
dwp: warning: xx.dwp: duplicate entry for CU (dwo_id xxx)

However, llvm-dwp uses a Map structure, which cannot store duplicate entries, so it throws an error and exits when faced with such scenarios.

llvm-dwp a.dwo a.dwo -o x.dwp
error: duplicate DWO ID (xx) in 'a.c' and 'a.c'

Can we consider skipping the duplicate DWO inputs in such cases, issuing a warning instead, and continuing to generate the DWP? And it seems unnecessary to provide an option to maintain the error and exit behavior?


Full diff: https://github.com/llvm/llvm-project/pull/121193.diff

4 Files Affected:

  • (modified) llvm/lib/DWP/DWP.cpp (+9-4)
  • (modified) llvm/test/tools/llvm-dwp/X86/duplicate.test (+12-12)
  • (modified) llvm/test/tools/llvm-dwp/X86/gcc_type.test (+1-1)
  • (modified) llvm/test/tools/llvm-dwp/X86/handle_strx.test (+2-2)
diff --git a/llvm/lib/DWP/DWP.cpp b/llvm/lib/DWP/DWP.cpp
index fecd184ca68a86..52bf062247d31c 100644
--- a/llvm/lib/DWP/DWP.cpp
+++ b/llvm/lib/DWP/DWP.cpp
@@ -786,8 +786,11 @@ Error write(MCStreamer &Out, ArrayRef<std::string> Inputs,
               return createFileError(Input, EID.takeError());
             const auto &ID = *EID;
             auto P = IndexEntries.insert(std::make_pair(ID.Signature, Entry));
-            if (!P.second)
-              return buildDuplicateError(*P.first, ID, "");
+            if (!P.second) {
+              WithColor::defaultWarningHandler(buildDuplicateError(*P.first, ID, ""));
+              FoundCUUnit = true;
+              continue;
+            }
             P.first->second.Name = ID.Name;
             P.first->second.DWOName = ID.DWOName;
 
@@ -858,8 +861,10 @@ Error write(MCStreamer &Out, ArrayRef<std::string> Inputs,
       if (!EID)
         return createFileError(Input, EID.takeError());
       const auto &ID = *EID;
-      if (!P.second)
-        return buildDuplicateError(*P.first, ID, Input);
+      if (!P.second) {
+        WithColor::defaultWarningHandler(buildDuplicateError(*P.first, ID, Input));
+        continue;
+      }
       auto &NewEntry = P.first->second;
       NewEntry.Name = ID.Name;
       NewEntry.DWOName = ID.DWOName;
diff --git a/llvm/test/tools/llvm-dwp/X86/duplicate.test b/llvm/test/tools/llvm-dwp/X86/duplicate.test
index 66eede9b5d2df7..e2049901de1600 100644
--- a/llvm/test/tools/llvm-dwp/X86/duplicate.test
+++ b/llvm/test/tools/llvm-dwp/X86/duplicate.test
@@ -1,27 +1,27 @@
-RUN: not llvm-dwp %p/../Inputs/duplicate/c.dwo %p/../Inputs/duplicate/c.dwo -o %t 2>&1 \
+RUN: llvm-dwp %p/../Inputs/duplicate/c.dwo %p/../Inputs/duplicate/c.dwo -o %t 2>&1 \
 RUN:   | FileCheck --check-prefix=DWOS %s
 
-RUN: not llvm-dwp %p/../Inputs/duplicate/c.dwo %p/../Inputs/duplicate/bc.dwp -o %t 2>&1 \
+RUN: llvm-dwp %p/../Inputs/duplicate/c.dwo %p/../Inputs/duplicate/bc.dwp -o %t 2>&1 \
 RUN:   | FileCheck --check-prefix=2DWP %s
 
-RUN: not llvm-dwp %p/../Inputs/duplicate/ac.dwp %p/../Inputs/duplicate/c.dwo -o %t 2>&1 \
+RUN: llvm-dwp %p/../Inputs/duplicate/ac.dwp %p/../Inputs/duplicate/c.dwo -o %t 2>&1 \
 RUN:   | FileCheck --check-prefix=1DWP %s
 
-RUN: not llvm-dwp %p/../Inputs/duplicate_dwo_name/c.dwo %p/../Inputs/duplicate_dwo_name/c.dwo -o %t 2>&1 \
+RUN: llvm-dwp %p/../Inputs/duplicate_dwo_name/c.dwo %p/../Inputs/duplicate_dwo_name/c.dwo -o %t 2>&1 \
 RUN:   | FileCheck --check-prefix=DWODWOS %s
 
-RUN: not llvm-dwp %p/../Inputs/duplicate_dwo_name/c.dwo %p/../Inputs/duplicate_dwo_name/bc.dwp -o %t 2>&1 \
+RUN: llvm-dwp %p/../Inputs/duplicate_dwo_name/c.dwo %p/../Inputs/duplicate_dwo_name/bc.dwp -o %t 2>&1 \
 RUN:   | FileCheck --check-prefix=DWO2DWP %s
 
-RUN: not llvm-dwp %p/../Inputs/duplicate_dwo_name/ac.dwp %p/../Inputs/duplicate_dwo_name/c.dwo -o %t 2>&1 \
+RUN: llvm-dwp %p/../Inputs/duplicate_dwo_name/ac.dwp %p/../Inputs/duplicate_dwo_name/c.dwo -o %t 2>&1 \
 RUN:   | FileCheck --check-prefix=DWO1DWP %s
 
 Build from a, b, and c.c all containing a single void() func by the name of the file.
 
-DWOS: error: duplicate DWO ID ({{.*}}) in 'c.c' and 'c.c'{{$}}
-1DWP: error: duplicate DWO ID ({{.*}}) in 'c.c' (from '{{.*}}ac.dwp') and 'c.c'{{$}}
-2DWP: error: duplicate DWO ID ({{.*}}) in 'c.c' and 'c.c' (from '{{.*}}bc.dwp'){{$}}
+DWOS: warning: duplicate DWO ID ({{.*}}) in 'c.c' and 'c.c'{{$}}
+1DWP: warning: duplicate DWO ID ({{.*}}) in 'c.c' (from '{{.*}}ac.dwp') and 'c.c'{{$}}
+2DWP: warning: duplicate DWO ID ({{.*}}) in 'c.c' and 'c.c' (from '{{.*}}bc.dwp'){{$}}
 
-DWODWOS: error: duplicate DWO ID ({{.*}}) in 'c.c' (from 'c.dwo') and 'c.c' (from 'c.dwo'){{$}}
-DWO1DWP: error: duplicate DWO ID ({{.*}}) in 'c.c' (from 'c.dwo' in '{{.*}}ac.dwp') and 'c.c' (from 'c.dwo'){{$}}
-DWO2DWP: error: duplicate DWO ID ({{.*}}) in 'c.c' (from 'c.dwo') and 'c.c' (from 'c.dwo' in '{{.*}}bc.dwp'){{$}}
+DWODWOS: warning: duplicate DWO ID ({{.*}}) in 'c.c' (from 'c.dwo') and 'c.c' (from 'c.dwo'){{$}}
+DWO1DWP: warning: duplicate DWO ID ({{.*}}) in 'c.c' (from 'c.dwo' in '{{.*}}ac.dwp') and 'c.c' (from 'c.dwo'){{$}}
+DWO2DWP: warning: duplicate DWO ID ({{.*}}) in 'c.c' (from 'c.dwo') and 'c.c' (from 'c.dwo' in '{{.*}}bc.dwp'){{$}}
diff --git a/llvm/test/tools/llvm-dwp/X86/gcc_type.test b/llvm/test/tools/llvm-dwp/X86/gcc_type.test
index eb8f2ba9fd3740..1130fc6d88d2b1 100644
--- a/llvm/test/tools/llvm-dwp/X86/gcc_type.test
+++ b/llvm/test/tools/llvm-dwp/X86/gcc_type.test
@@ -1,5 +1,5 @@
 RUN: llvm-dwp %p/../Inputs/gcc_type/a.dwo -o - | llvm-dwarfdump -v - | FileCheck %s
-RUN: not llvm-dwp %p/../Inputs/gcc_type/a.dwo %p/../Inputs/gcc_type/a.dwo -o %t 2>&1 | FileCheck --check-prefix=DUP %s
+RUN: llvm-dwp %p/../Inputs/gcc_type/a.dwo %p/../Inputs/gcc_type/a.dwo -o %t 2>&1 | FileCheck --check-prefix=DUP %s
 
 CHECK: Type Unit
 CHECK: Type Unit
diff --git a/llvm/test/tools/llvm-dwp/X86/handle_strx.test b/llvm/test/tools/llvm-dwp/X86/handle_strx.test
index c62f923f30e228..3df5447a50c8f8 100644
--- a/llvm/test/tools/llvm-dwp/X86/handle_strx.test
+++ b/llvm/test/tools/llvm-dwp/X86/handle_strx.test
@@ -1,7 +1,7 @@
 RUN: llvm-dwp %p/../Inputs/handle_strx/dw5.dwo -o %t 2>/dev/null
 RUN: llvm-dwarfdump --verbose %t 2>/dev/null | FileCheck --check-prefix=READ_STRX %s
 
-RUN: not llvm-dwp %p/../Inputs/handle_strx/dw5.dwo %p/../Inputs/handle_strx/dw5.dwo -o %t 2>&1 \
+RUN: llvm-dwp %p/../Inputs/handle_strx/dw5.dwo %p/../Inputs/handle_strx/dw5.dwo -o %t 2>&1 \
 RUN:   | FileCheck --check-prefix=PARSE_STRX %s
 
 
@@ -10,5 +10,5 @@ with options -gdwarf-5 and -gsplit-dwarf.
 
 READ_STRX: DW_AT_name [DW_FORM_strx1]{{.*}}dw5.cc
 
-PARSE_STRX: error: duplicate DWO ID ({{.*}}) in 'dw5.cc' (from 'dw5.dwo') and 'dw5.cc' (from 'dw5.dwo')
+PARSE_STRX: warning: duplicate DWO ID ({{.*}}) in 'dw5.cc' (from 'dw5.dwo') and 'dw5.cc' (from 'dw5.dwo')
 

@github-actions
Copy link

github-actions bot commented Dec 27, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@Jinjie-Huang
Copy link
Contributor Author

@dwblaikie @ayermolo, can you help to review? thanks

@ayermolo
Copy link
Contributor

Is there a case for it to be a warning, other then binutils does it?
One way of looking at it, this will produce incomplete, broken(?), debug information, and in big build systems people just ignore warnings.

@Jinjie-Huang
Copy link
Contributor Author

Is there a case for it to be a warning, other then binutils does it?

One case is when multiple .a files in a project contain duplicate .o files. The linker can link them normally, but llvm-dwp reports an error during the process, which is somewhat counterintuitive. In such a scenario, it might be better to generate a DWP like binutils dwp does?

One way of looking at it, this will produce incomplete, broken(?), debug information,

Those duplicate DWOs seem to be redundant when generating DWP?
Regarding skipping duplicate DWO IDs, I tested the generated DWP with GDB (13.2) and LLDB (7.0), and they can consume the DWP normally.

And btw, it seems we don't need to add an option to control the behavior when encountering duplicate DWOs?

@dwblaikie
Copy link
Collaborator

Yeah, pretty much every duplicate DWO ID issue I've tracked down inside google were ultimately benign as far as the DWARF structure is concerned - the same file compiled/linked in twice, and the linker discards one but the dwp actions don't know how to do that (if you're using dwp -e you probably wouldn't hit these cases, since the redundant objects would be discarded by the linker)

Some cases might confuse a consumer - I think I saw some cases where /nearly/ empty files ended up with duplicate DWO IDs (totally empty files shouldn't produce a CU so there wouldn't be any duplicate/collission issues. Imagine some code like this: https://godbolt.org/z/qxf1x9bW9 - if there were two files with the same file name (in different directories, or compiled twice with different -D in each compilation) but different contents in init() they'd still get the same DWO ID, despite the fact that they're really distinct units, and could share the DWO contents just fine (interpreted with different addresses, etc, provided by the skeleton unit) - I'm guessing current DWARF consumers don't handle this, and just assume it's a bug/mistake/duplicate skeleton unit if two have the same DWO ID. (this example would only work if the objects were force-linked (specified directly on the command line), not on-demand linked (as they would be if they were part of a static archive))

@Jinjie-Huang
Copy link
Contributor Author

Jinjie-Huang commented Dec 31, 2024

the same file compiled/linked in twice, and the linker discards one but the dwp actions don't know how to do that (if you're using dwp -e you probably wouldn't hit these cases, since the redundant objects would be discarded by the linker)

Yes, the normal duplicate .cpp should be discarded when linking as mentioned. What's interesting here is the real case we met: multiple .a files have the same .o files (which only contain declarations and are linked using whole-archive, I know this is a bit inappropriate), causing the linker to link all the duplicate .o files in the main program. Using llvm-dwp -e will result in an error in this case because of the duplicate skeletons. So it seems there is a possibility of such skeleton duplication...

I'm guessing current DWARF consumers don't handle this, and just assume it's a bug/mistake/duplicate skeleton unit if two have the same DWO ID.

Do we have any thoughts on this? Should we consider skipping the duplicate DWOs and then generating the DWP(better than nothing?), or packaging the duplicate DWOs in the DWP (what binutil dwp does), or continue with the current approach of reporting an error and then exiting? In the current llvm-dwp implementation, skipping the duplicate DWOs and generating a DWP seems to be more convenient.

@ayermolo
Copy link
Contributor

@clayborg Any thoughts on this?

Maybe leave default an error, with an option to turn it into a warning and skipping?
I don't see the benefit of having two dwo IDS in the cu-index. Also I don't think cu-index algorithm it self allows for duplicates: https://gcc.gnu.org/wiki/DebugFissionDWP

@dwblaikie
Copy link
Collaborator

the same file compiled/linked in twice, and the linker discards one but the dwp actions don't know how to do that (if you're using dwp -e you probably wouldn't hit these cases, since the redundant objects would be discarded by the linker)

Yes, the normal duplicate .cpp should be discarded when linking as mentioned. What's interesting here is the real case we met: multiple .a files have the same .o files (which only contain declarations and are linked using whole-archive, I know this is a bit inappropriate), causing the linker to link all the duplicate .o files in the main program. Using llvm-dwp -e will result in an error in this case because of the duplicate skeletons. So it seems there is a possibility of such skeleton duplication...

Ah, that's probably even easier to handle, then (compared to the duplicate DWO IDs that happen when building DWP files without -e, instead using a list of the objects passed to the linker, or worse - when linking DWP files incrementally, when the input is another DWP file which might have CUs and TUs, and the CU is duplicate but the TUs may be needed? (I guess that shouldn't happen, though - if the DWO ID collision is benign, the TUs should all already be included - perhaps that's something we could slice out of the warning/error))

In the -e case, I think if the CU ID is duplicate, ignore the whole object file - seems fine to me. Could have a warning-upgradeable-to-error, or error-downgradeable-to-warning. I don't have strong feelings either way - all the cases I've seen were benign (genuinely identical DWO files, not a hash collision from two different but same-hashing DWO files)

Perhaps "same DWO ID with different contents" could be an error by default (downgradeable to a warning) - but means more expensive checking when duplicate DWO IDs are encountered, even if they turn out to be benign.

I'm guessing current DWARF consumers don't handle this, and just assume it's a bug/mistake/duplicate skeleton unit if two have the same DWO ID.

Do we have any thoughts on this? Should we consider skipping the duplicate DWOs and then generating the DWP(better than nothing?), or packaging the duplicate DWOs in the DWP (what binutil dwp does), or continue with the current approach of reporting an error and then exiting? In the current llvm-dwp implementation, skipping the duplicate DWOs and generating a DWP seems to be more convenient.

We should skip them - but we have to do so carefully.

If we're doing incremental DWP building, then one DWP might have a duplicate CU DWO ID, but we need to verify that all the split type units that reference the sections shared with that CU DWO ID aren't needed either. (eg: .debug_str_offsets.dwo)

Maybe we've done this correctly already, since this would be needed for type units that share sections - but I might've been lazy/cut corners and relied on the CU already pulling in the relevant sections. One way to test would be to see if llvm-dwp can handle a file with /only/ type units, and correctly pull in the section contents and emit it into the dwp. If that doesn't work, then there's probably work to be done to allow ignoring a CU and not ignoring a TU that references the same contents - or checking and failing in that case would be fine too/marginally better.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants