Skip to content

Conversation

rastogishubham
Copy link
Contributor

@rastogishubham rastogishubham commented Sep 18, 2025

If we use LTO as a driver, we can see that even if we have lambdas in
template parameters, two debug entries from different CUs will get
merged in LTO, but their human-readable DW_AT_name will be different if
we emit the locations in anonymous tag names.

Therefore we would never have two dies with the same linkage name but
different human readable names. Which probably should not happen.

This change prevents that from happening.

In our example:

We can see with AnonymousTagLocations set to true, the dwarfdump output will be

DW_AT_name ("Foo<(lambda at a.cpp:12:5){}>")

But with AnonymousTagLocations set to false, the dwarfdump output will be

DW_AT_name ("Foo<(lambda){}>")

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. debuginfo labels Sep 18, 2025
@llvmbot
Copy link
Member

llvmbot commented Sep 18, 2025

@llvm/pr-subscribers-debuginfo
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-codegen

Author: Shubham Sandeep Rastogi (rastogishubham)

Changes

If we use LTO as a driver, we can see that even if we have lambdas in
template parameters, two debug entries from different CUs will get
merged in LTO, but their human-readable DW_AT_name will be different if
we emit the locations in anonymous tag names.

Therefore we would never have two dies with the same linkage name but
different human readable names. Which probably should not happen.

This change prevents that from happening.

In our example:

We can see with AnonymousTagLocations set to true, the dwarfdump output will be

DW_AT_name ("Foo&lt;(lambda at a.cpp:12:5){}&gt;")

But with AnonymousTagLocations set to true, the dwarfdump output will be

DW_AT_name ("Foo&lt;(lambda){}&gt;")


Full diff: https://github.com/llvm/llvm-project/pull/159592.diff

2 Files Affected:

  • (modified) clang/lib/CodeGen/CGDebugInfo.cpp (+2)
  • (added) clang/test/DebugInfo/CXX/anonymous-locs.cpp (+14)
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index 578d09f7971d6..9df792be1f283 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -397,6 +397,8 @@ PrintingPolicy CGDebugInfo::getPrintingPolicy() const {
 
   // Apply -fdebug-prefix-map.
   PP.Callbacks = &PrintCB;
+  // Disable printing of location of an anonymous tag name.
+  PP.AnonymousTagLocations = false;
   return PP;
 }
 
diff --git a/clang/test/DebugInfo/CXX/anonymous-locs.cpp b/clang/test/DebugInfo/CXX/anonymous-locs.cpp
new file mode 100644
index 0000000000000..48a8aff57d520
--- /dev/null
+++ b/clang/test/DebugInfo/CXX/anonymous-locs.cpp
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -std=c++20 -emit-obj -debug-info-kind=standalone -dwarf-version=5 -triple x86_64-apple-darwin -o %t %s 
+// RUN: llvm-dwarfdump %t | FileCheck %s
+
+// CHECK: DW_TAG_structure_type
+// CHECK-NEXT: DW_AT_calling_convention	(DW_CC_pass_by_value)
+// CHECK-NEXT: DW_AT_name	("Foo<(lambda){}>")
+
+template<auto T>
+struct Foo {
+};
+
+Foo<[] {}> f;
+
+auto func() { return f; }
\ No newline at end of file

If we use LTO as a driver, we can see that even if we have lambdas in
template parameters, two debug entries from different CUs will get
merged in LTO, but their human-readable DW_AT_name will be different if
we emit the locations in anonymous tag names.

Therefore we would never have two dies with the same linkage name but
different human readable names. Which probably should not happen.

This change prevents that from happening.
@dwblaikie
Copy link
Collaborator

Any chance we could use a naming that matches/is similar to the mangling? Including the lambda numbering - could help ensure the names are unique-ish? - bonus points if the same name could be generated by just looking at the DWARF for the lambda type itself... I suspect you'd have to include the lambda numbering in the lambda to make that work nicely/without having to count lambdas in the enclosing scope or something.

There's some old discussion I had in LLVM and dwarf-discuss about the problems with lambda naming - these naming problems are the reason I couldn't include templates with lambda parameters in simplified-template-names (they get unsimplified names instead). https://lists.dwarfstd.org/pipermail/dwarf-discuss/2022-January/002102.html

@dwblaikie
Copy link
Collaborator

Also - maybe compare to GCC's naming? (could be nice to converge, but I'm not sure that's possible/might be in tension with matching demangling)
& I don't /think/ this needs a flag, especially if we move towards unique-like-mangling names

@dwblaikie
Copy link
Collaborator

Oh, and did you hit some particular problem with this varied naming situation? It's not immediately obvious to me that this would be a problem - one version might be more or less confusing to the user as the other, but most of the differences I saw in (an admittedly very highly coherent build system) were "./foo.h" versus "foo.h" - in which case either would've been fine for the user to read/understand what was being referenced.

@rastogishubham
Copy link
Contributor Author

Hi @dwblaikie

Also - maybe compare to GCC's naming? (could be nice to converge, but I'm not sure that's possible/might be in tension with matching demangling)

We actually did compare to what GCC does, and you can see with this compiler explorer link that GCC doesn't add the location to the DW_AT_name cc @Michael137 for the example (thanks for providing that!)

Oh, and did you hit some particular problem with this varied naming situation? It's not immediately obvious to me that this would be a problem

Yes, we had an issue with dsymutil, we saw varifiction error in the DWARFVerifier when trying to verify the contents of the Accelerator tables. Since dsymutil does type uniquing similar to LTO.

Any chance we could use a naming that matches/is similar to the mangling? Including the lambda numbering - could help ensure the names are unique-ish?

Correct me if I am wrong, but if we use the mangling, whats the difference between the DW_AT_name and the DW_AT_linkage_name?

@dwblaikie
Copy link
Collaborator

dwblaikie commented Sep 19, 2025

Hi @dwblaikie

Also - maybe compare to GCC's naming? (could be nice to converge, but I'm not sure that's possible/might be in tension with matching demangling)

We actually did compare to what GCC does, and you can see with this compiler explorer link that GCC doesn't add the location to the DW_AT_name cc @Michael137 for the example (thanks for providing that!)

nod So GCC does make some extra effort to make these things a bit more unique by including the scope (though it fails for the inline variable - I /think/ they should be scoped to the inline variable too, since the type does have an ODR/linkage name, the lambdas should be deduplicable across translation units, etc) and the argument types (wonder how that works if the argument types are template parameters... that can get complicated (hmm, "tfunc<<lambda(auto:1)> >" it seems, for a basic case)): https://godbolt.org/z/4M1nfT396

template <typename T>
void tfunc(const T&) {}
inline void x() {
    tfunc([] {});       // GCC: tfunc<x()::<lambda()> >
                        // Dem: tfunc<x()::'lambda'()>
    tfunc([] {});       // GCC: tfunc<x()::<lambda()> >
                        // Dem: tfunc<x()::'lambda0'()>
    tfunc([](int) {});  // GCC: tfunc<x()::<lambda(int)> >
                        // Dem: tfunc<x()::'lambda'(int)>
}
inline int i = (tfunc([] {}),       // GCC: tfunc<<lambda()> >
                                    // Dem: tfunc<i::'lambda'()>
                tfunc([] {}),       // GCC: tfunc<<lambda()> >
                                    // Dem: tfunc<i::'lambda0'()>
                tfunc([](int) {}),  // GCC: tfunc<<lambda(int)> >
                                    // Dem:  tfunc<i::'lambda'(int)>
                1);
void y() {
    x();
    (void)i;
    tfunc([] {});       // GCC: tfunc<y()::<lambda()> >
                        // Dem: tfunc<y()::'lambda'()>
    tfunc([] {});       // GCC: tfunc<y()::<lambda()> >
                        // Dem: tfunc<y()::'lambda0'()>
    tfunc([](int) {});  // GCC: tfunc<y()::<lambda(int)> >
                        // Dem: tfunc<y()::'lambda'(int)>
}

It'd be nice to do at least that, but I'd argue we should do a bit more - by including the mangling number in them too.

Oh, and did you hit some particular problem with this varied naming situation? It's not immediately obvious to me that this would be a problem

Yes, we had an issue with dsymutil, we saw varifiction error in the DWARFVerifier when trying to verify the contents of the Accelerator tables. Since dsymutil does type uniquing similar to LTO.

Ah, alrighty. New verification checks you're working on/adding, I take it? Good to know/hear about/commendable - I'm sure it's a lot of little things to track down/try to fix, but should make the DWARF more robust/reliable/deterministic/expected.

Any chance we could use a naming that matches/is similar to the mangling? Including the lambda numbering - could help ensure the names are unique-ish?

Correct me if I am wrong, but if we use the mangling, whats the difference between the DW_AT_name and the DW_AT_linkage_name?

Ideally, not much should be different, imho. Eventually we should/could ask the question of whether the linkage name is redundant (ideally the linkage name would be the one we remove, though I realize that places a fairly substantial burden on consumers to reconstruct names in a non-trivial way - but the upside is that DWARF descriptions have a lot of redundancy elimination in them (structurally, think about an exponential template expansion "T<T<T<>, T<>>, T<T<>, T<>>>" - DWARF contains no duplication itself in the structural description - with simplified template names it contains no duplication in the DW_AT_name, and the mangled name can only deduplicate within itself - not across names (so in the DWARF where you have the linkage name for "T<>", and for "T<T<>, T<>>" and for the outer "T<...>" there's no ability for the latter to reuse the former - but the DWARF references back to those other DIEs just fine))).

(@rnk for FYI on this thread in general, and some ideas here)

But even without that somewhat fanciful future world without linkage names in DWARF - notionally the DW_AT_name is a name fragment that's immediately usable without demangling, etc. So I think it still makes sense to include a name that might match the demangling (most simple names do - we include the name of function "foo" despite the fact it's identical to the name you could get from demangling, even in more complex template examples that's often still true) and it seems important to end users that they have a good chance at identifying which lambda is being referenced - which means having /at least/ as much uniqueness as the mangling. And I agree with you that it's important that names don't have /more/ uniqueness than the mangling.

So the conclusion that it should have /exactly as much/ uniqueness as the mangling - which, to me, pushes pretty close to printing out something pretty close to identical to the demangling (note, I think there are demanglings that aren't unique - where multiple mangled names demangle to the same name - so perhaps even with the demangling matching motivation we might not quite reach the "have at least as much uniqueness as the mangling" property I described above)

@rnk
Copy link
Collaborator

rnk commented Sep 19, 2025

I think coming up with a scope chain like @dwblaikie described would probably be the best outcome because it is resilient to source code drift, but another possible alternative to consider would be to use the file basename with canonicalized case for case-insensitive file systems, plus the line:col. This assumes that most of the verifier failures you are encountering are driven by header file search path mismatches and path traversal component matches (/foo/bar.h:123 vs /foo/../foo/bar.h:123), but there's also case-insensitive filesystems to consider.

The downside to this approach is that it is fragile in the presence of source code drift, i.e. if I add a comment block to a shared header and recompile half my application, I could get debug info verifier check failures. This is not technically an ODR violation, but it is a kind of build system dependency tracking failure, but it depends where you want to make the tradeoff between debuggability, effort, and verification reliability.

@rnk
Copy link
Collaborator

rnk commented Sep 19, 2025

I should add, I agree with the goal of removing the full filepaths from the type names. They are usually quite long, and in my experience, they usually clutter profiling and debugger tool output. There could be significant debug string size savings from this change as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category debuginfo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants