-
Notifications
You must be signed in to change notification settings - Fork 14.9k
Do not print locations in anonymous tag names. #159592
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-debuginfo @llvm/pr-subscribers-clang-codegen Author: Shubham Sandeep Rastogi (rastogishubham) ChangesIf we use LTO as a driver, we can see that even if we have lambdas in Therefore we would never have two dies with the same linkage name but This change prevents that from happening. In our example: We can see with
But with
Full diff: https://github.com/llvm/llvm-project/pull/159592.diff 2 Files Affected:
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index 578d09f7971d6..9df792be1f283 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -397,6 +397,8 @@ PrintingPolicy CGDebugInfo::getPrintingPolicy() const {
// Apply -fdebug-prefix-map.
PP.Callbacks = &PrintCB;
+ // Disable printing of location of an anonymous tag name.
+ PP.AnonymousTagLocations = false;
return PP;
}
diff --git a/clang/test/DebugInfo/CXX/anonymous-locs.cpp b/clang/test/DebugInfo/CXX/anonymous-locs.cpp
new file mode 100644
index 0000000000000..48a8aff57d520
--- /dev/null
+++ b/clang/test/DebugInfo/CXX/anonymous-locs.cpp
@@ -0,0 +1,14 @@
+// RUN: %clang_cc1 -std=c++20 -emit-obj -debug-info-kind=standalone -dwarf-version=5 -triple x86_64-apple-darwin -o %t %s
+// RUN: llvm-dwarfdump %t | FileCheck %s
+
+// CHECK: DW_TAG_structure_type
+// CHECK-NEXT: DW_AT_calling_convention (DW_CC_pass_by_value)
+// CHECK-NEXT: DW_AT_name ("Foo<(lambda){}>")
+
+template<auto T>
+struct Foo {
+};
+
+Foo<[] {}> f;
+
+auto func() { return f; }
\ No newline at end of file
|
If we use LTO as a driver, we can see that even if we have lambdas in template parameters, two debug entries from different CUs will get merged in LTO, but their human-readable DW_AT_name will be different if we emit the locations in anonymous tag names. Therefore we would never have two dies with the same linkage name but different human readable names. Which probably should not happen. This change prevents that from happening.
378959f
to
549f627
Compare
Any chance we could use a naming that matches/is similar to the mangling? Including the lambda numbering - could help ensure the names are unique-ish? - bonus points if the same name could be generated by just looking at the DWARF for the lambda type itself... I suspect you'd have to include the lambda numbering in the lambda to make that work nicely/without having to count lambdas in the enclosing scope or something. There's some old discussion I had in LLVM and dwarf-discuss about the problems with lambda naming - these naming problems are the reason I couldn't include templates with lambda parameters in simplified-template-names (they get unsimplified names instead). https://lists.dwarfstd.org/pipermail/dwarf-discuss/2022-January/002102.html |
Also - maybe compare to GCC's naming? (could be nice to converge, but I'm not sure that's possible/might be in tension with matching demangling) |
Oh, and did you hit some particular problem with this varied naming situation? It's not immediately obvious to me that this would be a problem - one version might be more or less confusing to the user as the other, but most of the differences I saw in (an admittedly very highly coherent build system) were "./foo.h" versus "foo.h" - in which case either would've been fine for the user to read/understand what was being referenced. |
Hi @dwblaikie
We actually did compare to what GCC does, and you can see with this compiler explorer link that GCC doesn't add the location to the DW_AT_name cc @Michael137 for the example (thanks for providing that!)
Yes, we had an issue with dsymutil, we saw varifiction error in the DWARFVerifier when trying to verify the contents of the Accelerator tables. Since dsymutil does type uniquing similar to LTO.
Correct me if I am wrong, but if we use the mangling, whats the difference between the DW_AT_name and the DW_AT_linkage_name? |
nod So GCC does make some extra effort to make these things a bit more unique by including the scope (though it fails for the inline variable - I /think/ they should be scoped to the inline variable too, since the type does have an ODR/linkage name, the lambdas should be deduplicable across translation units, etc) and the argument types (wonder how that works if the argument types are template parameters... that can get complicated (hmm, "tfunc<<lambda(auto:1)> >" it seems, for a basic case)): https://godbolt.org/z/4M1nfT396
It'd be nice to do at least that, but I'd argue we should do a bit more - by including the mangling number in them too.
Ah, alrighty. New verification checks you're working on/adding, I take it? Good to know/hear about/commendable - I'm sure it's a lot of little things to track down/try to fix, but should make the DWARF more robust/reliable/deterministic/expected.
Ideally, not much should be different, imho. Eventually we should/could ask the question of whether the linkage name is redundant (ideally the linkage name would be the one we remove, though I realize that places a fairly substantial burden on consumers to reconstruct names in a non-trivial way - but the upside is that DWARF descriptions have a lot of redundancy elimination in them (structurally, think about an exponential template expansion "T<T<T<>, T<>>, T<T<>, T<>>>" - DWARF contains no duplication itself in the structural description - with simplified template names it contains no duplication in the DW_AT_name, and the mangled name can only deduplicate within itself - not across names (so in the DWARF where you have the linkage name for "T<>", and for "T<T<>, T<>>" and for the outer "T<...>" there's no ability for the latter to reuse the former - but the DWARF references back to those other DIEs just fine))). (@rnk for FYI on this thread in general, and some ideas here) But even without that somewhat fanciful future world without linkage names in DWARF - notionally the DW_AT_name is a name fragment that's immediately usable without demangling, etc. So I think it still makes sense to include a name that might match the demangling (most simple names do - we include the name of function "foo" despite the fact it's identical to the name you could get from demangling, even in more complex template examples that's often still true) and it seems important to end users that they have a good chance at identifying which lambda is being referenced - which means having /at least/ as much uniqueness as the mangling. And I agree with you that it's important that names don't have /more/ uniqueness than the mangling. So the conclusion that it should have /exactly as much/ uniqueness as the mangling - which, to me, pushes pretty close to printing out something pretty close to identical to the demangling (note, I think there are demanglings that aren't unique - where multiple mangled names demangle to the same name - so perhaps even with the demangling matching motivation we might not quite reach the "have at least as much uniqueness as the mangling" property I described above) |
I think coming up with a scope chain like @dwblaikie described would probably be the best outcome because it is resilient to source code drift, but another possible alternative to consider would be to use the file basename with canonicalized case for case-insensitive file systems, plus the line:col. This assumes that most of the verifier failures you are encountering are driven by header file search path mismatches and path traversal component matches ( The downside to this approach is that it is fragile in the presence of source code drift, i.e. if I add a comment block to a shared header and recompile half my application, I could get debug info verifier check failures. This is not technically an ODR violation, but it is a kind of build system dependency tracking failure, but it depends where you want to make the tradeoff between debuggability, effort, and verification reliability. |
I should add, I agree with the goal of removing the full filepaths from the type names. They are usually quite long, and in my experience, they usually clutter profiling and debugger tool output. There could be significant debug string size savings from this change as well. |
If we use LTO as a driver, we can see that even if we have lambdas in
template parameters, two debug entries from different CUs will get
merged in LTO, but their human-readable DW_AT_name will be different if
we emit the locations in anonymous tag names.
Therefore we would never have two dies with the same linkage name but
different human readable names. Which probably should not happen.
This change prevents that from happening.
In our example:
We can see with
AnonymousTagLocations
set to true, the dwarfdump output will beDW_AT_name ("Foo<(lambda at a.cpp:12:5){}>")
But with
AnonymousTagLocations
set to false, the dwarfdump output will beDW_AT_name ("Foo<(lambda){}>")