-
Notifications
You must be signed in to change notification settings - Fork 14.9k
[LLDB] Handle i686 mingw32 mangling prefix #160930
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@llvm/pr-subscribers-lldb Author: nerix (Nerixyz) ChangesOn i686 mingw32, Found in #149701 (comment). Full diff: https://github.com/llvm/llvm-project/pull/160930.diff 2 Files Affected:
diff --git a/lldb/source/Core/Mangled.cpp b/lldb/source/Core/Mangled.cpp
index 91b9c0007617d..13788e2a4992c 100644
--- a/lldb/source/Core/Mangled.cpp
+++ b/lldb/source/Core/Mangled.cpp
@@ -63,6 +63,10 @@ Mangled::ManglingScheme Mangled::GetManglingScheme(llvm::StringRef const name) {
if (name.starts_with("_Z"))
return Mangled::eManglingSchemeItanium;
+ // __Z is used on i686 mingw32
+ if (name.starts_with("__Z"))
+ return Mangled::eManglingSchemeItanium;
+
// ___Z is a clang extension of block invocations
if (name.starts_with("___Z"))
return Mangled::eManglingSchemeItanium;
diff --git a/lldb/unittests/Core/MangledTest.cpp b/lldb/unittests/Core/MangledTest.cpp
index cbc0c5d951b99..437290947cb57 100644
--- a/lldb/unittests/Core/MangledTest.cpp
+++ b/lldb/unittests/Core/MangledTest.cpp
@@ -114,6 +114,23 @@ TEST(MangledTest, SameForInvalidDLangPrefixedName) {
EXPECT_STREQ("_DDD", the_demangled.GetCString());
}
+TEST(MangledTest, ResultForValidMingw32Name) {
+ ConstString mangled_name("__Z7recursei");
+ Mangled the_mangled(mangled_name);
+ ConstString the_demangled = the_mangled.GetDemangledName();
+
+ ConstString expected_result("recurse(int)");
+ EXPECT_STREQ(expected_result.GetCString(), the_demangled.GetCString());
+}
+
+TEST(MangledTest, EmptyForInvalidMingw32Name) {
+ ConstString mangled_name("__Zzrecursei");
+ Mangled the_mangled(mangled_name);
+ ConstString the_demangled = the_mangled.GetDemangledName();
+
+ EXPECT_STREQ("", the_demangled.GetCString());
+}
+
TEST(MangledTest, RecognizeSwiftMangledNames) {
llvm::StringRef valid_swift_mangled_names[] = {
"_TtC4main7MyClass", // Mangled objc class name
|
Itanium specifies that mangled names start with _Z (and ___Z for blocks). A second leading underscore is a global symbol prefix added on some platforms (Darwin and possibly mingw?). Is it possible that we're not stripping the global prefix where we should be? I'd be wary of pretending that __Z is itanium (though it would probably be fine). Would just be good to understand the root cause of the issue |
Since the only difference in the reverted patch was adding |
I'm no real MinGW user, and I couldn't find documentation on the mangling used there, so I relied on examples. The mangled names on i686 mingw32 do have two underscores. From this comment on an old patch, it does seem like this is intended. But I can't find where Clang does this. Maybe @mstorsjo knows more? |
This is the global prefix I'm talking about: llvm-project/llvm/include/llvm/IR/DataLayout.h Lines 279 to 293 in d2c189b
Have we not tried creating But yea, having an example of the mangled names that we get from debug-info on mingw32 would be helpful |
This wasn't reverted in 185ae5c - it was the function naming.
Surprised me as well, but now that I think about it, it does make sense: |
I'm not entirely sure where that happens in the stack either, but you're right - there's a global
Exactly. The extra underscore prefix on i386 isn't mingw specific either, it's on MSVC as well - for regular C symbols. For other calling conventions (like fastcall or vectorcall) the prefix is different though, and for MSVC C++ mangled symbols, there's a different prefix. But Itanium C++ ABI on i386 works through the regular (cdecl) mangling, which adds a |
Ah, thank you for the clarification. Looking at MS' docs, I think we should instead have some preprocessing function in the PDB plugin that strips the C mangling to then pass the potentially mangled name to |
Hmm. I'm unsure which way it is best to do the layering here. I think we should be able to look at The suggested layering, which demangles cdecl, I think it's plausible that |
Right, anything that starts with a question mark (~> MS C++ name) would be ignored. Much like in LLVM: llvm-project/llvm/lib/IR/Mangler.cpp Lines 49 to 50 in 170b5fd
That's the case, and also why just checking for
Currently, there's nothing that demangles these C decorated names in I was wrong, we don't just need to do it for PDB, but also for DWARF (i.e. anything that creates |
It actually does (which is why on macOS you can pass The problem here is that LLDB tries to distinguish Itanium symbols from non-Itanium ones, and it uses E.g., on Darwin (another platform where a leading underscore is added), we don't run into this issue. I think that's because we strip the leading underscore before putting it into DWARF. But it sounds like MSVC doesn't do that. So TL;DR, happy to not do the stripping and adjust the |
I see. So just checking I don't think we need to in DWARF though? At least I haven't seen the prefix be added into DWARF |
It's only on non-64bit Windows. I just checked, for the example from above (compiled for
|
Could you elaborate the configuration here? Compiled with |
Sure, https://godbolt.org/z/Gj968q3xs shows the functions. Getting clang(-cl) to build binaries for Windows on compiler explorer is always a bit tricky, so I initially did it locally on Windows through a x86 command prompt. |
Hah, interesting, so anything with a llvm-project/clang/lib/AST/Mangle.cpp Lines 247 to 257 in 98d43ef
Other C and C++ names don't have it in DWARF though (even on Windows), as your Godbolt link demonstrates. This is quite the can of worms. Basically what ends up in DWARF as linkage names is whatever the Clang AST mangled to. But, iiuc, the global |
I didn't notice this at first, you're right. With PDB, you get these names:
My guess is that's because it's created by the linker and not the compiler.
For the time being, we could strip only the __cdecl prefix in PDB to match DWARF and open an issue for the mangling of other calling conventions. A bit related: It seems like debugging x86 executables on x86_64 Windows (WOW64) doesn't work right now, because |
Sounds reasonble. If we add good test coverage for these cases, we should be kinda free to adjust the exact implementation later anyway.
Yes, this matches my experience - the code is meant to work for x86 debugging with an x86_64 debugger, but in practice, it doesn't. For cases like these, I presume you don't actually need to test live debugging though, I guess it should be enough with just tests that load an executable and the matching debug info and inspect it? And such tests should be able to run on any platform. |
Sounds like a good compromise. Btw, @al45tair pointed out to me that mangled names can appear in multiple places in PDB. One has prefixed mangled names and the other doesnt (i think the one per module doesnt). So you have to make sure we only strip the ones that are actually prefixed |
Where else do they appear? My understanding is that mangled function (and variable) names are only present in the publics stream (also mentioned in LLVM's CodeView/PDB docs). The globals stream doesn't contain the mangled names in My approach would be to only transform the symbols from the publics stream that we read when searching for the mangled name in my original PR. |
Relands #149701 which was reverted in 185ae5c because it broke demangling of Itanium symbols on i386. The last commit in this PR adds the fix for this (discussed in #160930). On x86 environments, the prefix of `__cdecl` functions will now be removed to match DWARF. I opened #161676 to discuss this for the other calling conventions.
On i686 mingw32,
__Z
is used instead of_Z
.Mangled
didn't handle this and assumed the given string is not mangled.Found in #149701 (comment).