Nerixyz
Contributor

@Nerixyz Nerixyz commented Sep 26, 2025

On i686 mingw32, __Z is used instead of _Z. Mangled didn't handle this and assumed the given string is not mangled.

Found in #149701 (comment).

@llvmbot
Member

llvmbot commented Sep 26, 2025

@llvm/pr-subscribers-lldb

Author: nerix (Nerixyz)

Full diff: https://github.com/llvm/llvm-project/pull/160930.diff

2 Files Affected:

  • (modified) lldb/source/Core/Mangled.cpp (+4)
  • (modified) lldb/unittests/Core/MangledTest.cpp (+17)
diff --git a/lldb/source/Core/Mangled.cpp b/lldb/source/Core/Mangled.cpp
index 91b9c0007617d..13788e2a4992c 100644
--- a/lldb/source/Core/Mangled.cpp
+++ b/lldb/source/Core/Mangled.cpp
@@ -63,6 +63,10 @@ Mangled::ManglingScheme Mangled::GetManglingScheme(llvm::StringRef const name) {
   if (name.starts_with("_Z"))
     return Mangled::eManglingSchemeItanium;
 
+  // __Z is used on i686 mingw32
+  if (name.starts_with("__Z"))
+    return Mangled::eManglingSchemeItanium;
+
   // ___Z is a clang extension of block invocations
   if (name.starts_with("___Z"))
     return Mangled::eManglingSchemeItanium;
diff --git a/lldb/unittests/Core/MangledTest.cpp b/lldb/unittests/Core/MangledTest.cpp
index cbc0c5d951b99..437290947cb57 100644
--- a/lldb/unittests/Core/MangledTest.cpp
+++ b/lldb/unittests/Core/MangledTest.cpp
@@ -114,6 +114,23 @@ TEST(MangledTest, SameForInvalidDLangPrefixedName) {
   EXPECT_STREQ("_DDD", the_demangled.GetCString());
 }
 
+TEST(MangledTest, ResultForValidMingw32Name) {
+  ConstString mangled_name("__Z7recursei");
+  Mangled the_mangled(mangled_name);
+  ConstString the_demangled = the_mangled.GetDemangledName();
+
+  ConstString expected_result("recurse(int)");
+  EXPECT_STREQ(expected_result.GetCString(), the_demangled.GetCString());
+}
+
+TEST(MangledTest, EmptyForInvalidMingw32Name) {
+  ConstString mangled_name("__Zzrecursei");
+  Mangled the_mangled(mangled_name);
+  ConstString the_demangled = the_mangled.GetDemangledName();
+
+  EXPECT_STREQ("", the_demangled.GetCString());
+}
+
 TEST(MangledTest, RecognizeSwiftMangledNames) {
   llvm::StringRef valid_swift_mangled_names[] = {
       "_TtC4main7MyClass",   // Mangled objc class name

@Michael137
Member

Itanium specifies that mangled names start with _Z (and ___Z for blocks). A second leading underscore is a global symbol prefix added on some platforms (Darwin and possibly mingw?). Is it possible that we're not stripping the global prefix where we should be? I'd be wary of pretending that __Z is Itanium (though it would probably be fine). It would just be good to understand the root cause of the issue.

@Michael137
Member

Michael137 commented Sep 26, 2025

Since the only difference in the reverted patch was adding AsmLabels to the PDB decls, I'd be curious to see what those AsmLabels look like on mingw32?

@Nerixyz
Contributor Author

Nerixyz commented Sep 26, 2025

I'm no real MinGW user, and I couldn't find documentation on the mangling used there, so I relied on examples. The mangled names on i686 mingw32 do have two underscores. From this comment on an old patch, it does seem like this is intended. But I can't find where Clang does this. Maybe @mstorsjo knows more?

@Michael137
Member

I'm no real MinGW user, and I couldn't find documentation on the mangling used there, so I relied on examples. The mangled names on i686 mingw32 do have two underscores. From this comment on an old patch, it does seem like this is intended. But I can't find where Clang does this. Maybe @mstorsjo knows more?

This is the global prefix I'm talking about:

char getGlobalPrefix() const {
  switch (ManglingMode) {
  case MM_None:
  case MM_ELF:
  case MM_GOFF:
  case MM_Mips:
  case MM_WinCOFF:
  case MM_XCOFF:
    return '\0';
  case MM_MachO:
  case MM_WinCOFFX86:
    return '_';
  }
  llvm_unreachable("invalid mangling mode");
}

Have we not tried creating Mangled objects from mingw32 mangled names prior to your reverted patch? I'd be surprised, but maybe it truly is the first time it was required?

But yea, having an example of the mangled names that we get from debug-info on mingw32 would be helpful
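
To make the effect of that prefix concrete, here is a minimal standalone sketch. It is not LLVM's code: `ManglingMode`, `globalPrefix` and `objectFileSymbol` are hypothetical names that only model a subset of the modes listed above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the mangling modes in the snippet above;
// only a subset is modelled for illustration.
enum class ManglingMode { ELF, MachO, WinCOFF, WinCOFFX86 };

// Same shape as getGlobalPrefix(): Mach-O and 32-bit x86 COFF add '_'.
inline char globalPrefix(ManglingMode mode) {
  switch (mode) {
  case ManglingMode::MachO:
  case ManglingMode::WinCOFFX86:
    return '_';
  case ManglingMode::ELF:
  case ManglingMode::WinCOFF:
    return '\0';
  }
  return '\0';
}

// Builds the object-file level symbol from a language-level mangled name.
inline std::string objectFileSymbol(const std::string &mangled,
                                    ManglingMode mode) {
  const char prefix = globalPrefix(mode);
  return prefix == '\0' ? mangled : std::string(1, prefix) + mangled;
}

// Usage: the Itanium name from the unit test picks up a second underscore.
inline void globalPrefixExample() {
  assert(objectFileSymbol("_Z7recursei", ManglingMode::WinCOFFX86) ==
         "__Z7recursei");
}
```

Under this model, `__Z7recursei` is simply `_Z7recursei` plus the platform prefix, which is why stripping the prefix before classification is one of the options on the table.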

@Nerixyz
Contributor Author

Nerixyz commented Sep 26, 2025

Since the only difference in the reverted patch was adding AsmLabels to the PDB decls, I'd be curious to see what those AsmLabels look like on mingw32?

That isn't what was reverted in 185ae5c; the reverted change was the function naming.

Have we not tried creating Mangled objects from mingw32 mangled names prior to your reverted patch? I'd be surprised, but maybe it truly is the first time it was required?

Surprised me as well, but now that I think about it, it does make sense:
We only use Mangled in the native plugin for function creation, where we (currently) use the demangled name. And before #154121, public symbols from the PDB were not included.
Only the DIA plugin used the mangled name (if available), but that's not used on MinGW builds.

@mstorsjo
Member

I'm no real MinGW user, and I couldn't find documentation on the mangling used there, so I relied on examples. The mangled names on i686 mingw32 do have two underscores. From this comment on an old patch, it does seem like this is intended. But I can't find where Clang does this. Maybe @mstorsjo knows more?

I'm not entirely sure where that happens in the stack either, but you're right - there's a global _ prefix on all symbols.

Itanium specifies that mangled names start with _Z (and ___Z for blocks). A second leading underscore is a global symbol prefix added on some platforms (Darwin and possibly mingw?).

Exactly. The extra underscore prefix on i386 isn't mingw specific either, it's on MSVC as well - for regular C symbols. For other calling conventions (like fastcall or vectorcall) the prefix is different though, and for MSVC C++ mangled symbols, there's a different prefix. But Itanium C++ ABI on i386 works through the regular (cdecl) mangling, which adds a _ prefix, just like all regular plain C functions.
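
A rough sketch of those 32-bit x86 C decorations, following the rules in Microsoft's decorated-names documentation. `CallConv` and `decorateX86` are hypothetical names used only for this illustration, and `argBytes` is the size of the parameter list in bytes.

```cpp
#include <cassert>
#include <string>

// Hypothetical enum for the calling conventions mentioned above.
enum class CallConv { Cdecl, StdCall, FastCall, VectorCall };

// Applies the 32-bit x86 C name decoration for the given convention.
inline std::string decorateX86(const std::string &name, CallConv cc,
                               unsigned argBytes) {
  const std::string bytes = std::to_string(argBytes);
  switch (cc) {
  case CallConv::Cdecl: // leading underscore, byte count not encoded
    return "_" + name;
  case CallConv::StdCall: // leading underscore, trailing @<bytes>
    return "_" + name + "@" + bytes;
  case CallConv::FastCall: // leading @, trailing @<bytes>
    return "@" + name + "@" + bytes;
  case CallConv::VectorCall: // trailing @@<bytes>
    return name + "@@" + bytes;
  }
  return name;
}

// Usage: an Itanium name goes through the plain cdecl decoration.
inline void decorateExample() {
  assert(decorateX86("_Z7recursei", CallConv::Cdecl, 4) == "__Z7recursei");
}
```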

@Nerixyz
Contributor Author

Nerixyz commented Sep 30, 2025

Exactly. The extra underscore prefix on i386 isn't mingw specific either, it's on MSVC as well - for regular C symbols. For other calling conventions (like fastcall or vectorcall) the prefix is different though, and for MSVC C++ mangled symbols, there's a different prefix. But Itanium C++ ABI on i386 works through the regular (cdecl) mangling, which adds a _ prefix, just like all regular plain C functions.

Ah, thank you for the clarification. Looking at MS' docs, I think we should instead have some preprocessing function in the PDB plugin that strips the C mangling to then pass the potentially mangled name to Mangled. For example, on non-64bit, _CFuncParamStdCall@4 would be stripped to CFuncParamStdCall and then passed to Mangled. Similarly, the leading underscore of __RNvCsj4CZ6flxxfE_7___rustc12___rust_alloc would be removed. Does this sound reasonable? If so, I'd close this PR and implement this when relanding #149701.
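
Roughly what I have in mind, as a sketch only: `stripCDecoration` is a hypothetical helper, not an existing LLDB function, and `is64Bit` stands in for whatever the plugin knows about the target.

```cpp
#include <cassert>
#include <string>

// Hypothetical preprocessing step: removes the cdecl/stdcall C decoration on
// non-64-bit targets so the result can be handed to Mangled.
inline std::string stripCDecoration(const std::string &name, bool is64Bit) {
  // Nothing to strip on 64-bit targets or on names without the prefix.
  if (is64Bit || name.size() < 2 || name[0] != '_')
    return name;
  std::string body = name.substr(1);
  // stdcall: drop a trailing "@<digits>" suffix if there is one.
  const std::string::size_type at = body.rfind('@');
  if (at != std::string::npos && at > 0 && at + 1 < body.size() &&
      body.find_first_not_of("0123456789", at + 1) == std::string::npos)
    body.erase(at);
  return body;
}

// Usage with the two examples from this comment.
inline void stripExample() {
  assert(stripCDecoration("_CFuncParamStdCall@4", false) ==
         "CFuncParamStdCall");
  assert(stripCDecoration("__RNvCsj4CZ6flxxfE_7___rustc12___rust_alloc",
                          false) ==
         "_RNvCsj4CZ6flxxfE_7___rustc12___rust_alloc");
}
```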

@mstorsjo
Member

Exactly. The extra underscore prefix on i386 isn't mingw specific either, it's on MSVC as well - for regular C symbols. For other calling conventions (like fastcall or vectorcall) the prefix is different though, and for MSVC C++ mangled symbols, there's a different prefix. But Itanium C++ ABI on i386 works through the regular (cdecl) mangling, which adds a _ prefix, just like all regular plain C functions.

Ah, thank you for the clarification. Looking at MS' docs, I think we should instead have some preprocessing function in the PDB plugin that strips the C mangling to then pass the potentially mangled name to Mangled. For example, on non-64bit, _CFuncParamStdCall@4 would be stripped to CFuncParamStdCall and then passed to Mangled. Similarly, the leading underscore of __RNvCsj4CZ6flxxfE_7___rustc12___rust_alloc would be removed. Does this sound reasonable? If so, I'd close this PR and implement this when relanding #149701.

Hmm. I'm unsure which way it is best to do the layering here.

I think we should be able to look at llvm/lib/Demangle for inspiration as well.

The suggested layering, which demangles cdecl, _cdeclfunc into cdeclfunc and stdcall _CFuncParamStdCall@4 into CFuncParamStdCall before doing other C++ demangling (itanium or MS C++ ABI demangling) doesn't fit entirely right wrt the MS C++ ABI, because those symbols don't have either of the cdecl or stdcall decorations, as the MS C++ ABI mangling is on the same level there (there's no extra underscore prefix on them).

I think it's plausible that llvm/lib/Demangle also just accepts __Z as itanium prefix - which I presume that this PR does (I haven't had time to look at the code yet).

@Nerixyz
Contributor Author

Nerixyz commented Sep 30, 2025

The suggested layering, which demangles cdecl, _cdeclfunc into cdeclfunc and stdcall _CFuncParamStdCall@4 into CFuncParamStdCall before doing other C++ demangling (itanium or MS C++ ABI demangling) doesn't fit entirely right wrt the MS C++ ABI, because those symbols don't have either of the cdecl or stdcall decorations, as the MS C++ ABI mangling is on the same level there (there's no extra underscore prefix on them).

Right, anything that starts with a question mark (~> MS C++ name) would be ignored. Much like in LLVM:

if (DL.doNotMangleLeadingQuestionMark() && Name[0] == '?')
  Prefix = '\0';

I think it's plausible that llvm/lib/Demangle also just accepts __Z as itanium prefix - which I presume that this PR does (I haven't had time to look at the code yet).

That's the case, and it's also why just checking for __Z isn't enough. For one, this doesn't play nicely with Rust, which also has two underscores on i686. Secondly, C names are displayed incorrectly right now. For example, you'd see _main instead of main, and a function like void Zone() would become _Zone and be interpreted as an Itanium name.
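
The ambiguity can be shown with a small prefix-only classifier. This is a simplified sketch, not the real `Mangled::GetManglingScheme`: `Scheme`, `classify` and `stripOneUnderscore` are hypothetical names, and only the `_Z` and `_R` prefixes are modelled.

```cpp
#include <cassert>
#include <string>

// Hypothetical, reduced set of schemes for the illustration.
enum class Scheme { None, Itanium, RustV0 };

inline bool startsWith(const std::string &s, const std::string &prefix) {
  return s.compare(0, prefix.size(), prefix) == 0;
}

// Prefix-only classification, in the spirit of the checks in the diff.
inline Scheme classify(const std::string &name) {
  if (startsWith(name, "_Z"))
    return Scheme::Itanium;
  if (startsWith(name, "_R"))
    return Scheme::RustV0;
  return Scheme::None;
}

// Removes the i686 global prefix, if present.
inline std::string stripOneUnderscore(const std::string &name) {
  return (!name.empty() && name[0] == '_') ? name.substr(1) : name;
}

inline void classifyExample() {
  // Decorated C function `void Zone()` looks like an Itanium name.
  assert(classify("_Zone") == Scheme::Itanium);
  // After stripping the prefix it is correctly left alone.
  assert(classify(stripOneUnderscore("_Zone")) == Scheme::None);
}
```

With the raw i686 names, the Rust symbol is not recognized at all and `_Zone` is misclassified; after stripping one underscore, both cases and `__Z7recursei` classify as expected.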

I think we should be able to look at llvm/lib/Demangle for inspiration as well.

Currently, there's nothing that demangles these C decorated names in llvm/lib/Demangle. One thing that's a bit unfortunate is that you'd need to know whether the binary is 64bit or not.

I was wrong, we don't just need to do it for PDB, but also for DWARF (i.e. anything that creates Mangled). For this, we could store flags in Mangled for a Windows target and a 64 bit environment.

@Michael137
Member

I think it's plausible that llvm/lib/Demangle also just accepts __Z as itanium prefix - which I presume that this PR does (I haven't had time to look at the code yet).

It actually does (which is why on macOS you can pass __Z symbols to c++filt and it works just fine: #106233).

The problem here is that LLDB tries to distinguish Itanium symbols from non-Itanium ones, and it uses _Z definitively. I don't think there's much harm in adding to that list __Z, basically forwarding the responsibility of stripping the symbol to the demangler (which is what other tools already do anyway). I just wanted to make sure we don't bandage over another bug. But it sounds like this is just a new issue specific to how mangled names get emitted for MSVC.

E.g., on Darwin (another platform where a leading underscore is added), we don't run into this issue. I think that's because we strip the leading underscore before putting it into DWARF. But it sounds like MSVC doesn't do that.

So TL;DR, happy to not do the stripping and adjust the Mangled::GetManglingScheme code.

@Michael137
Member

Michael137 commented Sep 30, 2025

That's the case, and also why just checking for __Z isn't enough. For one, this doesn't play nice with Rust, which also has two underscores on i686. And secondly, C names are displayed incorrectly right now. For example, you'd see _main instead of main and a function like void Zone() would become _Zone and interpreted as an itanium name.

I see. So just checking __Z in GetManglingScheme doesn't work because Rust and other mangling schemes suffer from this issue. Then yea, I think we can do the stripping in the PDB plugin.

I don't think we need to in DWARF though? At least I haven't seen the prefix be added into DWARF DW_AT_linkage_names before.

@Nerixyz
Contributor Author

Nerixyz commented Sep 30, 2025

At least I haven't seen the prefix be added into DWARF DW_AT_linkage_names before.

It's only on non-64bit Windows. I just checked, for the example from above (compiled for i686-windows-msvc), I get:

0x00000206:   DW_TAG_subprogram
                DW_AT_linkage_name	("_CFuncParamStdCall@4")
                DW_AT_name	("CFuncParamStdCall")
                ...

@Michael137
Member

Michael137 commented Sep 30, 2025

At least I haven't seen the prefix be added into DWARF DW_AT_linkage_names before.

It's only on non-64bit Windows. I just checked, for the example from above (compiled for i686-windows-msvc), I get:

0x00000206:   DW_TAG_subprogram
                DW_AT_linkage_name	("_CFuncParamStdCall@4")
                DW_AT_name	("CFuncParamStdCall")
                ...

Could you elaborate on the configuration here? Compiled with clang-cl on a Windows host, I assume? Is this with -gdwarf? I couldn't get something like that to work on Godbolt.

@Nerixyz
Contributor Author

Nerixyz commented Sep 30, 2025

Could you elaborate the configuration here? Compiled with clang-cl on a Windows host I assume? Is this with -gdwarf? Couldn't get something like that to work on Godbolt

Sure, https://godbolt.org/z/Gj968q3xs shows the functions. Getting clang(-cl) to build binaries for Windows on compiler explorer is always a bit tricky, so I initially did it locally on Windows through a x86 command prompt.

@Michael137
Member

Could you elaborate the configuration here? Compiled with clang-cl on a Windows host I assume? Is this with -gdwarf? Couldn't get something like that to work on Godbolt

Sure, https://godbolt.org/z/Gj968q3xs shows the functions. Getting clang(-cl) to build binaries for Windows on compiler explorer is always a bit tricky, so I initially did it locally on Windows through a x86 command prompt.

Hah, interesting, so anything with a stdcall calling convention also gets this _ prefix:

Out << '\01';
if (CC == CCM_Std)
  Out << '_';
else if (CC == CCM_Fast)
  Out << '@';
else if (CC == CCM_RegCall) {
  if (getASTContext().getLangOpts().RegCall4)
    Out << "__regcall4__";
  else
    Out << "__regcall3__";
}
As I now see, @mstorsjo already pointed this out above.

Other C and C++ names don't have it in DWARF though (even on Windows), as your Godbolt link demonstrates.

This is quite the can of worms. Basically, what ends up in DWARF as linkage names is whatever the Clang AST mangled to. But, iiuc, the global _ prefix isn't yet attached to the mangled name at that point, unless we're using the __stdcall calling convention on supported targets. Arguably, the DW_AT_linkage_name in DWARF should be the linkage name that ends up in the object files, but what we actually put there are the platform-independent C++ mangled names. I'm still not 100% sure which level we'd want to strip these symbols at. For Mach-O, where we have the _ prefix too, LLDB strips the prefix when parsing the symbol table.

@Nerixyz
Contributor Author

Nerixyz commented Oct 1, 2025

Other C and C++ names don't have it in DWARF though (even on Windows), as your Godbolt link demonstrates.

I didn't notice this at first, you're right. With PDB, you get these names:

  431284 | S_PUB32 [size = 28] `_CFuncCCall`
           flags = function, addr = 0001:25168

My guess is that's because it's created by the linker and not the compiler.

I'm still not 100% sure which level we'd want to strip these symbols at. For Mach-O, where we have the _ prefix too, LLDB strips the prefix when parsing the symbol table.

For the time being, we could strip only the __cdecl prefix in PDB to match DWARF and open an issue for the mangling of other calling conventions.
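
As a sketch of that narrower rule (hypothetical code, not what the reland necessarily does): `stripCdeclPrefix` and `isX86_32` are made-up names, and treating any `@` in the name as a sign of another calling convention is an assumption of this example.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: on 32-bit x86, strip only the __cdecl underscore from a
// public symbol name and leave every other decoration untouched.
inline std::string stripCdeclPrefix(const std::string &name, bool isX86_32) {
  // MS C++ names start with '?', so they never match the '_' check.
  if (!isX86_32 || name.empty() || name[0] != '_')
    return name;
  // Assumption: '@' marks stdcall/fastcall/vectorcall decoration, which is
  // deliberately left alone here.
  if (name.find('@') != std::string::npos)
    return name;
  return name.substr(1);
}

// Usage with the public symbol from the PDB dump above.
inline void cdeclExample() {
  assert(stripCdeclPrefix("_CFuncCCall", true) == "CFuncCCall");
  assert(stripCdeclPrefix("_CFuncParamStdCall@4", true) ==
         "_CFuncParamStdCall@4");
}
```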


A bit related: It seems like debugging x86 executables on x86_64 Windows (WOW64) doesn't work right now, because STATUS_WX86_BREAKPOINT is received there as opposed to EXCEPTION_BREAKPOINT. Windows also sends two events for the initial breakpoint (EXCEPTION_BREAKPOINT followed by STATUS_WX86_BREAKPOINT).

@mstorsjo
Member

mstorsjo commented Oct 1, 2025

For the time being, we could strip only the __cdecl prefix in PDB to match DWARF and open an issue for the mangling of other calling conventions.

Sounds reasonable. If we add good test coverage for these cases, we should be fairly free to adjust the exact implementation later anyway.

A bit related: It seems like debugging x86 executables on x86_64 Windows (WOW64) doesn't work right now, because STATUS_WX86_BREAKPOINT is received there as opposed to EXCEPTION_BREAKPOINT. Windows also sends two events for the initial breakpoint (EXCEPTION_BREAKPOINT followed by STATUS_WX86_BREAKPOINT).

Yes, this matches my experience - the code is meant to work for x86 debugging with an x86_64 debugger, but in practice, it doesn't.

For cases like these, I presume you don't actually need to test live debugging though, I guess it should be enough with just tests that load an executable and the matching debug info and inspect it? And such tests should be able to run on any platform.

@Michael137
Member

Other C and C++ names don't have it in DWARF though (even on Windows), as your Godbolt link demonstrates.

I didn't notice this at first, you're right. With PDB, you get these names:

  431284 | S_PUB32 [size = 28] `_CFuncCCall`
           flags = function, addr = 0001:25168

My guess is that's because it's created by the linker and not the compiler.

I'm still not 100% sure which level we'd want to strip these symbols at. For Mach-O, where we have the _ prefix too, LLDB strips the prefix when parsing the symbol table.

For the time being, we could strip only the __cdecl prefix in PDB to match DWARF and open an issue for the mangling of other calling conventions.

Sounds like a good compromise.

Btw, @al45tair pointed out to me that mangled names can appear in multiple places in PDB. One has prefixed mangled names and the other doesn't (I think the per-module one doesn't). So we have to make sure we only strip the ones that are actually prefixed.

@Nerixyz
Contributor Author

Nerixyz commented Oct 2, 2025

Btw, @al45tair pointed out to me that mangled names can appear in multiple places in PDB. One has prefixed mangled names and the other doesnt (i think the one per module doesnt). So you have to make sure we only strip the ones that are actually prefixed

Where else do they appear? My understanding is that mangled function (and variable) names are only present in the publics stream (also mentioned in LLVM's CodeView/PDB docs). The globals stream doesn't contain the mangled names in S_(L)PROCREF and the module symbols which these globals are referencing (S_GPROC32/S_LPROC32) don't contain them either (at least for C++). The names of these references/functions are already demangled.

My approach would be to only transform the symbols from the publics stream that we read when searching for the mangled name in my original PR.

Nerixyz added a commit that referenced this pull request Oct 7, 2025
Relands #149701 which was reverted in
185ae5c
because it broke demangling of Itanium symbols on i386.

The last commit in this PR adds the fix for this (discussed in #160930).
On x86 environments, the prefix of `__cdecl` functions will now be
removed to match DWARF. I opened #161676 to discuss this for the other
calling conventions.
@Nerixyz Nerixyz closed this Oct 8, 2025