Skip to content

Conversation

@dpaoliello
Copy link
Contributor

Arm64EC uses a special name mangling mode that adds $$h between the symbol name and its type. In MSVC's name mangling @ is used to separate the name and type BUT it is also used for other purposes, such as the separator between paths in a fully qualified name.

The original algorithm was quite fragile and made assumptions that didn't hold true for all MSVC mangled symbols, so instead of trying to improve this algorithm we are now using the demangler to indicate where the insertion point should be (i.e., to parse the fully-qualified name and return the current string offset).

Also fixed isArm64ECMangledFunctionName to search for @$$h since the $$h must always be after a @.

Fixes #115231

@dpaoliello dpaoliello added this to the LLVM 19.X Release milestone Nov 14, 2024
@dpaoliello dpaoliello changed the title [llvm][aarch64] Fix Arm64EC name mangling algorithm (#115567) release/19.x: [llvm][aarch64] Fix Arm64EC name mangling algorithm (#115567) Nov 14, 2024
@llvmbot
Copy link
Member

llvmbot commented Nov 14, 2024

@llvm/pr-subscribers-llvm-ir

Author: Daniel Paoliello (dpaoliello)

Changes

Arm64EC uses a special name mangling mode that adds $$h between the symbol name and its type. In MSVC's name mangling @ is used to separate the name and type BUT it is also used for other purposes, such as the separator between paths in a fully qualified name.

The original algorithm was quite fragile and made assumptions that didn't hold true for all MSVC mangled symbols, so instead of trying to improve this algorithm we are now using the demangler to indicate where the insertion point should be (i.e., to parse the fully-qualified name and return the current string offset).

Also fixed isArm64ECMangledFunctionName to search for @$$h since the $$h must always be after a @.

Fixes #115231


Full diff: https://github.com/llvm/llvm-project/pull/116273.diff

6 Files Affected:

  • (modified) llvm/include/llvm/Demangle/Demangle.h (+4)
  • (modified) llvm/include/llvm/Demangle/MicrosoftDemangle.h (+4)
  • (modified) llvm/include/llvm/IR/Mangler.h (+6)
  • (modified) llvm/lib/Demangle/MicrosoftDemangle.cpp (+19)
  • (modified) llvm/lib/IR/Mangler.cpp (+16-20)
  • (modified) llvm/unittests/IR/ManglerTest.cpp (+77)
diff --git a/llvm/include/llvm/Demangle/Demangle.h b/llvm/include/llvm/Demangle/Demangle.h
index fe129603c0785d..132e5088b55148 100644
--- a/llvm/include/llvm/Demangle/Demangle.h
+++ b/llvm/include/llvm/Demangle/Demangle.h
@@ -10,6 +10,7 @@
 #define LLVM_DEMANGLE_DEMANGLE_H
 
 #include <cstddef>
+#include <optional>
 #include <string>
 #include <string_view>
 
@@ -54,6 +55,9 @@ enum MSDemangleFlags {
 char *microsoftDemangle(std::string_view mangled_name, size_t *n_read,
                         int *status, MSDemangleFlags Flags = MSDF_None);
 
+std::optional<size_t>
+getArm64ECInsertionPointInMangledName(std::string_view MangledName);
+
 // Demangles a Rust v0 mangled symbol.
 char *rustDemangle(std::string_view MangledName);
 
diff --git a/llvm/include/llvm/Demangle/MicrosoftDemangle.h b/llvm/include/llvm/Demangle/MicrosoftDemangle.h
index 6891185a28e57f..276efa7603690e 100644
--- a/llvm/include/llvm/Demangle/MicrosoftDemangle.h
+++ b/llvm/include/llvm/Demangle/MicrosoftDemangle.h
@@ -9,6 +9,7 @@
 #ifndef LLVM_DEMANGLE_MICROSOFTDEMANGLE_H
 #define LLVM_DEMANGLE_MICROSOFTDEMANGLE_H
 
+#include "llvm/Demangle/Demangle.h"
 #include "llvm/Demangle/MicrosoftDemangleNodes.h"
 
 #include <cassert>
@@ -141,6 +142,9 @@ enum class FunctionIdentifierCodeGroup { Basic, Under, DoubleUnder };
 // It has a set of functions to parse mangled symbols into Type instances.
 // It also has a set of functions to convert Type instances to strings.
 class Demangler {
+  friend std::optional<size_t>
+  llvm::getArm64ECInsertionPointInMangledName(std::string_view MangledName);
+
 public:
   Demangler() = default;
   virtual ~Demangler() = default;
diff --git a/llvm/include/llvm/IR/Mangler.h b/llvm/include/llvm/IR/Mangler.h
index f28ffc961b6db0..33af40c5ae98dd 100644
--- a/llvm/include/llvm/IR/Mangler.h
+++ b/llvm/include/llvm/IR/Mangler.h
@@ -56,6 +56,12 @@ void emitLinkerFlagsForUsedCOFF(raw_ostream &OS, const GlobalValue *GV,
 std::optional<std::string> getArm64ECMangledFunctionName(StringRef Name);
 std::optional<std::string> getArm64ECDemangledFunctionName(StringRef Name);
 
+/// Check if an ARM64EC function name is mangled.
+bool inline isArm64ECMangledFunctionName(StringRef Name) {
+  return Name[0] == '#' ||
+         (Name[0] == '?' && Name.find("@$$h") != StringRef::npos);
+}
+
 } // End llvm namespace
 
 #endif
diff --git a/llvm/lib/Demangle/MicrosoftDemangle.cpp b/llvm/lib/Demangle/MicrosoftDemangle.cpp
index c5835e8c2e989d..d35902a3337678 100644
--- a/llvm/lib/Demangle/MicrosoftDemangle.cpp
+++ b/llvm/lib/Demangle/MicrosoftDemangle.cpp
@@ -24,6 +24,7 @@
 #include <array>
 #include <cctype>
 #include <cstdio>
+#include <optional>
 #include <string_view>
 #include <tuple>
 
@@ -2424,6 +2425,24 @@ void Demangler::dumpBackReferences() {
     std::printf("\n");
 }
 
+std::optional<size_t>
+llvm::getArm64ECInsertionPointInMangledName(std::string_view MangledName) {
+  std::string_view ProcessedName{MangledName};
+
+  // We only support this for MSVC-style C++ symbols.
+  if (!consumeFront(ProcessedName, '?'))
+    return std::nullopt;
+
+  // The insertion point is just after the name of the symbol, so parse that to
+  // remove it from the processed name.
+  Demangler D;
+  D.demangleFullyQualifiedSymbolName(ProcessedName);
+  if (D.Error)
+    return std::nullopt;
+
+  return MangledName.length() - ProcessedName.length();
+}
+
 char *llvm::microsoftDemangle(std::string_view MangledName, size_t *NMangled,
                               int *Status, MSDemangleFlags Flags) {
   Demangler D;
diff --git a/llvm/lib/IR/Mangler.cpp b/llvm/lib/IR/Mangler.cpp
index e6c3ea9d568831..884739b3212c61 100644
--- a/llvm/lib/IR/Mangler.cpp
+++ b/llvm/lib/IR/Mangler.cpp
@@ -14,6 +14,7 @@
 #include "llvm/ADT/SmallString.h"
 #include "llvm/ADT/StringExtras.h"
 #include "llvm/ADT/Twine.h"
+#include "llvm/Demangle/Demangle.h"
 #include "llvm/IR/DataLayout.h"
 #include "llvm/IR/DerivedTypes.h"
 #include "llvm/IR/Function.h"
@@ -291,30 +292,25 @@ void llvm::emitLinkerFlagsForUsedCOFF(raw_ostream &OS, const GlobalValue *GV,
 }
 
 std::optional<std::string> llvm::getArm64ECMangledFunctionName(StringRef Name) {
-  bool IsCppFn = Name[0] == '?';
-  if (IsCppFn && Name.contains("$$h"))
-    return std::nullopt;
-  if (!IsCppFn && Name[0] == '#')
+  if (Name[0] != '?') {
+    // For non-C++ symbols, prefix the name with "#" unless it's already
+    // mangled.
+    if (Name[0] == '#')
+      return std::nullopt;
+    return std::optional<std::string>(("#" + Name).str());
+  }
+
+  // If the name contains $$h, then it is already mangled.
+  if (Name.contains("$$h"))
     return std::nullopt;
 
-  StringRef Prefix = "$$h";
-  size_t InsertIdx = 0;
-  if (IsCppFn) {
-    InsertIdx = Name.find("@@");
-    size_t ThreeAtSignsIdx = Name.find("@@@");
-    if (InsertIdx != std::string::npos && InsertIdx != ThreeAtSignsIdx) {
-      InsertIdx += 2;
-    } else {
-      InsertIdx = Name.find("@");
-      if (InsertIdx != std::string::npos)
-        InsertIdx++;
-    }
-  } else {
-    Prefix = "#";
-  }
+  // Ask the demangler where we should insert "$$h".
+  auto InsertIdx = getArm64ECInsertionPointInMangledName(Name);
+  if (!InsertIdx)
+    return std::nullopt;
 
   return std::optional<std::string>(
-      (Name.substr(0, InsertIdx) + Prefix + Name.substr(InsertIdx)).str());
+      (Name.substr(0, *InsertIdx) + "$$h" + Name.substr(*InsertIdx)).str());
 }
 
 std::optional<std::string>
diff --git a/llvm/unittests/IR/ManglerTest.cpp b/llvm/unittests/IR/ManglerTest.cpp
index f2b78a1f987694..f8a3152564fd9a 100644
--- a/llvm/unittests/IR/ManglerTest.cpp
+++ b/llvm/unittests/IR/ManglerTest.cpp
@@ -174,4 +174,81 @@ TEST(ManglerTest, GOFF) {
             "L#foo");
 }
 
+TEST(ManglerTest, Arm64EC) {
+  constexpr std::string_view Arm64ECNames[] = {
+      // Basic C name.
+      "#Foo",
+
+      // Basic C++ name.
+      "?foo@@$$hYAHXZ",
+
+      // Regression test: https://github.com/llvm/llvm-project/issues/115231
+      "?GetValue@?$Wrapper@UA@@@@$$hQEBAHXZ",
+
+      // Symbols from:
+      // ```
+      // namespace A::B::C::D {
+      // struct Base {
+      //   virtual int f() { return 0; }
+      // };
+      // }
+      // struct Derived : public A::B::C::D::Base {
+      //   virtual int f() override { return 1; }
+      // };
+      // A::B::C::D::Base* MakeObj() { return new Derived(); }
+      // ```
+      // void * __cdecl operator new(unsigned __int64)
+      "??2@$$hYAPEAX_K@Z",
+      // public: virtual int __cdecl A::B::C::D::Base::f(void)
+      "?f@Base@D@C@B@A@@$$hUEAAHXZ",
+      // public: __cdecl A::B::C::D::Base::Base(void)
+      "??0Base@D@C@B@A@@$$hQEAA@XZ",
+      // public: virtual int __cdecl Derived::f(void)
+      "?f@Derived@@$$hUEAAHXZ",
+      // public: __cdecl Derived::Derived(void)
+      "??0Derived@@$$hQEAA@XZ",
+      // struct A::B::C::D::Base * __cdecl MakeObj(void)
+      "?MakeObj@@$$hYAPEAUBase@D@C@B@A@@XZ",
+
+      // Symbols from:
+      // ```
+      // template <typename T> struct WW { struct Z{}; };
+      // template <typename X> struct Wrapper {
+      //   int GetValue(typename WW<X>::Z) const;
+      // };
+      // struct A { };
+      // template <typename X> int Wrapper<X>::GetValue(typename WW<X>::Z) const
+      // { return 3; }
+      // template class Wrapper<A>;
+      // ```
+      // public: int __cdecl Wrapper<struct A>::GetValue(struct WW<struct
+      // A>::Z)const
+      "?GetValue@?$Wrapper@UA@@@@$$hQEBAHUZ@?$WW@UA@@@@@Z",
+  };
+
+  for (const auto &Arm64ECName : Arm64ECNames) {
+    // Check that this is a mangled name.
+    EXPECT_TRUE(isArm64ECMangledFunctionName(Arm64ECName))
+        << "Test case: " << Arm64ECName;
+    // Refuse to mangle it again.
+    EXPECT_FALSE(getArm64ECMangledFunctionName(Arm64ECName).has_value())
+        << "Test case: " << Arm64ECName;
+
+    // Demangle.
+    auto Arm64Name = getArm64ECDemangledFunctionName(Arm64ECName);
+    EXPECT_TRUE(Arm64Name.has_value()) << "Test case: " << Arm64ECName;
+    // Check that it is not mangled.
+    EXPECT_FALSE(isArm64ECMangledFunctionName(Arm64Name.value()))
+        << "Test case: " << Arm64ECName;
+    // Refuse to demangle it again.
+    EXPECT_FALSE(getArm64ECDemangledFunctionName(Arm64Name.value()).has_value())
+        << "Test case: " << Arm64ECName;
+
+    // Round-trip.
+    auto RoundTripArm64ECName =
+        getArm64ECMangledFunctionName(Arm64Name.value());
+    EXPECT_EQ(RoundTripArm64ECName, Arm64ECName);
+  }
+}
+
 } // end anonymous namespace

Copy link
Collaborator

@efriedma-quic efriedma-quic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

I think this is safe for the branch:

Functionally, this should only affect Arm64EC targets, and this is important for those targets.

The header modifications only add functions; nothing is removed or modified, so there shouldn't be ABI compatibility issues.

@tru tru merged commit 81005af into llvm:release/19.x Nov 15, 2024
1 check was pending
Arm64EC uses a special name mangling mode that adds `$$h` between the
symbol name and its type. In MSVC's name mangling `@` is used to
separate the name and type BUT it is also used for other purposes, such
as the separator between paths in a fully qualified name.

The original algorithm was quite fragile and made assumptions that
didn't hold true for all MSVC mangled symbols, so instead of trying to
improve this algorithm we are now using the demangler to indicate where
the insertion point should be (i.e., to parse the fully-qualified name
and return the current string offset).

Also fixed `isArm64ECMangledFunctionName` to search for `@$$h` since the
`$$h` must always be after a `@`.

Fixes llvm#115231
@github-actions
Copy link

@dpaoliello (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR.

@dpaoliello dpaoliello deleted the ecname19 branch November 15, 2024 21:13
@dpaoliello
Copy link
Contributor Author

@dpaoliello (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR.

Fixed an issue where the wrong mangled name was being generated for some C++ functions when targeting Arm64EC

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Development

Successfully merging this pull request may close these issues.

4 participants