Skip to content

[lldb][Expression] Add structor variant to LLDB's function call labels #149827

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

Michael137
Copy link
Member

@Michael137 Michael137 commented Jul 21, 2025

Depends on #148877

This patch is an implementation of this discussion about handling ABI-tagged structors during expression evaluation.

Motivation

LLDB encodes the mangled name of a DW_TAG_subprogram into AsmLabels on function and method Clang AST nodes. This means that when calls to these functions get lowered into IR (when running JITted expressions), the address resolver can locate the appropriate symbol by mangled name (and it is guaranteed to find the symbol because we got the mangled name from debug-info, instead of letting Clang mangle it based on AST structure). However, we don't do this for CXXConstructorDecls/CXXDestructorDecls because these structor declarations in DWARF don't have a linkage name. This is because there can be multiple variants of a structor, each with a distinct mangling in the Itanium ABI. Each structor variant has its own definition DW_TAG_subprogram. So LLDB doesn't know which mangled name to put into the AsmLabel.

Currently this means using ABI-tagged structors in LLDB expressions won't work (see this RFC for concrete examples).

Proposed Solution

The FunctionCallLabel encoding that we put into AsmLabels already supports stuffing more info about a DIE into it. So this patch extends the FunctionCallLabel to contain an optional discriminator (a sequence of bytes) which the SymbolFileDWARF plugin interprets as the constructor/destructor variant of that DIE. So when searching for the definition DIE, LLDB will include the structor variant in its heuristic for determining a match.

There's a few subtleties here:

  1. At the point at which LLDB first constructs the label, it has no way of knowing (just by looking at the debug-info declaration), which structor variant the expression evaluator is supposed to call. That's something that gets decided when compiling the expression. So we let the Clang mangler inject the correct structor variant into the AsmLabel during JITing. I adjusted the AsmLabelAttr mangling for this. An option would've been to create a new Clang attribute which behaved like an AsmLabel but with these special semantics for LLDB. My main concern there is that we'd have to adjust all the AsmLabelAttr checks around Clang to also now account for this new attribute.
  2. The compiler is free to omit the C1 variant of a constructor if the C2 variant is sufficient. In that case it may alias C1 to C2, leaving us with only the C2 DW_TAG_subprogram in the object file. Linux is one of the platforms where this occurs. For those cases I added a heuristic in SymbolFileDWARF where we pick C2 if we asked for C1 but it doesn't exist. This may not always be correct (e.g., if the compiler decided to drop C1 for other reasons).

@Michael137 Michael137 requested a review from labath July 21, 2025 14:45
@Michael137 Michael137 requested a review from adrian-prantl July 21, 2025 14:46
@Michael137 Michael137 force-pushed the lldb/reliable-function-call-labels-structors branch 2 times, most recently from aeebf1f to ce7fa42 Compare July 22, 2025 15:08
@Michael137 Michael137 force-pushed the lldb/reliable-function-call-labels-structors branch from ce7fa42 to 2a70642 Compare July 31, 2025 08:52
Copy link

github-actions bot commented Jul 31, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@Michael137 Michael137 force-pushed the lldb/reliable-function-call-labels-structors branch from 2a70642 to 44410a4 Compare July 31, 2025 13:19
Michael137 added a commit that referenced this pull request Aug 1, 2025
…#148877)

LLDB currently attaches `AsmLabel`s to `FunctionDecl`s such that that
the `IRExecutionUnit` can determine which mangled name to call (we can't
rely on Clang deriving the correct mangled name to call because the
debug-info AST doesn't contain all the info that would be encoded in the
DWARF linkage names). However, we don't attach `AsmLabel`s for structors
because they have multiple variants and thus it's not clear which
mangled name to use. In the [RFC on fixing expression evaluation of
abi-tagged
structors](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816)
we discussed encoding the structor variant into the `AsmLabel`s.
Specifically in [this
thread](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7)
we discussed that the contents of the `AsmLabel` are completely under
LLDB's control and we could make use of it to uniquely identify a
function by encoding the exact module and DIE that the function is
associated with (mangled names need not be enough since two identical
mangled symbols may live in different modules). So if we already have a
custom `AsmLabel` format, we can encode the structor variant in a
follow-up (the current idea is to append the structor variant as a
suffix to our custom `AsmLabel` when Clang emits the mangled name into
the JITted IR). Then we would just have to teach the `IRExecutionUnit`
to pick the correct structor variant DIE during symbol resolution. The
draft of this is available
[here](#149827)

This patch sets up the infrastructure for the custom `AsmLabel` format
by encoding the module id, DIE id and mangled name in it.

**Implementation**

The flow is as follows:
1. Create the label in `DWARFASTParserClang`. The format is:
`$__lldb_func:module_id:die_id:mangled_name`
2. When resolving external symbols in `IRExecutionUnit`, we parse this
label and then do a lookup by DIE ID (or mangled name into the module if
the encoded DIE is a declaration).

Depends on #151355
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 1, 2025
…n AsmLabels (#148877)

LLDB currently attaches `AsmLabel`s to `FunctionDecl`s such that that
the `IRExecutionUnit` can determine which mangled name to call (we can't
rely on Clang deriving the correct mangled name to call because the
debug-info AST doesn't contain all the info that would be encoded in the
DWARF linkage names). However, we don't attach `AsmLabel`s for structors
because they have multiple variants and thus it's not clear which
mangled name to use. In the [RFC on fixing expression evaluation of
abi-tagged
structors](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816)
we discussed encoding the structor variant into the `AsmLabel`s.
Specifically in [this
thread](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7)
we discussed that the contents of the `AsmLabel` are completely under
LLDB's control and we could make use of it to uniquely identify a
function by encoding the exact module and DIE that the function is
associated with (mangled names need not be enough since two identical
mangled symbols may live in different modules). So if we already have a
custom `AsmLabel` format, we can encode the structor variant in a
follow-up (the current idea is to append the structor variant as a
suffix to our custom `AsmLabel` when Clang emits the mangled name into
the JITted IR). Then we would just have to teach the `IRExecutionUnit`
to pick the correct structor variant DIE during symbol resolution. The
draft of this is available
[here](llvm/llvm-project#149827)

This patch sets up the infrastructure for the custom `AsmLabel` format
by encoding the module id, DIE id and mangled name in it.

**Implementation**

The flow is as follows:
1. Create the label in `DWARFASTParserClang`. The format is:
`$__lldb_func:module_id:die_id:mangled_name`
2. When resolving external symbols in `IRExecutionUnit`, we parse this
label and then do a lookup by DIE ID (or mangled name into the module if
the encoded DIE is a declaration).

Depends on llvm/llvm-project#151355
@Michael137 Michael137 force-pushed the lldb/reliable-function-call-labels-structors branch 3 times, most recently from 499aa52 to 9ec9eef Compare August 1, 2025 21:47
@Michael137 Michael137 force-pushed the lldb/reliable-function-call-labels-structors branch 5 times, most recently from e10a1eb to 82075a2 Compare August 4, 2025 11:30
@Michael137 Michael137 changed the title [DRAFT] [lldb][Expression] Add structor variant to LLDB's function call labels [lldb][Expression] Add structor variant to LLDB's function call labels Aug 4, 2025
@Michael137 Michael137 marked this pull request as ready for review August 4, 2025 11:33
@Michael137 Michael137 requested review from JDevlieghere and a team as code owners August 4, 2025 11:33
@llvmbot llvmbot added clang Clang issues not falling into any other category libc++abi libc++abi C++ Runtime Library. Not libc++. lldb clang:frontend Language frontend issues, e.g. anything involving "Sema" labels Aug 4, 2025
@llvmbot
Copy link
Member

llvmbot commented Aug 4, 2025

@llvm/pr-subscribers-debuginfo
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-lldb

Author: Michael Buch (Michael137)

Changes

Depends on #148877


Patch is 30.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/149827.diff

16 Files Affected:

  • (modified) clang/include/clang/Basic/Attr.td (+4-13)
  • (modified) clang/lib/AST/Mangle.cpp (+21-3)
  • (modified) clang/lib/Sema/SemaDecl.cpp (+5-8)
  • (modified) clang/unittests/AST/DeclTest.cpp (+1-8)
  • (modified) libcxxabi/src/demangle/ItaniumDemangle.h (+2)
  • (modified) lldb/include/lldb/Expression/Expression.h (+6-2)
  • (modified) lldb/source/Expression/Expression.cpp (+17-12)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp (+45-2)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp (+131-18)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h (+4)
  • (modified) lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp (+3-4)
  • (modified) lldb/unittests/Expression/ExpressionTest.cpp (+22-15)
  • (modified) lldb/unittests/Symbol/TestTypeSystemClang.cpp (+7-7)
  • (modified) llvm/include/llvm/Demangle/Demangle.h (+3)
  • (modified) llvm/include/llvm/Demangle/ItaniumDemangle.h (+2)
  • (modified) llvm/lib/Demangle/ItaniumDemangle.cpp (+7-3)
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index b3ff45b3e90a3..a0efb21218312 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1059,22 +1059,13 @@ def AVRSignal : InheritableAttr, TargetSpecificAttr<TargetAVR> {
 def AsmLabel : InheritableAttr {
   let Spellings = [CustomKeyword<"asm">, CustomKeyword<"__asm__">];
   let Args = [
-    // Label specifies the mangled name for the decl.
-    StringArgument<"Label">,
-
-    // IsLiteralLabel specifies whether the label is literal (i.e. suppresses
-    // the global C symbol prefix) or not. If not, the mangle-suppression prefix
-    // ('\01') is omitted from the decl name at the LLVM IR level.
-    //
-    // Non-literal labels are used by some external AST sources like LLDB.
-    BoolArgument<"IsLiteralLabel", /*optional=*/0, /*fake=*/1>
-  ];
+      // Label specifies the mangled name for the decl.
+      StringArgument<"Label">, ];
   let SemaHandler = 0;
   let Documentation = [AsmLabelDocs];
-  let AdditionalMembers =
-[{
+  let AdditionalMembers = [{
 bool isEquivalent(AsmLabelAttr *Other) const {
-  return getLabel() == Other->getLabel() && getIsLiteralLabel() == Other->getIsLiteralLabel();
+  return getLabel() == Other->getLabel();
 }
 }];
 }
diff --git a/clang/lib/AST/Mangle.cpp b/clang/lib/AST/Mangle.cpp
index 9652fdbc4e125..58029476572c5 100644
--- a/clang/lib/AST/Mangle.cpp
+++ b/clang/lib/AST/Mangle.cpp
@@ -152,6 +152,20 @@ bool MangleContext::shouldMangleDeclName(const NamedDecl *D) {
   return shouldMangleCXXName(D);
 }
 
+static llvm::StringRef g_lldb_func_call_label_prefix = "$__lldb_func";
+
+static void emitLLDBAsmLabel(llvm::StringRef label, GlobalDecl GD,
+                             llvm::raw_ostream &Out) {
+  Out << g_lldb_func_call_label_prefix << ":";
+
+  if (llvm::isa<clang::CXXConstructorDecl>(GD.getDecl()))
+    Out << "C" << GD.getCtorType();
+  else if (llvm::isa<clang::CXXDestructorDecl>(GD.getDecl()))
+    Out << "D" << GD.getDtorType();
+
+  Out << label.substr(g_lldb_func_call_label_prefix.size() + 1);
+}
+
 void MangleContext::mangleName(GlobalDecl GD, raw_ostream &Out) {
   const ASTContext &ASTContext = getASTContext();
   const NamedDecl *D = cast<NamedDecl>(GD.getDecl());
@@ -161,9 +175,9 @@ void MangleContext::mangleName(GlobalDecl GD, raw_ostream &Out) {
   if (const AsmLabelAttr *ALA = D->getAttr<AsmLabelAttr>()) {
     // If we have an asm name, then we use it as the mangling.
 
-    // If the label isn't literal, or if this is an alias for an LLVM intrinsic,
+    // If the label is an alias for an LLVM intrinsic,
     // do not add a "\01" prefix.
-    if (!ALA->getIsLiteralLabel() || ALA->getLabel().starts_with("llvm.")) {
+    if (ALA->getLabel().starts_with("llvm.")) {
       Out << ALA->getLabel();
       return;
     }
@@ -185,7 +199,11 @@ void MangleContext::mangleName(GlobalDecl GD, raw_ostream &Out) {
     if (!UserLabelPrefix.empty())
       Out << '\01'; // LLVM IR Marker for __asm("foo")
 
-    Out << ALA->getLabel();
+    if (ALA->getLabel().starts_with(g_lldb_func_call_label_prefix))
+      emitLLDBAsmLabel(ALA->getLabel(), GD, Out);
+    else
+      Out << ALA->getLabel();
+
     return;
   }
 
diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index e2ac648320c0f..885d04b9ab926 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -8113,9 +8113,7 @@ NamedDecl *Sema::ActOnVariableDeclarator(
       }
     }
 
-    NewVD->addAttr(AsmLabelAttr::Create(Context, Label,
-                                        /*IsLiteralLabel=*/true,
-                                        SE->getStrTokenLoc(0)));
+    NewVD->addAttr(AsmLabelAttr::Create(Context, Label, SE->getStrTokenLoc(0)));
   } else if (!ExtnameUndeclaredIdentifiers.empty()) {
     llvm::DenseMap<IdentifierInfo*,AsmLabelAttr*>::iterator I =
       ExtnameUndeclaredIdentifiers.find(NewVD->getIdentifier());
@@ -10345,9 +10343,8 @@ Sema::ActOnFunctionDeclarator(Scope *S, Declarator &D, DeclContext *DC,
   if (Expr *E = D.getAsmLabel()) {
     // The parser guarantees this is a string.
     StringLiteral *SE = cast<StringLiteral>(E);
-    NewFD->addAttr(AsmLabelAttr::Create(Context, SE->getString(),
-                                        /*IsLiteralLabel=*/true,
-                                        SE->getStrTokenLoc(0)));
+    NewFD->addAttr(
+        AsmLabelAttr::Create(Context, SE->getString(), SE->getStrTokenLoc(0)));
   } else if (!ExtnameUndeclaredIdentifiers.empty()) {
     llvm::DenseMap<IdentifierInfo*,AsmLabelAttr*>::iterator I =
       ExtnameUndeclaredIdentifiers.find(NewFD->getIdentifier());
@@ -20598,8 +20595,8 @@ void Sema::ActOnPragmaRedefineExtname(IdentifierInfo* Name,
                                          LookupOrdinaryName);
   AttributeCommonInfo Info(AliasName, SourceRange(AliasNameLoc),
                            AttributeCommonInfo::Form::Pragma());
-  AsmLabelAttr *Attr = AsmLabelAttr::CreateImplicit(
-      Context, AliasName->getName(), /*IsLiteralLabel=*/true, Info);
+  AsmLabelAttr *Attr =
+      AsmLabelAttr::CreateImplicit(Context, AliasName->getName(), Info);
 
   // If a declaration that:
   // 1) declares a function or a variable
diff --git a/clang/unittests/AST/DeclTest.cpp b/clang/unittests/AST/DeclTest.cpp
index ed635da683aab..afaf413493299 100644
--- a/clang/unittests/AST/DeclTest.cpp
+++ b/clang/unittests/AST/DeclTest.cpp
@@ -74,7 +74,6 @@ TEST(Decl, AsmLabelAttr) {
   StringRef Code = R"(
     struct S {
       void f() {}
-      void g() {}
     };
   )";
   auto AST =
@@ -87,11 +86,8 @@ TEST(Decl, AsmLabelAttr) {
   const auto *DeclS =
       selectFirst<CXXRecordDecl>("d", match(cxxRecordDecl().bind("d"), Ctx));
   NamedDecl *DeclF = *DeclS->method_begin();
-  NamedDecl *DeclG = *(++DeclS->method_begin());
 
-  // Attach asm labels to the decls: one literal, and one not.
-  DeclF->addAttr(AsmLabelAttr::Create(Ctx, "foo", /*LiteralLabel=*/true));
-  DeclG->addAttr(AsmLabelAttr::Create(Ctx, "goo", /*LiteralLabel=*/false));
+  DeclF->addAttr(AsmLabelAttr::Create(Ctx, "foo"));
 
   // Mangle the decl names.
   std::string MangleF, MangleG;
@@ -99,14 +95,11 @@ TEST(Decl, AsmLabelAttr) {
       ItaniumMangleContext::create(Ctx, Diags));
   {
     llvm::raw_string_ostream OS_F(MangleF);
-    llvm::raw_string_ostream OS_G(MangleG);
     MC->mangleName(DeclF, OS_F);
-    MC->mangleName(DeclG, OS_G);
   }
 
   ASSERT_EQ(MangleF, "\x01"
                      "foo");
-  ASSERT_EQ(MangleG, "goo");
 }
 
 TEST(Decl, MangleDependentSizedArray) {
diff --git a/libcxxabi/src/demangle/ItaniumDemangle.h b/libcxxabi/src/demangle/ItaniumDemangle.h
index 6f27da7b9cadf..7b3983bc89367 100644
--- a/libcxxabi/src/demangle/ItaniumDemangle.h
+++ b/libcxxabi/src/demangle/ItaniumDemangle.h
@@ -1766,6 +1766,8 @@ class CtorDtorName final : public Node {
 
   template<typename Fn> void match(Fn F) const { F(Basename, IsDtor, Variant); }
 
+  int getVariant() const { return Variant; }
+
   void printLeft(OutputBuffer &OB) const override {
     if (IsDtor)
       OB += "~";
diff --git a/lldb/include/lldb/Expression/Expression.h b/lldb/include/lldb/Expression/Expression.h
index 20067f469895b..847226167d584 100644
--- a/lldb/include/lldb/Expression/Expression.h
+++ b/lldb/include/lldb/Expression/Expression.h
@@ -103,11 +103,15 @@ class Expression {
 ///
 /// The format being:
 ///
-///   <prefix>:<module uid>:<symbol uid>:<name>
+///   <prefix>:<discriminator>:<module uid>:<symbol uid>:<name>
 ///
 /// The label string needs to stay valid for the entire lifetime
 /// of this object.
 struct FunctionCallLabel {
+  /// Arbitrary string which language plugins can interpret for their
+  /// own needs.
+  llvm::StringRef discriminator;
+
   /// Unique identifier of the lldb_private::Module
   /// which contains the symbol identified by \c symbol_id.
   lldb::user_id_t module_id;
@@ -133,7 +137,7 @@ struct FunctionCallLabel {
   ///
   /// The representation roundtrips through \c fromString:
   /// \code{.cpp}
-  /// llvm::StringRef encoded = "$__lldb_func:0x0:0x0:_Z3foov";
+  /// llvm::StringRef encoded = "$__lldb_func:blah:0x0:0x0:_Z3foov";
   /// FunctionCallLabel label = *fromString(label);
   ///
   /// assert (label.toString() == encoded);
diff --git a/lldb/source/Expression/Expression.cpp b/lldb/source/Expression/Expression.cpp
index 796851ff15ca3..16ecb1d7deef8 100644
--- a/lldb/source/Expression/Expression.cpp
+++ b/lldb/source/Expression/Expression.cpp
@@ -34,10 +34,10 @@ Expression::Expression(ExecutionContextScope &exe_scope)
 
 llvm::Expected<FunctionCallLabel>
 lldb_private::FunctionCallLabel::fromString(llvm::StringRef label) {
-  llvm::SmallVector<llvm::StringRef, 4> components;
-  label.split(components, ":", /*MaxSplit=*/3);
+  llvm::SmallVector<llvm::StringRef, 5> components;
+  label.split(components, ":", /*MaxSplit=*/4);
 
-  if (components.size() != 4)
+  if (components.size() != 5)
     return llvm::createStringError("malformed function call label.");
 
   if (components[0] != FunctionCallLabelPrefix)
@@ -45,8 +45,10 @@ lldb_private::FunctionCallLabel::fromString(llvm::StringRef label) {
         "expected function call label prefix '{0}' but found '{1}' instead.",
         FunctionCallLabelPrefix, components[0]));
 
-  llvm::StringRef module_label = components[1];
-  llvm::StringRef die_label = components[2];
+  llvm::StringRef discriminator = components[1];
+  llvm::StringRef module_label = components[2];
+  llvm::StringRef die_label = components[3];
+  llvm::StringRef lookup_name = components[4];
 
   lldb::user_id_t module_id = 0;
   if (!llvm::to_integer(module_label, module_id))
@@ -58,20 +60,23 @@ lldb_private::FunctionCallLabel::fromString(llvm::StringRef label) {
     return llvm::createStringError(
         llvm::formatv("failed to parse symbol ID from '{0}'.", die_label));
 
-  return FunctionCallLabel{/*.module_id=*/module_id,
+  return FunctionCallLabel{/*.discriminator=*/discriminator,
+                           /*.module_id=*/module_id,
                            /*.symbol_id=*/die_id,
-                           /*.lookup_name=*/components[3]};
+                           /*.lookup_name=*/lookup_name};
 }
 
 std::string lldb_private::FunctionCallLabel::toString() const {
-  return llvm::formatv("{0}:{1:x}:{2:x}:{3}", FunctionCallLabelPrefix,
-                       module_id, symbol_id, lookup_name)
+  return llvm::formatv("{0}:{1}:{2:x}:{3:x}:{4}", FunctionCallLabelPrefix,
+                       discriminator, module_id, symbol_id, lookup_name)
       .str();
 }
 
 void llvm::format_provider<FunctionCallLabel>::format(
     const FunctionCallLabel &label, raw_ostream &OS, StringRef Style) {
-  OS << llvm::formatv("FunctionCallLabel{ module_id: {0:x}, symbol_id: {1:x}, "
-                      "lookup_name: {2} }",
-                      label.module_id, label.symbol_id, label.lookup_name);
+  OS << llvm::formatv("FunctionCallLabel{ discriminator: {0}, module_id: "
+                      "{1:x}, symbol_id: {2:x}, "
+                      "lookup_name: {3} }",
+                      label.discriminator, label.module_id, label.symbol_id,
+                      label.lookup_name);
 }
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
index 781c1c6c5745d..5c674308160e5 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
@@ -250,11 +250,52 @@ static unsigned GetCXXMethodCVQuals(const DWARFDIE &subprogram,
   return cv_quals;
 }
 
+static const char *GetMangledOrStructorName(const DWARFDIE &die) {
+  const char *name = die.GetMangledName(/*substitute_name_allowed*/ false);
+  if (name)
+    return name;
+
+  name = die.GetName();
+  if (!name)
+    return nullptr;
+
+  DWARFDIE parent = die.GetParent();
+  if (!parent.IsStructUnionOrClass())
+    return nullptr;
+
+  const char *parent_name = parent.GetName();
+  if (!parent_name)
+    return nullptr;
+
+  // Constructor.
+  if (::strcmp(parent_name, name) == 0)
+    return name;
+
+  // Destructor.
+  if (name[0] == '~' && ::strcmp(parent_name, name + 1))
+    return name;
+
+  return nullptr;
+}
+
 static std::string MakeLLDBFuncAsmLabel(const DWARFDIE &die) {
-  char const *name = die.GetMangledName(/*substitute_name_allowed*/ false);
+  char const *name = GetMangledOrStructorName(die);
   if (!name)
     return {};
 
+  auto *cu = die.GetCU();
+  if (!cu)
+    return {};
+
+  // FIXME: When resolving function call labels, we check that
+  // that the definition's DW_AT_specification points to the
+  // declaration that we encoded into the label here. But if the
+  // declaration came from a type-unit (and the definition from
+  // .debug_info), that check won't work. So for now, don't use
+  // function call labels for declaration DIEs from type-units.
+  if (cu->IsTypeUnit())
+    return {};
+
   SymbolFileDWARF *dwarf = die.GetDWARF();
   if (!dwarf)
     return {};
@@ -285,7 +326,9 @@ static std::string MakeLLDBFuncAsmLabel(const DWARFDIE &die) {
   if (die_id == LLDB_INVALID_UID)
     return {};
 
-  return FunctionCallLabel{/*module_id=*/module_id,
+  // Note, discriminator is added by Clang during mangling.
+  return FunctionCallLabel{/*discriminator=*/{},
+                           /*module_id=*/module_id,
                            /*symbol_id=*/die_id,
                            /*.lookup_name=*/name}
       .toString();
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
index 42a66ce75d6d6..911ce5930ca4c 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
@@ -7,7 +7,9 @@
 //===----------------------------------------------------------------------===//
 
 #include "SymbolFileDWARF.h"
+#include "clang/Basic/ABI.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringExtras.h"
 #include "llvm/DebugInfo/DWARF/DWARFAddressRange.h"
 #include "llvm/DebugInfo/DWARF/DWARFDebugLoc.h"
 #include "llvm/Support/Casting.h"
@@ -78,6 +80,7 @@
 
 #include "llvm/DebugInfo/DWARF/DWARFContext.h"
 #include "llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h"
+#include "llvm/Demangle/Demangle.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/FormatVariadic.h"
 
@@ -2482,6 +2485,133 @@ bool SymbolFileDWARF::ResolveFunction(const DWARFDIE &orig_die,
   return false;
 }
 
+static int ClangToItaniumCtorKind(clang::CXXCtorType kind) {
+  switch (kind) {
+  case clang::CXXCtorType::Ctor_Complete:
+    return 1;
+  case clang::CXXCtorType::Ctor_Base:
+    return 2;
+  case clang::CXXCtorType::Ctor_CopyingClosure:
+  case clang::CXXCtorType::Ctor_DefaultClosure:
+  case clang::CXXCtorType::Ctor_Comdat:
+    llvm_unreachable("Unexpected constructor kind.");
+  }
+}
+
+static int ClangToItaniumDtorKind(clang::CXXDtorType kind) {
+  switch (kind) {
+  case clang::CXXDtorType::Dtor_Deleting:
+    return 0;
+  case clang::CXXDtorType::Dtor_Complete:
+    return 1;
+  case clang::CXXDtorType::Dtor_Base:
+    return 2;
+  case clang::CXXDtorType::Dtor_Comdat:
+    llvm_unreachable("Unexpected destructor kind.");
+  }
+}
+
+static std::optional<int>
+GetItaniumCtorDtorVariant(llvm::StringRef discriminator) {
+  const bool is_ctor = discriminator.consume_front("C");
+  if (!is_ctor && !discriminator.consume_front("D"))
+    return std::nullopt;
+
+  uint64_t structor_kind;
+  if (!llvm::to_integer(discriminator, structor_kind))
+    return std::nullopt;
+
+  if (is_ctor) {
+    if (structor_kind > clang::CXXCtorType::Ctor_DefaultClosure)
+      return std::nullopt;
+
+    return ClangToItaniumCtorKind(
+        static_cast<clang::CXXCtorType>(structor_kind));
+  }
+
+  if (structor_kind > clang::CXXDtorType::Dtor_Deleting)
+    return std::nullopt;
+
+  return ClangToItaniumDtorKind(static_cast<clang::CXXDtorType>(structor_kind));
+}
+
+DWARFDIE SymbolFileDWARF::FindFunctionDefinition(const FunctionCallLabel &label,
+                                                 const DWARFDIE &declaration) {
+  DWARFDIE definition;
+  llvm::DenseMap<int, DWARFDIE> structor_variant_to_die;
+
+  // eFunctionNameTypeFull for mangled name lookup.
+  // eFunctionNameTypeMethod is required for structor lookups (since we look
+  // those up by DW_AT_name).
+  Module::LookupInfo info(ConstString(label.lookup_name),
+                          lldb::eFunctionNameTypeFull |
+                              lldb::eFunctionNameTypeMethod,
+                          lldb::eLanguageTypeUnknown);
+
+  m_index->GetFunctions(info, *this, {}, [&](DWARFDIE entry) {
+    if (entry.GetAttributeValueAsUnsigned(llvm::dwarf::DW_AT_declaration, 0))
+      return IterationAction::Continue;
+
+    auto spec = entry.GetAttributeValueAsReferenceDIE(DW_AT_specification);
+    if (!spec)
+      return IterationAction::Continue;
+
+    if (spec != declaration)
+      return IterationAction::Continue;
+
+    // We're not picking a specific structor variant DIE, so we're done here.
+    if (label.discriminator.empty()) {
+      definition = entry;
+      return IterationAction::Stop;
+    }
+
+    const char *mangled =
+        entry.GetMangledName(/*substitute_name_allowed=*/false);
+    if (!mangled)
+      return IterationAction::Continue;
+
+    llvm::ItaniumPartialDemangler D;
+    if (D.partialDemangle(mangled))
+      return IterationAction::Continue;
+
+    auto structor_variant = D.getCtorOrDtorVariant();
+    if (!structor_variant)
+      return IterationAction::Continue;
+
+    auto [_, inserted] = structor_variant_to_die.try_emplace(*structor_variant,
+                                                             std::move(entry));
+    assert(inserted);
+
+    // The compiler may choose to alias the constructor variants
+    // (notably this happens on Linux), so we might not have a definition
+    // DIE for some structor variants. Hence we iterate over all variants
+    // and pick the most appropriate one out of those.
+    return IterationAction::Continue;
+  });
+
+  if (definition)
+    return definition;
+
+  auto label_variant = GetItaniumCtorDtorVariant(label.discriminator);
+  if (!label_variant)
+    return {};
+
+  auto it = structor_variant_to_die.find(*label_variant);
+
+  // Found the exact variant.
+  if (it != structor_variant_to_die.end())
+    return it->getSecond();
+
+  // C1 was aliased to C2
+  if (!label.lookup_name.starts_with("~") && label_variant == 1) {
+    if (auto it = structor_variant_to_die.find(2);
+        it != structor_variant_to_die.end())
+      return it->getSecond();
+  }
+
+  return {};
+}
+
 llvm::Expected<SymbolContext>
 SymbolFileDWARF::ResolveFunctionCallLabel(const FunctionCallLabel &label) {
   std::lock_guard<std::recursive_mutex> guard(GetModuleMutex());
@@ -2494,24 +2624,7 @@ SymbolFileDWARF::ResolveFunctionCallLabel(const FunctionCallLabel &label) {
   // Label was created using a declaration DIE. Need to fetch the definition
   // to resolve the function call.
   if (die.GetAttributeValueAsUnsigned(llvm::dwarf::DW_AT_declaration, 0)) {
-    Module::LookupInfo info(ConstString(label.lookup_name),
-                            lldb::eFunctionNameTypeFull,
-                            lldb::eLanguageTypeUnknown);
-
-    m_index->GetFunctions(info, *this, {}, [&](DWARFDIE entry) {
-      if (entry.GetAttributeValueAsUnsigned(llvm::dwarf::DW_AT_declaration, 0))
-        return IterationAction::Continue;
-
-      // We don't check whether the specification DIE for this function
-      // corresponds to the declaration DIE because the declaration might be in
-      // a type-unit but the definition in the compile-unit (and it's
-      // specifcation would point to the declaration in the compile-unit). We
-      // rely on the mangled name within the module to be enough to find us the
-      // unique definition.
-      die = entry;
-      return IterationAction::Stop;
-    });
-
+    die = FindFunctionDefinition(label, die);
     if (die.GetAttributeValueAsUnsigned(llvm::dwarf::DW_AT_declaration, 0))
       return llvm::createStringError(
           llvm::formatv("failed to find definition DIE for {0}", label));
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
index 3ec538da8cf77..843346619db37 100644
--- a/lldb/s...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Aug 4, 2025

@llvm/pr-subscribers-libcxxabi

Author: Michael Buch (Michael137)

Changes

Depends on #148877


Patch is 30.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/149827.diff

16 Files Affected:

  • (modified) clang/include/clang/Basic/Attr.td (+4-13)
  • (modified) clang/lib/AST/Mangle.cpp (+21-3)
  • (modified) clang/lib/Sema/SemaDecl.cpp (+5-8)
  • (modified) clang/unittests/AST/DeclTest.cpp (+1-8)
  • (modified) libcxxabi/src/demangle/ItaniumDemangle.h (+2)
  • (modified) lldb/include/lldb/Expression/Expression.h (+6-2)
  • (modified) lldb/source/Expression/Expression.cpp (+17-12)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp (+45-2)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp (+131-18)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h (+4)
  • (modified) lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp (+3-4)
  • (modified) lldb/unittests/Expression/ExpressionTest.cpp (+22-15)
  • (modified) lldb/unittests/Symbol/TestTypeSystemClang.cpp (+7-7)
  • (modified) llvm/include/llvm/Demangle/Demangle.h (+3)
  • (modified) llvm/include/llvm/Demangle/ItaniumDemangle.h (+2)
  • (modified) llvm/lib/Demangle/ItaniumDemangle.cpp (+7-3)
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index b3ff45b3e90a3..a0efb21218312 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1059,22 +1059,13 @@ def AVRSignal : InheritableAttr, TargetSpecificAttr<TargetAVR> {
 def AsmLabel : InheritableAttr {
   let Spellings = [CustomKeyword<"asm">, CustomKeyword<"__asm__">];
   let Args = [
-    // Label specifies the mangled name for the decl.
-    StringArgument<"Label">,
-
-    // IsLiteralLabel specifies whether the label is literal (i.e. suppresses
-    // the global C symbol prefix) or not. If not, the mangle-suppression prefix
-    // ('\01') is omitted from the decl name at the LLVM IR level.
-    //
-    // Non-literal labels are used by some external AST sources like LLDB.
-    BoolArgument<"IsLiteralLabel", /*optional=*/0, /*fake=*/1>
-  ];
+      // Label specifies the mangled name for the decl.
+      StringArgument<"Label">, ];
   let SemaHandler = 0;
   let Documentation = [AsmLabelDocs];
-  let AdditionalMembers =
-[{
+  let AdditionalMembers = [{
 bool isEquivalent(AsmLabelAttr *Other) const {
-  return getLabel() == Other->getLabel() && getIsLiteralLabel() == Other->getIsLiteralLabel();
+  return getLabel() == Other->getLabel();
 }
 }];
 }
diff --git a/clang/lib/AST/Mangle.cpp b/clang/lib/AST/Mangle.cpp
index 9652fdbc4e125..58029476572c5 100644
--- a/clang/lib/AST/Mangle.cpp
+++ b/clang/lib/AST/Mangle.cpp
@@ -152,6 +152,20 @@ bool MangleContext::shouldMangleDeclName(const NamedDecl *D) {
   return shouldMangleCXXName(D);
 }
 
+static llvm::StringRef g_lldb_func_call_label_prefix = "$__lldb_func";
+
+static void emitLLDBAsmLabel(llvm::StringRef label, GlobalDecl GD,
+                             llvm::raw_ostream &Out) {
+  Out << g_lldb_func_call_label_prefix << ":";
+
+  if (llvm::isa<clang::CXXConstructorDecl>(GD.getDecl()))
+    Out << "C" << GD.getCtorType();
+  else if (llvm::isa<clang::CXXDestructorDecl>(GD.getDecl()))
+    Out << "D" << GD.getDtorType();
+
+  Out << label.substr(g_lldb_func_call_label_prefix.size() + 1);
+}
+
 void MangleContext::mangleName(GlobalDecl GD, raw_ostream &Out) {
   const ASTContext &ASTContext = getASTContext();
   const NamedDecl *D = cast<NamedDecl>(GD.getDecl());
@@ -161,9 +175,9 @@ void MangleContext::mangleName(GlobalDecl GD, raw_ostream &Out) {
   if (const AsmLabelAttr *ALA = D->getAttr<AsmLabelAttr>()) {
     // If we have an asm name, then we use it as the mangling.
 
-    // If the label isn't literal, or if this is an alias for an LLVM intrinsic,
+    // If the label is an alias for an LLVM intrinsic,
     // do not add a "\01" prefix.
-    if (!ALA->getIsLiteralLabel() || ALA->getLabel().starts_with("llvm.")) {
+    if (ALA->getLabel().starts_with("llvm.")) {
       Out << ALA->getLabel();
       return;
     }
@@ -185,7 +199,11 @@ void MangleContext::mangleName(GlobalDecl GD, raw_ostream &Out) {
     if (!UserLabelPrefix.empty())
       Out << '\01'; // LLVM IR Marker for __asm("foo")
 
-    Out << ALA->getLabel();
+    if (ALA->getLabel().starts_with(g_lldb_func_call_label_prefix))
+      emitLLDBAsmLabel(ALA->getLabel(), GD, Out);
+    else
+      Out << ALA->getLabel();
+
     return;
   }
 
diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index e2ac648320c0f..885d04b9ab926 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -8113,9 +8113,7 @@ NamedDecl *Sema::ActOnVariableDeclarator(
       }
     }
 
-    NewVD->addAttr(AsmLabelAttr::Create(Context, Label,
-                                        /*IsLiteralLabel=*/true,
-                                        SE->getStrTokenLoc(0)));
+    NewVD->addAttr(AsmLabelAttr::Create(Context, Label, SE->getStrTokenLoc(0)));
   } else if (!ExtnameUndeclaredIdentifiers.empty()) {
     llvm::DenseMap<IdentifierInfo*,AsmLabelAttr*>::iterator I =
       ExtnameUndeclaredIdentifiers.find(NewVD->getIdentifier());
@@ -10345,9 +10343,8 @@ Sema::ActOnFunctionDeclarator(Scope *S, Declarator &D, DeclContext *DC,
   if (Expr *E = D.getAsmLabel()) {
     // The parser guarantees this is a string.
     StringLiteral *SE = cast<StringLiteral>(E);
-    NewFD->addAttr(AsmLabelAttr::Create(Context, SE->getString(),
-                                        /*IsLiteralLabel=*/true,
-                                        SE->getStrTokenLoc(0)));
+    NewFD->addAttr(
+        AsmLabelAttr::Create(Context, SE->getString(), SE->getStrTokenLoc(0)));
   } else if (!ExtnameUndeclaredIdentifiers.empty()) {
     llvm::DenseMap<IdentifierInfo*,AsmLabelAttr*>::iterator I =
       ExtnameUndeclaredIdentifiers.find(NewFD->getIdentifier());
@@ -20598,8 +20595,8 @@ void Sema::ActOnPragmaRedefineExtname(IdentifierInfo* Name,
                                          LookupOrdinaryName);
   AttributeCommonInfo Info(AliasName, SourceRange(AliasNameLoc),
                            AttributeCommonInfo::Form::Pragma());
-  AsmLabelAttr *Attr = AsmLabelAttr::CreateImplicit(
-      Context, AliasName->getName(), /*IsLiteralLabel=*/true, Info);
+  AsmLabelAttr *Attr =
+      AsmLabelAttr::CreateImplicit(Context, AliasName->getName(), Info);
 
   // If a declaration that:
   // 1) declares a function or a variable
diff --git a/clang/unittests/AST/DeclTest.cpp b/clang/unittests/AST/DeclTest.cpp
index ed635da683aab..afaf413493299 100644
--- a/clang/unittests/AST/DeclTest.cpp
+++ b/clang/unittests/AST/DeclTest.cpp
@@ -74,7 +74,6 @@ TEST(Decl, AsmLabelAttr) {
   StringRef Code = R"(
     struct S {
       void f() {}
-      void g() {}
     };
   )";
   auto AST =
@@ -87,11 +86,8 @@ TEST(Decl, AsmLabelAttr) {
   const auto *DeclS =
       selectFirst<CXXRecordDecl>("d", match(cxxRecordDecl().bind("d"), Ctx));
   NamedDecl *DeclF = *DeclS->method_begin();
-  NamedDecl *DeclG = *(++DeclS->method_begin());
 
-  // Attach asm labels to the decls: one literal, and one not.
-  DeclF->addAttr(AsmLabelAttr::Create(Ctx, "foo", /*LiteralLabel=*/true));
-  DeclG->addAttr(AsmLabelAttr::Create(Ctx, "goo", /*LiteralLabel=*/false));
+  DeclF->addAttr(AsmLabelAttr::Create(Ctx, "foo"));
 
   // Mangle the decl names.
   std::string MangleF, MangleG;
@@ -99,14 +95,11 @@ TEST(Decl, AsmLabelAttr) {
       ItaniumMangleContext::create(Ctx, Diags));
   {
     llvm::raw_string_ostream OS_F(MangleF);
-    llvm::raw_string_ostream OS_G(MangleG);
     MC->mangleName(DeclF, OS_F);
-    MC->mangleName(DeclG, OS_G);
   }
 
   ASSERT_EQ(MangleF, "\x01"
                      "foo");
-  ASSERT_EQ(MangleG, "goo");
 }
 
 TEST(Decl, MangleDependentSizedArray) {
diff --git a/libcxxabi/src/demangle/ItaniumDemangle.h b/libcxxabi/src/demangle/ItaniumDemangle.h
index 6f27da7b9cadf..7b3983bc89367 100644
--- a/libcxxabi/src/demangle/ItaniumDemangle.h
+++ b/libcxxabi/src/demangle/ItaniumDemangle.h
@@ -1766,6 +1766,8 @@ class CtorDtorName final : public Node {
 
   template<typename Fn> void match(Fn F) const { F(Basename, IsDtor, Variant); }
 
+  int getVariant() const { return Variant; }
+
   void printLeft(OutputBuffer &OB) const override {
     if (IsDtor)
       OB += "~";
diff --git a/lldb/include/lldb/Expression/Expression.h b/lldb/include/lldb/Expression/Expression.h
index 20067f469895b..847226167d584 100644
--- a/lldb/include/lldb/Expression/Expression.h
+++ b/lldb/include/lldb/Expression/Expression.h
@@ -103,11 +103,15 @@ class Expression {
 ///
 /// The format being:
 ///
-///   <prefix>:<module uid>:<symbol uid>:<name>
+///   <prefix>:<discriminator>:<module uid>:<symbol uid>:<name>
 ///
 /// The label string needs to stay valid for the entire lifetime
 /// of this object.
 struct FunctionCallLabel {
+  /// Arbitrary string which language plugins can interpret for their
+  /// own needs.
+  llvm::StringRef discriminator;
+
   /// Unique identifier of the lldb_private::Module
   /// which contains the symbol identified by \c symbol_id.
   lldb::user_id_t module_id;
@@ -133,7 +137,7 @@ struct FunctionCallLabel {
   ///
   /// The representation roundtrips through \c fromString:
   /// \code{.cpp}
-  /// llvm::StringRef encoded = "$__lldb_func:0x0:0x0:_Z3foov";
+  /// llvm::StringRef encoded = "$__lldb_func:blah:0x0:0x0:_Z3foov";
   /// FunctionCallLabel label = *fromString(label);
   ///
   /// assert (label.toString() == encoded);
diff --git a/lldb/source/Expression/Expression.cpp b/lldb/source/Expression/Expression.cpp
index 796851ff15ca3..16ecb1d7deef8 100644
--- a/lldb/source/Expression/Expression.cpp
+++ b/lldb/source/Expression/Expression.cpp
@@ -34,10 +34,10 @@ Expression::Expression(ExecutionContextScope &exe_scope)
 
 llvm::Expected<FunctionCallLabel>
 lldb_private::FunctionCallLabel::fromString(llvm::StringRef label) {
-  llvm::SmallVector<llvm::StringRef, 4> components;
-  label.split(components, ":", /*MaxSplit=*/3);
+  llvm::SmallVector<llvm::StringRef, 5> components;
+  label.split(components, ":", /*MaxSplit=*/4);
 
-  if (components.size() != 4)
+  if (components.size() != 5)
     return llvm::createStringError("malformed function call label.");
 
   if (components[0] != FunctionCallLabelPrefix)
@@ -45,8 +45,10 @@ lldb_private::FunctionCallLabel::fromString(llvm::StringRef label) {
         "expected function call label prefix '{0}' but found '{1}' instead.",
         FunctionCallLabelPrefix, components[0]));
 
-  llvm::StringRef module_label = components[1];
-  llvm::StringRef die_label = components[2];
+  llvm::StringRef discriminator = components[1];
+  llvm::StringRef module_label = components[2];
+  llvm::StringRef die_label = components[3];
+  llvm::StringRef lookup_name = components[4];
 
   lldb::user_id_t module_id = 0;
   if (!llvm::to_integer(module_label, module_id))
@@ -58,20 +60,23 @@ lldb_private::FunctionCallLabel::fromString(llvm::StringRef label) {
     return llvm::createStringError(
         llvm::formatv("failed to parse symbol ID from '{0}'.", die_label));
 
-  return FunctionCallLabel{/*.module_id=*/module_id,
+  return FunctionCallLabel{/*.discriminator=*/discriminator,
+                           /*.module_id=*/module_id,
                            /*.symbol_id=*/die_id,
-                           /*.lookup_name=*/components[3]};
+                           /*.lookup_name=*/lookup_name};
 }
 
 std::string lldb_private::FunctionCallLabel::toString() const {
-  return llvm::formatv("{0}:{1:x}:{2:x}:{3}", FunctionCallLabelPrefix,
-                       module_id, symbol_id, lookup_name)
+  return llvm::formatv("{0}:{1}:{2:x}:{3:x}:{4}", FunctionCallLabelPrefix,
+                       discriminator, module_id, symbol_id, lookup_name)
       .str();
 }
 
 void llvm::format_provider<FunctionCallLabel>::format(
     const FunctionCallLabel &label, raw_ostream &OS, StringRef Style) {
-  OS << llvm::formatv("FunctionCallLabel{ module_id: {0:x}, symbol_id: {1:x}, "
-                      "lookup_name: {2} }",
-                      label.module_id, label.symbol_id, label.lookup_name);
+  OS << llvm::formatv("FunctionCallLabel{ discriminator: {0}, module_id: "
+                      "{1:x}, symbol_id: {2:x}, "
+                      "lookup_name: {3} }",
+                      label.discriminator, label.module_id, label.symbol_id,
+                      label.lookup_name);
 }
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
index 781c1c6c5745d..5c674308160e5 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
@@ -250,11 +250,52 @@ static unsigned GetCXXMethodCVQuals(const DWARFDIE &subprogram,
   return cv_quals;
 }
 
+static const char *GetMangledOrStructorName(const DWARFDIE &die) {
+  const char *name = die.GetMangledName(/*substitute_name_allowed*/ false);
+  if (name)
+    return name;
+
+  name = die.GetName();
+  if (!name)
+    return nullptr;
+
+  DWARFDIE parent = die.GetParent();
+  if (!parent.IsStructUnionOrClass())
+    return nullptr;
+
+  const char *parent_name = parent.GetName();
+  if (!parent_name)
+    return nullptr;
+
+  // Constructor.
+  if (::strcmp(parent_name, name) == 0)
+    return name;
+
+  // Destructor.
+  if (name[0] == '~' && ::strcmp(parent_name, name + 1))
+    return name;
+
+  return nullptr;
+}
+
 static std::string MakeLLDBFuncAsmLabel(const DWARFDIE &die) {
-  char const *name = die.GetMangledName(/*substitute_name_allowed*/ false);
+  char const *name = GetMangledOrStructorName(die);
   if (!name)
     return {};
 
+  auto *cu = die.GetCU();
+  if (!cu)
+    return {};
+
+  // FIXME: When resolving function call labels, we check that
+  // that the definition's DW_AT_specification points to the
+  // declaration that we encoded into the label here. But if the
+  // declaration came from a type-unit (and the definition from
+  // .debug_info), that check won't work. So for now, don't use
+  // function call labels for declaration DIEs from type-units.
+  if (cu->IsTypeUnit())
+    return {};
+
   SymbolFileDWARF *dwarf = die.GetDWARF();
   if (!dwarf)
     return {};
@@ -285,7 +326,9 @@ static std::string MakeLLDBFuncAsmLabel(const DWARFDIE &die) {
   if (die_id == LLDB_INVALID_UID)
     return {};
 
-  return FunctionCallLabel{/*module_id=*/module_id,
+  // Note, discriminator is added by Clang during mangling.
+  return FunctionCallLabel{/*discriminator=*/{},
+                           /*module_id=*/module_id,
                            /*symbol_id=*/die_id,
                            /*.lookup_name=*/name}
       .toString();
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
index 42a66ce75d6d6..911ce5930ca4c 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
@@ -7,7 +7,9 @@
 //===----------------------------------------------------------------------===//
 
 #include "SymbolFileDWARF.h"
+#include "clang/Basic/ABI.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringExtras.h"
 #include "llvm/DebugInfo/DWARF/DWARFAddressRange.h"
 #include "llvm/DebugInfo/DWARF/DWARFDebugLoc.h"
 #include "llvm/Support/Casting.h"
@@ -78,6 +80,7 @@
 
 #include "llvm/DebugInfo/DWARF/DWARFContext.h"
 #include "llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h"
+#include "llvm/Demangle/Demangle.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/FormatVariadic.h"
 
@@ -2482,6 +2485,133 @@ bool SymbolFileDWARF::ResolveFunction(const DWARFDIE &orig_die,
   return false;
 }
 
+static int ClangToItaniumCtorKind(clang::CXXCtorType kind) {
+  switch (kind) {
+  case clang::CXXCtorType::Ctor_Complete:
+    return 1;
+  case clang::CXXCtorType::Ctor_Base:
+    return 2;
+  case clang::CXXCtorType::Ctor_CopyingClosure:
+  case clang::CXXCtorType::Ctor_DefaultClosure:
+  case clang::CXXCtorType::Ctor_Comdat:
+    llvm_unreachable("Unexpected constructor kind.");
+  }
+}
+
+static int ClangToItaniumDtorKind(clang::CXXDtorType kind) {
+  switch (kind) {
+  case clang::CXXDtorType::Dtor_Deleting:
+    return 0;
+  case clang::CXXDtorType::Dtor_Complete:
+    return 1;
+  case clang::CXXDtorType::Dtor_Base:
+    return 2;
+  case clang::CXXDtorType::Dtor_Comdat:
+    llvm_unreachable("Unexpected destructor kind.");
+  }
+}
+
+static std::optional<int>
+GetItaniumCtorDtorVariant(llvm::StringRef discriminator) {
+  const bool is_ctor = discriminator.consume_front("C");
+  if (!is_ctor && !discriminator.consume_front("D"))
+    return std::nullopt;
+
+  uint64_t structor_kind;
+  if (!llvm::to_integer(discriminator, structor_kind))
+    return std::nullopt;
+
+  if (is_ctor) {
+    if (structor_kind > clang::CXXCtorType::Ctor_DefaultClosure)
+      return std::nullopt;
+
+    return ClangToItaniumCtorKind(
+        static_cast<clang::CXXCtorType>(structor_kind));
+  }
+
+  if (structor_kind > clang::CXXDtorType::Dtor_Deleting)
+    return std::nullopt;
+
+  return ClangToItaniumDtorKind(static_cast<clang::CXXDtorType>(structor_kind));
+}
+
+DWARFDIE SymbolFileDWARF::FindFunctionDefinition(const FunctionCallLabel &label,
+                                                 const DWARFDIE &declaration) {
+  DWARFDIE definition;
+  llvm::DenseMap<int, DWARFDIE> structor_variant_to_die;
+
+  // eFunctionNameTypeFull for mangled name lookup.
+  // eFunctionNameTypeMethod is required for structor lookups (since we look
+  // those up by DW_AT_name).
+  Module::LookupInfo info(ConstString(label.lookup_name),
+                          lldb::eFunctionNameTypeFull |
+                              lldb::eFunctionNameTypeMethod,
+                          lldb::eLanguageTypeUnknown);
+
+  m_index->GetFunctions(info, *this, {}, [&](DWARFDIE entry) {
+    if (entry.GetAttributeValueAsUnsigned(llvm::dwarf::DW_AT_declaration, 0))
+      return IterationAction::Continue;
+
+    auto spec = entry.GetAttributeValueAsReferenceDIE(DW_AT_specification);
+    if (!spec)
+      return IterationAction::Continue;
+
+    if (spec != declaration)
+      return IterationAction::Continue;
+
+    // We're not picking a specific structor variant DIE, so we're done here.
+    if (label.discriminator.empty()) {
+      definition = entry;
+      return IterationAction::Stop;
+    }
+
+    const char *mangled =
+        entry.GetMangledName(/*substitute_name_allowed=*/false);
+    if (!mangled)
+      return IterationAction::Continue;
+
+    llvm::ItaniumPartialDemangler D;
+    if (D.partialDemangle(mangled))
+      return IterationAction::Continue;
+
+    auto structor_variant = D.getCtorOrDtorVariant();
+    if (!structor_variant)
+      return IterationAction::Continue;
+
+    auto [_, inserted] = structor_variant_to_die.try_emplace(*structor_variant,
+                                                             std::move(entry));
+    assert(inserted);
+
+    // The compiler may choose to alias the constructor variants
+    // (notably this happens on Linux), so we might not have a definition
+    // DIE for some structor variants. Hence we iterate over all variants
+    // and pick the most appropriate one out of those.
+    return IterationAction::Continue;
+  });
+
+  if (definition)
+    return definition;
+
+  auto label_variant = GetItaniumCtorDtorVariant(label.discriminator);
+  if (!label_variant)
+    return {};
+
+  auto it = structor_variant_to_die.find(*label_variant);
+
+  // Found the exact variant.
+  if (it != structor_variant_to_die.end())
+    return it->getSecond();
+
+  // C1 was aliased to C2
+  if (!label.lookup_name.starts_with("~") && label_variant == 1) {
+    if (auto it = structor_variant_to_die.find(2);
+        it != structor_variant_to_die.end())
+      return it->getSecond();
+  }
+
+  return {};
+}
+
 llvm::Expected<SymbolContext>
 SymbolFileDWARF::ResolveFunctionCallLabel(const FunctionCallLabel &label) {
   std::lock_guard<std::recursive_mutex> guard(GetModuleMutex());
@@ -2494,24 +2624,7 @@ SymbolFileDWARF::ResolveFunctionCallLabel(const FunctionCallLabel &label) {
   // Label was created using a declaration DIE. Need to fetch the definition
   // to resolve the function call.
   if (die.GetAttributeValueAsUnsigned(llvm::dwarf::DW_AT_declaration, 0)) {
-    Module::LookupInfo info(ConstString(label.lookup_name),
-                            lldb::eFunctionNameTypeFull,
-                            lldb::eLanguageTypeUnknown);
-
-    m_index->GetFunctions(info, *this, {}, [&](DWARFDIE entry) {
-      if (entry.GetAttributeValueAsUnsigned(llvm::dwarf::DW_AT_declaration, 0))
-        return IterationAction::Continue;
-
-      // We don't check whether the specification DIE for this function
-      // corresponds to the declaration DIE because the declaration might be in
-      // a type-unit but the definition in the compile-unit (and it's
-      // specifcation would point to the declaration in the compile-unit). We
-      // rely on the mangled name within the module to be enough to find us the
-      // unique definition.
-      die = entry;
-      return IterationAction::Stop;
-    });
-
+    die = FindFunctionDefinition(label, die);
     if (die.GetAttributeValueAsUnsigned(llvm::dwarf::DW_AT_declaration, 0))
       return llvm::createStringError(
           llvm::formatv("failed to find definition DIE for {0}", label));
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
index 3ec538da8cf77..843346619db37 100644
--- a/lldb/s...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Aug 4, 2025

@llvm/pr-subscribers-clang

Author: Michael Buch (Michael137)

Changes

Depends on #148877


Patch is 30.28 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/149827.diff

16 Files Affected:

  • (modified) clang/include/clang/Basic/Attr.td (+4-13)
  • (modified) clang/lib/AST/Mangle.cpp (+21-3)
  • (modified) clang/lib/Sema/SemaDecl.cpp (+5-8)
  • (modified) clang/unittests/AST/DeclTest.cpp (+1-8)
  • (modified) libcxxabi/src/demangle/ItaniumDemangle.h (+2)
  • (modified) lldb/include/lldb/Expression/Expression.h (+6-2)
  • (modified) lldb/source/Expression/Expression.cpp (+17-12)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp (+45-2)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp (+131-18)
  • (modified) lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h (+4)
  • (modified) lldb/source/Plugins/TypeSystem/Clang/TypeSystemClang.cpp (+3-4)
  • (modified) lldb/unittests/Expression/ExpressionTest.cpp (+22-15)
  • (modified) lldb/unittests/Symbol/TestTypeSystemClang.cpp (+7-7)
  • (modified) llvm/include/llvm/Demangle/Demangle.h (+3)
  • (modified) llvm/include/llvm/Demangle/ItaniumDemangle.h (+2)
  • (modified) llvm/lib/Demangle/ItaniumDemangle.cpp (+7-3)
diff --git a/clang/include/clang/Basic/Attr.td b/clang/include/clang/Basic/Attr.td
index b3ff45b3e90a3..a0efb21218312 100644
--- a/clang/include/clang/Basic/Attr.td
+++ b/clang/include/clang/Basic/Attr.td
@@ -1059,22 +1059,13 @@ def AVRSignal : InheritableAttr, TargetSpecificAttr<TargetAVR> {
 def AsmLabel : InheritableAttr {
   let Spellings = [CustomKeyword<"asm">, CustomKeyword<"__asm__">];
   let Args = [
-    // Label specifies the mangled name for the decl.
-    StringArgument<"Label">,
-
-    // IsLiteralLabel specifies whether the label is literal (i.e. suppresses
-    // the global C symbol prefix) or not. If not, the mangle-suppression prefix
-    // ('\01') is omitted from the decl name at the LLVM IR level.
-    //
-    // Non-literal labels are used by some external AST sources like LLDB.
-    BoolArgument<"IsLiteralLabel", /*optional=*/0, /*fake=*/1>
-  ];
+      // Label specifies the mangled name for the decl.
+      StringArgument<"Label">, ];
   let SemaHandler = 0;
   let Documentation = [AsmLabelDocs];
-  let AdditionalMembers =
-[{
+  let AdditionalMembers = [{
 bool isEquivalent(AsmLabelAttr *Other) const {
-  return getLabel() == Other->getLabel() && getIsLiteralLabel() == Other->getIsLiteralLabel();
+  return getLabel() == Other->getLabel();
 }
 }];
 }
diff --git a/clang/lib/AST/Mangle.cpp b/clang/lib/AST/Mangle.cpp
index 9652fdbc4e125..58029476572c5 100644
--- a/clang/lib/AST/Mangle.cpp
+++ b/clang/lib/AST/Mangle.cpp
@@ -152,6 +152,20 @@ bool MangleContext::shouldMangleDeclName(const NamedDecl *D) {
   return shouldMangleCXXName(D);
 }
 
+static llvm::StringRef g_lldb_func_call_label_prefix = "$__lldb_func";
+
+static void emitLLDBAsmLabel(llvm::StringRef label, GlobalDecl GD,
+                             llvm::raw_ostream &Out) {
+  Out << g_lldb_func_call_label_prefix << ":";
+
+  if (llvm::isa<clang::CXXConstructorDecl>(GD.getDecl()))
+    Out << "C" << GD.getCtorType();
+  else if (llvm::isa<clang::CXXDestructorDecl>(GD.getDecl()))
+    Out << "D" << GD.getDtorType();
+
+  Out << label.substr(g_lldb_func_call_label_prefix.size() + 1);
+}
+
 void MangleContext::mangleName(GlobalDecl GD, raw_ostream &Out) {
   const ASTContext &ASTContext = getASTContext();
   const NamedDecl *D = cast<NamedDecl>(GD.getDecl());
@@ -161,9 +175,9 @@ void MangleContext::mangleName(GlobalDecl GD, raw_ostream &Out) {
   if (const AsmLabelAttr *ALA = D->getAttr<AsmLabelAttr>()) {
     // If we have an asm name, then we use it as the mangling.
 
-    // If the label isn't literal, or if this is an alias for an LLVM intrinsic,
+    // If the label is an alias for an LLVM intrinsic,
     // do not add a "\01" prefix.
-    if (!ALA->getIsLiteralLabel() || ALA->getLabel().starts_with("llvm.")) {
+    if (ALA->getLabel().starts_with("llvm.")) {
       Out << ALA->getLabel();
       return;
     }
@@ -185,7 +199,11 @@ void MangleContext::mangleName(GlobalDecl GD, raw_ostream &Out) {
     if (!UserLabelPrefix.empty())
       Out << '\01'; // LLVM IR Marker for __asm("foo")
 
-    Out << ALA->getLabel();
+    if (ALA->getLabel().starts_with(g_lldb_func_call_label_prefix))
+      emitLLDBAsmLabel(ALA->getLabel(), GD, Out);
+    else
+      Out << ALA->getLabel();
+
     return;
   }
 
diff --git a/clang/lib/Sema/SemaDecl.cpp b/clang/lib/Sema/SemaDecl.cpp
index e2ac648320c0f..885d04b9ab926 100644
--- a/clang/lib/Sema/SemaDecl.cpp
+++ b/clang/lib/Sema/SemaDecl.cpp
@@ -8113,9 +8113,7 @@ NamedDecl *Sema::ActOnVariableDeclarator(
       }
     }
 
-    NewVD->addAttr(AsmLabelAttr::Create(Context, Label,
-                                        /*IsLiteralLabel=*/true,
-                                        SE->getStrTokenLoc(0)));
+    NewVD->addAttr(AsmLabelAttr::Create(Context, Label, SE->getStrTokenLoc(0)));
   } else if (!ExtnameUndeclaredIdentifiers.empty()) {
     llvm::DenseMap<IdentifierInfo*,AsmLabelAttr*>::iterator I =
       ExtnameUndeclaredIdentifiers.find(NewVD->getIdentifier());
@@ -10345,9 +10343,8 @@ Sema::ActOnFunctionDeclarator(Scope *S, Declarator &D, DeclContext *DC,
   if (Expr *E = D.getAsmLabel()) {
     // The parser guarantees this is a string.
     StringLiteral *SE = cast<StringLiteral>(E);
-    NewFD->addAttr(AsmLabelAttr::Create(Context, SE->getString(),
-                                        /*IsLiteralLabel=*/true,
-                                        SE->getStrTokenLoc(0)));
+    NewFD->addAttr(
+        AsmLabelAttr::Create(Context, SE->getString(), SE->getStrTokenLoc(0)));
   } else if (!ExtnameUndeclaredIdentifiers.empty()) {
     llvm::DenseMap<IdentifierInfo*,AsmLabelAttr*>::iterator I =
       ExtnameUndeclaredIdentifiers.find(NewFD->getIdentifier());
@@ -20598,8 +20595,8 @@ void Sema::ActOnPragmaRedefineExtname(IdentifierInfo* Name,
                                          LookupOrdinaryName);
   AttributeCommonInfo Info(AliasName, SourceRange(AliasNameLoc),
                            AttributeCommonInfo::Form::Pragma());
-  AsmLabelAttr *Attr = AsmLabelAttr::CreateImplicit(
-      Context, AliasName->getName(), /*IsLiteralLabel=*/true, Info);
+  AsmLabelAttr *Attr =
+      AsmLabelAttr::CreateImplicit(Context, AliasName->getName(), Info);
 
   // If a declaration that:
   // 1) declares a function or a variable
diff --git a/clang/unittests/AST/DeclTest.cpp b/clang/unittests/AST/DeclTest.cpp
index ed635da683aab..afaf413493299 100644
--- a/clang/unittests/AST/DeclTest.cpp
+++ b/clang/unittests/AST/DeclTest.cpp
@@ -74,7 +74,6 @@ TEST(Decl, AsmLabelAttr) {
   StringRef Code = R"(
     struct S {
       void f() {}
-      void g() {}
     };
   )";
   auto AST =
@@ -87,11 +86,8 @@ TEST(Decl, AsmLabelAttr) {
   const auto *DeclS =
       selectFirst<CXXRecordDecl>("d", match(cxxRecordDecl().bind("d"), Ctx));
   NamedDecl *DeclF = *DeclS->method_begin();
-  NamedDecl *DeclG = *(++DeclS->method_begin());
 
-  // Attach asm labels to the decls: one literal, and one not.
-  DeclF->addAttr(AsmLabelAttr::Create(Ctx, "foo", /*LiteralLabel=*/true));
-  DeclG->addAttr(AsmLabelAttr::Create(Ctx, "goo", /*LiteralLabel=*/false));
+  DeclF->addAttr(AsmLabelAttr::Create(Ctx, "foo"));
 
   // Mangle the decl names.
   std::string MangleF, MangleG;
@@ -99,14 +95,11 @@ TEST(Decl, AsmLabelAttr) {
       ItaniumMangleContext::create(Ctx, Diags));
   {
     llvm::raw_string_ostream OS_F(MangleF);
-    llvm::raw_string_ostream OS_G(MangleG);
     MC->mangleName(DeclF, OS_F);
-    MC->mangleName(DeclG, OS_G);
   }
 
   ASSERT_EQ(MangleF, "\x01"
                      "foo");
-  ASSERT_EQ(MangleG, "goo");
 }
 
 TEST(Decl, MangleDependentSizedArray) {
diff --git a/libcxxabi/src/demangle/ItaniumDemangle.h b/libcxxabi/src/demangle/ItaniumDemangle.h
index 6f27da7b9cadf..7b3983bc89367 100644
--- a/libcxxabi/src/demangle/ItaniumDemangle.h
+++ b/libcxxabi/src/demangle/ItaniumDemangle.h
@@ -1766,6 +1766,8 @@ class CtorDtorName final : public Node {
 
   template<typename Fn> void match(Fn F) const { F(Basename, IsDtor, Variant); }
 
+  int getVariant() const { return Variant; }
+
   void printLeft(OutputBuffer &OB) const override {
     if (IsDtor)
       OB += "~";
diff --git a/lldb/include/lldb/Expression/Expression.h b/lldb/include/lldb/Expression/Expression.h
index 20067f469895b..847226167d584 100644
--- a/lldb/include/lldb/Expression/Expression.h
+++ b/lldb/include/lldb/Expression/Expression.h
@@ -103,11 +103,15 @@ class Expression {
 ///
 /// The format being:
 ///
-///   <prefix>:<module uid>:<symbol uid>:<name>
+///   <prefix>:<discriminator>:<module uid>:<symbol uid>:<name>
 ///
 /// The label string needs to stay valid for the entire lifetime
 /// of this object.
 struct FunctionCallLabel {
+  /// Arbitrary string which language plugins can interpret for their
+  /// own needs.
+  llvm::StringRef discriminator;
+
   /// Unique identifier of the lldb_private::Module
   /// which contains the symbol identified by \c symbol_id.
   lldb::user_id_t module_id;
@@ -133,7 +137,7 @@ struct FunctionCallLabel {
   ///
   /// The representation roundtrips through \c fromString:
   /// \code{.cpp}
-  /// llvm::StringRef encoded = "$__lldb_func:0x0:0x0:_Z3foov";
+  /// llvm::StringRef encoded = "$__lldb_func:blah:0x0:0x0:_Z3foov";
   /// FunctionCallLabel label = *fromString(label);
   ///
   /// assert (label.toString() == encoded);
diff --git a/lldb/source/Expression/Expression.cpp b/lldb/source/Expression/Expression.cpp
index 796851ff15ca3..16ecb1d7deef8 100644
--- a/lldb/source/Expression/Expression.cpp
+++ b/lldb/source/Expression/Expression.cpp
@@ -34,10 +34,10 @@ Expression::Expression(ExecutionContextScope &exe_scope)
 
 llvm::Expected<FunctionCallLabel>
 lldb_private::FunctionCallLabel::fromString(llvm::StringRef label) {
-  llvm::SmallVector<llvm::StringRef, 4> components;
-  label.split(components, ":", /*MaxSplit=*/3);
+  llvm::SmallVector<llvm::StringRef, 5> components;
+  label.split(components, ":", /*MaxSplit=*/4);
 
-  if (components.size() != 4)
+  if (components.size() != 5)
     return llvm::createStringError("malformed function call label.");
 
   if (components[0] != FunctionCallLabelPrefix)
@@ -45,8 +45,10 @@ lldb_private::FunctionCallLabel::fromString(llvm::StringRef label) {
         "expected function call label prefix '{0}' but found '{1}' instead.",
         FunctionCallLabelPrefix, components[0]));
 
-  llvm::StringRef module_label = components[1];
-  llvm::StringRef die_label = components[2];
+  llvm::StringRef discriminator = components[1];
+  llvm::StringRef module_label = components[2];
+  llvm::StringRef die_label = components[3];
+  llvm::StringRef lookup_name = components[4];
 
   lldb::user_id_t module_id = 0;
   if (!llvm::to_integer(module_label, module_id))
@@ -58,20 +60,23 @@ lldb_private::FunctionCallLabel::fromString(llvm::StringRef label) {
     return llvm::createStringError(
         llvm::formatv("failed to parse symbol ID from '{0}'.", die_label));
 
-  return FunctionCallLabel{/*.module_id=*/module_id,
+  return FunctionCallLabel{/*.discriminator=*/discriminator,
+                           /*.module_id=*/module_id,
                            /*.symbol_id=*/die_id,
-                           /*.lookup_name=*/components[3]};
+                           /*.lookup_name=*/lookup_name};
 }
 
 std::string lldb_private::FunctionCallLabel::toString() const {
-  return llvm::formatv("{0}:{1:x}:{2:x}:{3}", FunctionCallLabelPrefix,
-                       module_id, symbol_id, lookup_name)
+  return llvm::formatv("{0}:{1}:{2:x}:{3:x}:{4}", FunctionCallLabelPrefix,
+                       discriminator, module_id, symbol_id, lookup_name)
       .str();
 }
 
 void llvm::format_provider<FunctionCallLabel>::format(
     const FunctionCallLabel &label, raw_ostream &OS, StringRef Style) {
-  OS << llvm::formatv("FunctionCallLabel{ module_id: {0:x}, symbol_id: {1:x}, "
-                      "lookup_name: {2} }",
-                      label.module_id, label.symbol_id, label.lookup_name);
+  OS << llvm::formatv("FunctionCallLabel{ discriminator: {0}, module_id: "
+                      "{1:x}, symbol_id: {2:x}, "
+                      "lookup_name: {3} }",
+                      label.discriminator, label.module_id, label.symbol_id,
+                      label.lookup_name);
 }
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
index 781c1c6c5745d..5c674308160e5 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFASTParserClang.cpp
@@ -250,11 +250,52 @@ static unsigned GetCXXMethodCVQuals(const DWARFDIE &subprogram,
   return cv_quals;
 }
 
+static const char *GetMangledOrStructorName(const DWARFDIE &die) {
+  const char *name = die.GetMangledName(/*substitute_name_allowed*/ false);
+  if (name)
+    return name;
+
+  name = die.GetName();
+  if (!name)
+    return nullptr;
+
+  DWARFDIE parent = die.GetParent();
+  if (!parent.IsStructUnionOrClass())
+    return nullptr;
+
+  const char *parent_name = parent.GetName();
+  if (!parent_name)
+    return nullptr;
+
+  // Constructor.
+  if (::strcmp(parent_name, name) == 0)
+    return name;
+
+  // Destructor.
+  if (name[0] == '~' && ::strcmp(parent_name, name + 1))
+    return name;
+
+  return nullptr;
+}
+
 static std::string MakeLLDBFuncAsmLabel(const DWARFDIE &die) {
-  char const *name = die.GetMangledName(/*substitute_name_allowed*/ false);
+  char const *name = GetMangledOrStructorName(die);
   if (!name)
     return {};
 
+  auto *cu = die.GetCU();
+  if (!cu)
+    return {};
+
+  // FIXME: When resolving function call labels, we check that
+  // that the definition's DW_AT_specification points to the
+  // declaration that we encoded into the label here. But if the
+  // declaration came from a type-unit (and the definition from
+  // .debug_info), that check won't work. So for now, don't use
+  // function call labels for declaration DIEs from type-units.
+  if (cu->IsTypeUnit())
+    return {};
+
   SymbolFileDWARF *dwarf = die.GetDWARF();
   if (!dwarf)
     return {};
@@ -285,7 +326,9 @@ static std::string MakeLLDBFuncAsmLabel(const DWARFDIE &die) {
   if (die_id == LLDB_INVALID_UID)
     return {};
 
-  return FunctionCallLabel{/*module_id=*/module_id,
+  // Note, discriminator is added by Clang during mangling.
+  return FunctionCallLabel{/*discriminator=*/{},
+                           /*module_id=*/module_id,
                            /*symbol_id=*/die_id,
                            /*.lookup_name=*/name}
       .toString();
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
index 42a66ce75d6d6..911ce5930ca4c 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
@@ -7,7 +7,9 @@
 //===----------------------------------------------------------------------===//
 
 #include "SymbolFileDWARF.h"
+#include "clang/Basic/ABI.h"
 #include "llvm/ADT/STLExtras.h"
+#include "llvm/ADT/StringExtras.h"
 #include "llvm/DebugInfo/DWARF/DWARFAddressRange.h"
 #include "llvm/DebugInfo/DWARF/DWARFDebugLoc.h"
 #include "llvm/Support/Casting.h"
@@ -78,6 +80,7 @@
 
 #include "llvm/DebugInfo/DWARF/DWARFContext.h"
 #include "llvm/DebugInfo/DWARF/DWARFDebugAbbrev.h"
+#include "llvm/Demangle/Demangle.h"
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/FormatVariadic.h"
 
@@ -2482,6 +2485,133 @@ bool SymbolFileDWARF::ResolveFunction(const DWARFDIE &orig_die,
   return false;
 }
 
+static int ClangToItaniumCtorKind(clang::CXXCtorType kind) {
+  switch (kind) {
+  case clang::CXXCtorType::Ctor_Complete:
+    return 1;
+  case clang::CXXCtorType::Ctor_Base:
+    return 2;
+  case clang::CXXCtorType::Ctor_CopyingClosure:
+  case clang::CXXCtorType::Ctor_DefaultClosure:
+  case clang::CXXCtorType::Ctor_Comdat:
+    llvm_unreachable("Unexpected constructor kind.");
+  }
+}
+
+static int ClangToItaniumDtorKind(clang::CXXDtorType kind) {
+  switch (kind) {
+  case clang::CXXDtorType::Dtor_Deleting:
+    return 0;
+  case clang::CXXDtorType::Dtor_Complete:
+    return 1;
+  case clang::CXXDtorType::Dtor_Base:
+    return 2;
+  case clang::CXXDtorType::Dtor_Comdat:
+    llvm_unreachable("Unexpected destructor kind.");
+  }
+}
+
+static std::optional<int>
+GetItaniumCtorDtorVariant(llvm::StringRef discriminator) {
+  const bool is_ctor = discriminator.consume_front("C");
+  if (!is_ctor && !discriminator.consume_front("D"))
+    return std::nullopt;
+
+  uint64_t structor_kind;
+  if (!llvm::to_integer(discriminator, structor_kind))
+    return std::nullopt;
+
+  if (is_ctor) {
+    if (structor_kind > clang::CXXCtorType::Ctor_DefaultClosure)
+      return std::nullopt;
+
+    return ClangToItaniumCtorKind(
+        static_cast<clang::CXXCtorType>(structor_kind));
+  }
+
+  if (structor_kind > clang::CXXDtorType::Dtor_Deleting)
+    return std::nullopt;
+
+  return ClangToItaniumDtorKind(static_cast<clang::CXXDtorType>(structor_kind));
+}
+
+DWARFDIE SymbolFileDWARF::FindFunctionDefinition(const FunctionCallLabel &label,
+                                                 const DWARFDIE &declaration) {
+  DWARFDIE definition;
+  llvm::DenseMap<int, DWARFDIE> structor_variant_to_die;
+
+  // eFunctionNameTypeFull for mangled name lookup.
+  // eFunctionNameTypeMethod is required for structor lookups (since we look
+  // those up by DW_AT_name).
+  Module::LookupInfo info(ConstString(label.lookup_name),
+                          lldb::eFunctionNameTypeFull |
+                              lldb::eFunctionNameTypeMethod,
+                          lldb::eLanguageTypeUnknown);
+
+  m_index->GetFunctions(info, *this, {}, [&](DWARFDIE entry) {
+    if (entry.GetAttributeValueAsUnsigned(llvm::dwarf::DW_AT_declaration, 0))
+      return IterationAction::Continue;
+
+    auto spec = entry.GetAttributeValueAsReferenceDIE(DW_AT_specification);
+    if (!spec)
+      return IterationAction::Continue;
+
+    if (spec != declaration)
+      return IterationAction::Continue;
+
+    // We're not picking a specific structor variant DIE, so we're done here.
+    if (label.discriminator.empty()) {
+      definition = entry;
+      return IterationAction::Stop;
+    }
+
+    const char *mangled =
+        entry.GetMangledName(/*substitute_name_allowed=*/false);
+    if (!mangled)
+      return IterationAction::Continue;
+
+    llvm::ItaniumPartialDemangler D;
+    if (D.partialDemangle(mangled))
+      return IterationAction::Continue;
+
+    auto structor_variant = D.getCtorOrDtorVariant();
+    if (!structor_variant)
+      return IterationAction::Continue;
+
+    auto [_, inserted] = structor_variant_to_die.try_emplace(*structor_variant,
+                                                             std::move(entry));
+    assert(inserted);
+
+    // The compiler may choose to alias the constructor variants
+    // (notably this happens on Linux), so we might not have a definition
+    // DIE for some structor variants. Hence we iterate over all variants
+    // and pick the most appropriate one out of those.
+    return IterationAction::Continue;
+  });
+
+  if (definition)
+    return definition;
+
+  auto label_variant = GetItaniumCtorDtorVariant(label.discriminator);
+  if (!label_variant)
+    return {};
+
+  auto it = structor_variant_to_die.find(*label_variant);
+
+  // Found the exact variant.
+  if (it != structor_variant_to_die.end())
+    return it->getSecond();
+
+  // C1 was aliased to C2
+  if (!label.lookup_name.starts_with("~") && label_variant == 1) {
+    if (auto it = structor_variant_to_die.find(2);
+        it != structor_variant_to_die.end())
+      return it->getSecond();
+  }
+
+  return {};
+}
+
 llvm::Expected<SymbolContext>
 SymbolFileDWARF::ResolveFunctionCallLabel(const FunctionCallLabel &label) {
   std::lock_guard<std::recursive_mutex> guard(GetModuleMutex());
@@ -2494,24 +2624,7 @@ SymbolFileDWARF::ResolveFunctionCallLabel(const FunctionCallLabel &label) {
   // Label was created using a declaration DIE. Need to fetch the definition
   // to resolve the function call.
   if (die.GetAttributeValueAsUnsigned(llvm::dwarf::DW_AT_declaration, 0)) {
-    Module::LookupInfo info(ConstString(label.lookup_name),
-                            lldb::eFunctionNameTypeFull,
-                            lldb::eLanguageTypeUnknown);
-
-    m_index->GetFunctions(info, *this, {}, [&](DWARFDIE entry) {
-      if (entry.GetAttributeValueAsUnsigned(llvm::dwarf::DW_AT_declaration, 0))
-        return IterationAction::Continue;
-
-      // We don't check whether the specification DIE for this function
-      // corresponds to the declaration DIE because the declaration might be in
-      // a type-unit but the definition in the compile-unit (and it's
-      // specifcation would point to the declaration in the compile-unit). We
-      // rely on the mangled name within the module to be enough to find us the
-      // unique definition.
-      die = entry;
-      return IterationAction::Stop;
-    });
-
+    die = FindFunctionDefinition(label, die);
     if (die.GetAttributeValueAsUnsigned(llvm::dwarf::DW_AT_declaration, 0))
       return llvm::createStringError(
           llvm::formatv("failed to find definition DIE for {0}", label));
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h b/lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.h
index 3ec538da8cf77..843346619db37 100644
--- a/lldb/s...
[truncated]

@AaronBallman
Copy link
Collaborator

Clang bits LGTM

@AaronBallman Sorry for the back and forth but I had some of the Clang changes not split into separate commits. Not sure if you looked at the entire diff or just the Clang commits, but I split out one more commit just now just in case

I looked just at the Clang bits, the rest was more "cursory look" than actual review because I don't have the expertise for my sign-off to mean much.

@Michael137 Michael137 force-pushed the lldb/reliable-function-call-labels-structors branch from ab40aca to a35bb5a Compare August 5, 2025 09:54
Comment on lines 290 to 297
// FIXME: When resolving function call labels, we check that
// that the definition's DW_AT_specification points to the
// declaration that we encoded into the label here. But if the
// declaration came from a type-unit (and the definition from
// .debug_info), that check won't work. So for now, don't use
// function call labels for declaration DIEs from type-units.
if (cu->IsTypeUnit())
return {};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how this is related to type units. Even without them, I don't see how we can guarantee we always wind up at the same declaration DIE. A definition of the type (along with declarations of its methods) can be present in ~every CU, but the definition of a specific method will (normally) only be present in a single CU (however, it doesn't always have to be the same CU as c++ allows you to spread the definition of a class across multiple CUs). That definition DIE will point to a declaration, but that doesn't have to be the same declaration that we started with.

If anything, type units should make finding the canonical declaration easier because the linker will deduplicate all of the type definitions (similar to how dsymutil would do it), but even that's not guaranteed because not all CUs have to have type units enabled.

Why do we need to go back the the original DIE?

I don't quite understand how, but I think lldb deals with this problem (multiple definitions of a using the DWARFASTParserClang::CopyUniqueClassMethodTypes function. Is it possible this patch breaks that somehow?

if (!spec)
return IterationAction::Continue;

if (spec != declaration)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I guess this is why. I don't think this will work, for the reasons outlined in the other comment. This needs to be a structural match between the two DIEs, maybe facilitated by DWARFASTParserClang::CopyUniqueClassMethodTypes.

I admit this is more complicated that I originally expected. I didn't realise just how load-bearing the mangling is for constructors. As clang does not emit DW_AT_linkage_name for constructors (for better or worse, gcc does, using the funny C4 variant), the way we were looking up the right constructor versions was by making sure the mangler roundtrips the exact same mangled name.

I still think this is better than relying on the mangling roundtrip, though it brings up an interesting point. Given then ABI tags are meant to allow different versions of the same class to coexist, then if you want to support these different versions coexisting, then the DWARF sort of has to contain the ABI tag information as it relies on structural matches to connect DIEs from different compilation units. Maybe this is irrelevant as you don't want to support this "coexistence" (I believe you just want to "deal with" the presence of ABI tags), but I think this might make a reasonable case for encoding the ABI tag information (in one way or another) in DWARF.

Copy link
Member Author

@Michael137 Michael137 Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I guess this is why. I don't think this will work, for the reasons outlined in the other comment. This needs to be a structural match between the two DIEs, maybe facilitated by DWARFASTParserClang::CopyUniqueClassMethodTypes.

That's a good point. This mainly cropped up in the test-suite for type-units, but yea this need not be related to them. I guess this is similar to the other case you pointed out where the declaration DIE might be in a different module than the definition. For those cases we ended up falling back on the old lookup. So the type-unit check is actually redundant because we can just rely on the fallback lookup. But if we want to support a wider range we'd have to change this specification check. Happy to try and change it to a structural match though.

Given then ABI tags are meant to allow different versions of the same class to coexist, then if you want to support these different versions coexisting, then the DWARF sort of has to contain the ABI tag information as it relies on structural matches to connect DIEs from different compilation units. Maybe this is irrelevant as you don't want to support this "coexistence" (I believe you just want to "deal with" the presence of ABI tags), but I think this might make a reasonable case for encoding the ABI tag information (in one way or another) in DWARF.

Hmm interesting point. Trying out the case where we have two ABI-tagged structures in two TUs, it looks like Clang/ld64 decide to emit two structure types with the same name only if there is a structural mismatch:

// lib.cpp
#include <cassert>

struct [[gnu::abi_tag("Lib")]] A {
  A() {}

  int mem = 5;
} a;

void func() {
  assert (a.mem == 5);
}

// main.cpp
#include <cassert>

struct [[gnu::abi_tag("Main")]] A {
  A() {}

  int mem = 10;
} a;

void func();

int main(int argc, char **argv) {
  func();
  assert (a.mem == 10);
  return 0;
}

Here a in lib.o would point to the DW_TAG_structure_type of main.o, but the subprogram definitions for the constructors are different (and local to each CU).

If we change the structure layout though we get a DW_TAG_structure_type in each CU. But if they only differ by which methods they carry we only emit 1 definition!

I guess for now the goal of the expression evaluator is to just support two versions of the same function (with different tags). Which is the easier version of this problem you outline. I don't think we had reservations about encoding ABI-tag information into DWARF. The concerns were:

  1. debug-info size (libc++ uses these tags on a ton of APIs).
  2. whether we want to rely on AST fidelity again
  3. libc++ makes no guarantee that they'll continue annotating individual APIs in the future and might put the tags on namespaces/structures. At which point LLDB will need to take care to propagate the tags appropriately (maybe this will be handled by Clang out-of-the-box, though I haven't thought about this specific point in detail)

Do you prefer us looking into alternatives to the approach in this PR? E.g., make DWARF do some of the heavy lifting? We could put e.g., the structor variant info into DWARF, alleviating the need for demangling, etc.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is actually redundant because we can just rely on the fallback lookup

Is it? I can see how the fallback lookup can work for functions where you have mangled names, as those are more-or-less unique. However, I don't see how we could find the correct constructor if all we have is the name of the class.

Happy to try and change it to a structural match though.

I think that is necessary for this to work. Otherwise, this will fail whenever we parse a type from a CU which does not contain the definition of the methods (at least constructors) of that type. And I think that can happen pretty easily in a big binary.

Here a in lib.o would point to the DW_TAG_structure_type of main.o, but the subprogram definitions for the constructors are different (and local to each CU).

Hm... That's confusing. I was under the impression that ld64 doesn't touch debug info (that it just leaves breadcrumbs so that the debug info can be linked later. Did that change? If not, how can lib.o point to a different file. Surely clang does not inspect main.o when compiling lib.o.

Or did you mean that this happens after running dsymutil over the binary? Because this matches what I would expect dsymutil to do.

I don't think we had reservations about encoding ABI-tag information into DWARF. The concerns were:

1. debug-info size (libc++ uses these tags on a ton of APIs).

Yes, I can see how that could be an issue.

2. whether we want to rely on AST fidelity again

Yeah, I don't want to rely on being able to roundtrip clang ASTs. The thing which changed the calculus for me is that the realisation that this is necessary even to just understand the DWARF correctly. Like, with ABI tags, I can safely link a binary which contains two versions of struct Foo (as long as they are not used in the same CU). However, if I now have a third CU, which defines a constructor for Foo, I have no way to tell which of the two struct definitions it refers to.

3. libc++ makes no guarantee that they'll continue annotating individual APIs in the future and might put the tags on namespaces/structures. At which point LLDB will need to take care to propagate the tags appropriately (maybe this will be handled by Clang out-of-the-box, though I haven't thought about this specific point in detail)

I'm not particularly concerned by that, for one because I don't (primarily) want to feed these tags into clang. My main worry is not being able to differentiate these classes in DWARF.

Do you prefer us looking into alternatives to the approach in this PR? E.g., make DWARF do some of the heavy lifting? We could put e.g., the structor variant info into DWARF, alleviating the need for demangling, etc.

No, I still think this is the best approach for this. I'm just disappointed that it's turning out to be more complicated than I expected. I view the introduction of abi tags as an orthogonal issue -- if it gets included in DWARF, we can include it in the structural match.

I would love if DWARF contained more information on structor variants. I find it disappointing that the only way to locate the complete object constructor is to remangle the name of the subobject constructor and look it up in the symbol table. I think that's bad because DWARF should be usable without a symbol table. However, I suspect that's not going to be easy...

Copy link
Member Author

@Michael137 Michael137 Aug 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is actually redundant because we can just rely on the fallback lookup

Is it? I can see how the fallback lookup can work for functions where you have mangled names, as those are more-or-less unique. However, I don't see how we could find the correct constructor if all we have is the name of the class.

Yup that's correct. I meant it should work for mangled name lookup. For constructors we just would behave as we do today. Not ideal, but not a regression. But yea, ideally we'd fix this as you suggest. All that is to say, the type-unit check in this PR is redundant (regardless of which route we take).

Happy to try and change it to a structural match though.

I think that is necessary for this to work. Otherwise, this will fail whenever we parse a type from a CU which does not contain the definition of the methods (at least constructors) of that type. And I think that can happen pretty easily in a big binary.

Would a structural match here help though? If the declaration DIE for a ctor got encoded into the AsmLabel but the definition lives in a different CU (and thus SymbolFile), we wouldn't be able to find it anyway using our approach right? AFAIU, the case where a structural match would help is if:

  1. constructor declaration DIE (and its module) got encoded into the AsmLabel
  2. the definition lives in the same module but the DW_AT_specification points outside of the module

I'm sure DWARF producers are allowed to do this (and probably already do?) but I have been having a hard time coming up with an example of (2) happening. With type-units this can happen because the declaration lives in .debug_types but the definition (and its specification) lives in .debug_info. So the SymbolFile lookup will find the definition but the specification DIE is not the same as the declaration DIE. So there a structural match would help.

Here a in lib.o would point to the DW_TAG_structure_type of main.o, but the subprogram definitions for the constructors are different (and local to each CU).

Hm... That's confusing. I was under the impression that ld64 doesn't touch debug info (that it just leaves breadcrumbs so that the debug info can be linked later. Did that change? If not, how can lib.o point to a different file. Surely clang does not inspect main.o when compiling lib.o.

Or did you mean that this happens after running dsymutil over the binary? Because this matches what I would expect dsymutil to do.

Ah yes, sorry, forgot about dsymutil here.

Do you prefer us looking into alternatives to the approach in this PR? E.g., make DWARF do some of the heavy lifting? We could put e.g., the structor variant info into DWARF, alleviating the need for demangling, etc.

No, I still think this is the best approach for this. I'm just disappointed that it's turning out to be more complicated than I expected. I view the introduction of abi tags as an orthogonal issue -- if it gets included in DWARF, we can include it in the structural match.

Ah I see, agreed

I would love if DWARF contained more information on structor variants. I find it disappointing that the only way to locate the complete object constructor is to remangle the name of the subobject constructor and look it up in the symbol table. I think that's bad because DWARF should be usable without a symbol table. However, I suspect that's not going to be easy...

I wonder how GDB makes use of the placeholder mangling (i.e., C4) that GCC puts on declarations. Or if not for GDB, why does GCC do it? I guess if we had such a placeholder mangling, our lookup could become "find all DIEs with name Foo whose specification has _Z3FooC4 linkage name (regardless of what declaration DIE exactly got encoded into the label)". That does make our lives a bit easier. But maybe that's not a blocker for this PR (unless it turns out the structural match is a big problem).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the structural match, were you suggesting we re-use the innards of CopyUniqueClassMethodTypes to do so, or that there is something on the DWARFASTParserClang that should've been cached about the fact that the two DIEs are structurally the same?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the structural match, were you suggesting we re-use the innards of CopyUniqueClassMethodTypes to do so, or that there is something on the DWARFASTParserClang that should've been cached about the fact that the two DIEs are structurally the same?

I'm not sure. It would be great if we could reuse that code, but it may require refactoring something. I can't say I'm very familiar with how that code works -- I just know it's used in situations (see the class_type->GetID() != decl_ctx_die.GetID() check in DWARFASTParserClang::ParseCXXMethod) where we parse type from a DIE which is different that the type DIE we found by parsing one of its methods -- which sounds very similar to the situation we have here.

I don't think we can rely on anything being cached already, because lldb may not have had a reason to access the CU containing this constructor yet, but we may be able to force that information into cache by calling ParseCXXMethod or some part of it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another caveat that we would need to handle with the structural match, particularly for the type-unit case, is that the declaration's parent DIE might also just be a forward declaration, i.e., it wouldn't have any data-member DIEs. So iterating over the decl-contexts children and comparing their names wouldn't work unless we follow the type-units. The infrastructure for that might already be in place though, not sure.

Regarding finding structor definitions in other symbol files. I'm starting to think we do need help from the compiler here. Particularly, if we, like GCC, attached a generic mangled name to the declaration, and made sure we index it to point to the definitions, that would simplify a couple of things for us:

  1. we wouldn't need the structural matches anymore, because we found the definitions by a mangled name lookup. And all we need to do is compare structor variants.
  2. the fallback lookup would now be a mangled name lookup, and we would "just" have to modify that global lookup to pick the right variant too

Wdyt? Do you think we can handle the cross-module lookup feasibly without this?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If both modules are DWARF, then I think it's doable. Getting the information from one module to another would be tricky, but not impossible. And I don't think there's a particularly pressing need to resolve DWARF declarations with definitions from a PDB.

That said, the idea to add linkage names to constructors makes perfect sense as well. It aligns with what gcc does for constructors and also aligns with what both gcc and clang do with regular functions. I am slightly worried about what will that do to debug info size, but I can't say if that will be a problem or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Getting the information from one module to another would be tricky, but not impossible

But I think it would be one of the pre-requisites to landing this, right? Since that already works today (albeit maybe not completely correctly for all cases), and seems like a common thing people would run into in a non-trivial program.

That said, the idea to add linkage names to constructors makes perfect sense as well. It aligns with what gcc does for constructors and also aligns with what both gcc and clang do with regular functions. I am slightly worried about what will that do to debug info size, but I can't say if that will be a problem or not.

Let me see what the debug-info size impact of this would be

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think it would be one of the pre-requisites to landing this, right?

Yeah, I suppose that's true.

@Michael137 Michael137 force-pushed the lldb/reliable-function-call-labels-structors branch from bb10018 to d5f72a6 Compare August 8, 2025 09:54
@llvmbot llvmbot added clang:codegen IR generation bugs: mangling, exceptions, etc. llvm:codegen debuginfo labels Aug 8, 2025
…for LLDB JIT expressions

This patch adds special handling for `AsmLabel`s created by LLDB. LLDB
uses `AsmLabel`s to encode information about a function declaration to
make it easier to locate function symbols when JITing C++ expressions.
For constructors/destructors LLDB doesn't know at the time of creating
the `AsmLabelAttr` which structor variant the expression evaluator will
need to call (this is decided when compiling the expression). So we make
the Clang mangler inject this information into our custom label when
we're JITting the expression.
This patch adds a way to retrieve the constructor/destructor variant
from the Itanium demangle tree. This will be used by LLDB.
This patch is an implementation of [this discussion](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7) about handling ABI-tagged structors during expression evaluation.

**Motivation**

LLDB encodes the mangled name of a `DW_TAG_subprogram` into `AsmLabel`s on function and method Clang AST nodes. This means that when calls to these functions get lowered into IR (when running JITted expressions), the address resolver can locate the appropriate symbol by mangled name (and it's guaranteed to find the symbol because we got the mangled name from debug-info, instead of letting Clang mangle it based on AST structure). However, we don't do this for `CXXConstructorDecl`s/`CXXDestructorDecl`s because these structor declarations in DWARF don't have a linkage name. This is because there can be multiple variants of a structor, each with a distinct mangling in the Itanium ABI. Each structor variant has its own definition `DW_TAG_subprogram`. So LLDB doesn't know which mangled name to put into the `AsmLabel`.

Currently this means using an ABI-tagged constructor in LLDB expressions won't work (see [this RFC](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816) for concrete examples).

**Proposed Solution**

The `FunctionCallLabel` encoding that we put into `AsmLabel`s already supports stuffing more info about a DIE into it. So this patch extends the `FunctionCallLabel` to contain an optional discriminator (a sequence of bytes) which the `SymbolFileDWARF` plugin interprets as the constructor/destructor variant of that DIE.

There's a few subtleties here:
1. At the point at which LLDB first constructs the label, it has no way of knowing (just by looking the debug-info declaration), which structor variant the expression evaluator is supposed to call. That's something that gets decided when compiling the expression. So we let the Clang mangler inject the correct structor variant into the `AsmLabel` during JITing. I adjusted the `AsmLabelAttr` mangling for this. An option would've been to create a new Clang attribute which behaved like an `AsmLabel` but with these special semantics for LLDB. My main concern there is that we'd have to adjust all the `AsmLabelAttr` checks around Clang to also now account for this new attribute.
2. The compiler is free to omit the `C1` variant of a constructor if the `C2` variant is sufficient. In that case it may alias `C1` to `C2`, leaving us with only the `C2` `DW_TAG_subprogram` in the object file. Linux is one of the platforms where this occurs. For those cases there's heuristic in `SymbolFileDWARF` where we pick `C2` if we asked for `C1` but it doesn't exist. This may not always be correct (if for some reason the compiler decided to drop `C1` for other reasons).
@Michael137 Michael137 force-pushed the lldb/reliable-function-call-labels-structors branch 3 times, most recently from 9852fd8 to 52da84e Compare August 8, 2025 19:36
Copy link

github-actions bot commented Aug 8, 2025

⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:
darker --check --diff -r HEAD~1...HEAD lldb/test/API/lang/cpp/abi_tag_structors/TestAbiTagStructors.py lldb/test/API/lang/cpp/constructors/TestCppConstructors.py lldb/test/API/lang/cpp/expr-definition-in-dylib/TestExprDefinitionInDylib.py
View the diff from darker here.
--- constructors/TestCppConstructors.py	2025-08-08 22:47:24.000000 +0000
+++ constructors/TestCppConstructors.py	2025-08-08 22:50:22.529565 +0000
@@ -37,20 +37,20 @@
             error=True,
             substrs=["Couldn't look up symbols:"],
         )
 
         # FIXME: Calling deleted constructors should fail before linking.
-        #self.expect(
+        # self.expect(
         #    "expr ClassWithDeletedCtor(1).value",
         #    error=True,
         #    substrs=["Couldn't look up symbols:"],
-        #)
-        #self.expect(
+        # )
+        # self.expect(
         #    "expr ClassWithDeletedDefaultCtor().value",
         #    error=True,
         #    substrs=["Couldn't look up symbols:", "function", "optimized out"],
-        #)
+        # )
 
     @skipIfWindows  # Can't find operator new.
     @skipIfLinux  # Fails on some Linux systems with SIGABRT.
     def test_constructors_new(self):
         self.build()

@Michael137 Michael137 force-pushed the lldb/reliable-function-call-labels-structors branch from 52da84e to f8725a9 Compare August 8, 2025 19:50
@Michael137
Copy link
Member Author

@labath Thanks for the reviews so far. The latest commit contains a prototype of adding the GCC-style unified mangled name to constructor/destructor declarations. Turns out this wasn't too hard and actually solves all the issues we talked about pretty nicely. Mainly needed to ensure the accelerator tables map the unified mangled name to the definition DIE. And thus also had to adjust the GetFunctions LLDB implementation (though there may be better ways of doing that).

The issue of checking that the specification came from the right place is redundant now because we lookup structors within a SymbolFile up by mangled name too. And the fallback path in case the definition is in a different module works too because we encode the mangled structor name in the label (albeit the unified one, so there may still need to be some work to be done to ensure we pick the right variant). Also for older clang versions, if the structor declaration doesn't have a mangled name, which just don't attach an AsmLabel, so that would work as LLDB does today.

Let me know what you think. I'll prepare a separate PR to discuss these changes

@Michael137 Michael137 force-pushed the lldb/reliable-function-call-labels-structors branch from f8725a9 to 6fc22b7 Compare August 8, 2025 22:47
krishna2803 pushed a commit to krishna2803/llvm-project that referenced this pull request Aug 12, 2025
…llvm#148877)

LLDB currently attaches `AsmLabel`s to `FunctionDecl`s such that that
the `IRExecutionUnit` can determine which mangled name to call (we can't
rely on Clang deriving the correct mangled name to call because the
debug-info AST doesn't contain all the info that would be encoded in the
DWARF linkage names). However, we don't attach `AsmLabel`s for structors
because they have multiple variants and thus it's not clear which
mangled name to use. In the [RFC on fixing expression evaluation of
abi-tagged
structors](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816)
we discussed encoding the structor variant into the `AsmLabel`s.
Specifically in [this
thread](https://discourse.llvm.org/t/rfc-lldb-handling-abi-tagged-constructors-destructors-in-expression-evaluator/82816/7)
we discussed that the contents of the `AsmLabel` are completely under
LLDB's control and we could make use of it to uniquely identify a
function by encoding the exact module and DIE that the function is
associated with (mangled names need not be enough since two identical
mangled symbols may live in different modules). So if we already have a
custom `AsmLabel` format, we can encode the structor variant in a
follow-up (the current idea is to append the structor variant as a
suffix to our custom `AsmLabel` when Clang emits the mangled name into
the JITted IR). Then we would just have to teach the `IRExecutionUnit`
to pick the correct structor variant DIE during symbol resolution. The
draft of this is available
[here](llvm#149827)

This patch sets up the infrastructure for the custom `AsmLabel` format
by encoding the module id, DIE id and mangled name in it.

**Implementation**

The flow is as follows:
1. Create the label in `DWARFASTParserClang`. The format is:
`$__lldb_func:module_id:die_id:mangled_name`
2. When resolving external symbols in `IRExecutionUnit`, we parse this
label and then do a lookup by DIE ID (or mangled name into the module if
the encoded DIE is a declaration).

Depends on llvm#151355
krishna2803 pushed a commit to krishna2803/llvm-project that referenced this pull request Aug 12, 2025
This was added purely for the needs of LLDB. However, starting with
llvm#151355 and
llvm#148877 we no longer create
literal AsmLabels in LLDB either. So this is unused and we can get rid
of it.

In the near future LLDB will want to add special support for mangling
`AsmLabel`s (specifically
llvm#149827).
@Michael137
Copy link
Member Author

Michael137 commented Aug 12, 2025

FYI, measured the impact of adding a unified C4 linkage_name to ctor/dtor declarations. Here's the debug-info size diff on Linux across all object files in my LLVM build:

    FILE SIZE        VM SIZE    
 --------------  -------------- 
   +15%  +207Mi  [ = ]       0    .debug_str
  +9.6% +56.1Mi  [ = ]       0    .rela.debug_str_offsets
  +1.6% +9.72Mi  [ = ]       0    .debug_info
  +9.6% +9.35Mi  [ = ]       0    .debug_str_offsets
  +0.1% +7.77Ki  [ = ]       0    .debug_line_str

@dwblaikie
Copy link
Collaborator

a 15% growth to object files is pretty rough - but I guess this is without compression enabled, which'll /probably/ save us (Google)...

@cmtice / @labath you folks might want to check what the impact of this change (@Michael137 do you have a PR to point to that folks could test with)

@Michael137
Copy link
Member Author

a 15% growth to object files is pretty rough - but I guess this is without compression enabled, which'll /probably/ save us (Google)...

@cmtice / @labath you folks might want to check what the impact of this change (@Michael137 do you have a PR to point to that folks could test with)

Here's a draft PR: #153369

It's the bare minimum to get my prototype working

@labath
Copy link
Collaborator

labath commented Aug 13, 2025

I ran our internal debug info metrics tests, and while the numbers appear to be better than what you reported, the increase is still of the magnitude that would cause a problem for us. The increase is about 3-4% of binary size. The bulk of the increase comes from debug_str and debug_str_offsets (both around 7-8%). Our largest binary doesn't even build as it goes over the 4GB debug_str limit. debug_info also increases a bit (<1%). Funnily enough, debug_abbrev actually decreases, presumably because the abbreviations can be shared with regular method declarations, but that doesn't matter as abbreviations are small anyway.

I'm rerunning the numbers to confirm this (I originally tested with the version in this PR and not #153369), but I don't expect this to make a difference.

@Michael137
Copy link
Member Author

I ran our internal debug info metrics tests, and while the numbers appear to be better than what you reported, the increase is still of the magnitude that would cause a problem for us. The increase is about 3-4% of binary size. The bulk of the increase comes from debug_str and debug_str_offsets (both around 7-8%). Our largest binary doesn't even build as it goes over the 4GB debug_str limit. debug_info also increases a bit (<1%). Funnily enough, debug_abbrev actually decreases, presumably because the abbreviations can be shared with regular method declarations, but that doesn't matter as abbreviations are small anyway.

I'm rerunning the numbers to confirm this (I originally tested with the version in this PR and not #153369), but I don't expect this to make a difference.

Thanks for checking this!

Our largest binary doesn't even build as it goes over the 4GB debug_str limit. debug_info also increases a bit (<1%).

Am I misremembering that this came up before elsewhere? I guess that's what the simple-template-name stuff fixed?

Not sure where we go from here then if this is a blocker. Maybe there's a way to reduce the amount of linkage names we emit, though not sure off the top how we would decide that. The only reason I tried doing it the GCC way was that we could look up the structor definitions without having to do structural matching on the DIE context. I suppose we can go back to that method. That would also still leave the cross-module case an open question..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category debuginfo libc++abi libc++abi C++ Runtime Library. Not libc++. lldb llvm:codegen
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants