@CarlosAlbertoEnciso
Member

@CarlosAlbertoEnciso CarlosAlbertoEnciso commented Nov 12, 2025

Given the test case (CBase is a structure or class):

int function(CBase *Input) {
  int Output = Input->foo();
  return Output;
}

and using '-emit-call-site-info' with llc, currently the following DWARF debug information is produced for the indirect call 'Input->foo()':

0x51: DW_TAG_call_site
        DW_AT_call_target    (DW_OP_reg0 RAX)
        DW_AT_call_return_pc (0x0e)
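
For readers who want to reproduce the test case, here is a minimal self-contained sketch. The definition of CBase is not part of the description, so the class body below is an assumption: the bool parameter is inferred from the mangled name _ZN5CBase3fooEb, and the default argument and the CDerived class are hypothetical additions that let 'Input->foo()' compile and dispatch virtually.

```cpp
// Hypothetical completion of the test case; the real CBase is not shown.
struct CBase {
  virtual ~CBase() = default;
  // Assumed signature: foo(bool), matching the 'Eb' suffix of the mangled
  // name. The default argument is an assumption so that foo() compiles.
  virtual int foo(bool Flag = false) { return Flag ? 1 : 0; }
};

// Hypothetical override, so the call below has more than one possible target.
struct CDerived : CBase {
  int foo(bool Flag = false) override { return Flag ? 3 : 2; }
};

// The function from the description: 'Input->foo()' is an indirect call
// through the vtable, so its target is only known at runtime.
int function(CBase *Input) {
  int Output = Input->foo();
  return Output;
}
```

Calling function() with a CBase object reaches CBase::foo, and with a CDerived object reaches CDerived::foo, which is why DW_AT_call_target alone can only describe where the target address lives.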

Initial context from the SCE debugger point of view:

We can detect when a function has been affected by Identical Code Folding (ICF) from DWARF call-site information. For example,

  1. $RIP is currently in 'function()'
  2. The call-site information in the parent frame indicates we should have called 'foo()'

If we see a situation like the above, we can infer that 'foo()' and 'function()' have been code folded. This technique breaks when dealing with virtual functions because the call-site information only provides a way to find the target function at runtime.

However, if the call-site information includes information on which virtual function is being called, we can compare this against the $RIP location to see if we are in an implementation of the virtual function. If we are not, then we can assume we have been code folded.

For this to work we just need to record which virtual function we are calling. We do not need to know the type of the 'this' pointer at the call-site.

This patch was created to help in the identification of the intended target of a virtual call in the SCE debugger.

By adding the DW_AT_call_origin for indirect calls, the debugger can identify the intended target of a call. These are the actions taken by the SCE debugger:

  • The debugger can detect functions that have been folded by checking whether the DW_AT_call_origin matches the call frame function. If it does not, the debugger can assume the "true" target and the actual target have been code folded, and add a frame annotation to the call stack to indicate this. The alternative explanation is a tail call from foo to function; the debugger can disambiguate the two cases by looking at the subroutine DIE referenced by DW_AT_call_origin, which has a tombstone DW_AT_low_pc in the ICF case.

  • For virtual calls such as the given test case, the existence of the DW_AT_call_target attribute tells the debugger that this is an indirect jump, and the DW_AT_call_origin attribute, pointing to the base class method DIE, will tell which method is being called.

  • The debugger can confirm from the method's DIE that it is a virtual function call by looking at the attributes (DW_AT_virtuality and DW_AT_vtable_elem_location), and can look at the parent DIE to work out the type.
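
The decision procedure in the bullets above can be sketched roughly as follows. This is not SCE debugger code: OriginDie, FrameFunction, classifyFrame, and the tombstone constant are all hypothetical names, and the list of overridden methods stands in for whatever lookup a debugger would do through DW_AT_virtuality and the parent class DIE.

```cpp
#include <algorithm>
#include <cstdint>
#include <limits>
#include <optional>
#include <string>
#include <vector>

// Hypothetical, simplified view of the DIE referenced by DW_AT_call_origin.
struct OriginDie {
  std::string LinkageName;
  std::optional<std::uint64_t> LowPc; // DW_AT_low_pc, if present
};

// Hypothetical description of the function that $RIP is currently in.
struct FrameFunction {
  std::string LinkageName;
  std::vector<std::string> Overrides; // base methods this function overrides
};

enum class FrameKind { ExpectedTarget, CodeFolded, TailCall };

// Assumed tombstone value; real linkers and address sizes may differ.
constexpr std::uint64_t kAssumedTombstone =
    std::numeric_limits<std::uint64_t>::max();

// True if the frame function is the call origin or one of its overrides.
bool implementsOrigin(const FrameFunction &Frame, const OriginDie &Origin) {
  if (Frame.LinkageName == Origin.LinkageName)
    return true;
  return std::find(Frame.Overrides.begin(), Frame.Overrides.end(),
                   Origin.LinkageName) != Frame.Overrides.end();
}

FrameKind classifyFrame(const OriginDie &Origin, const FrameFunction &Frame) {
  if (implementsOrigin(Frame, Origin))
    return FrameKind::ExpectedTarget;
  // Mismatch: a tombstone low_pc on the origin suggests ICF removed it.
  if (Origin.LowPc && *Origin.LowPc == kAssumedTombstone)
    return FrameKind::CodeFolded;
  // Otherwise the mismatch is attributed to a tail call.
  return FrameKind::TailCall;
}
```

The sketch only needs the name of the virtual function being called, not the dynamic type of the 'this' pointer, which mirrors the point made in the description.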

This is the added DW_AT_call_origin to identify the target call CBase::foo.

0x51: DW_TAG_call_site
        DW_AT_call_target    (DW_OP_reg0 RAX)
        DW_AT_call_return_pc (0x0e)
        -----------------------------------------------
        DW_AT_call_origin    (0x71 "_ZN5CBase3fooEb")
        -----------------------------------------------

0x61: DW_TAG_class_type
        DW_AT_name            ("CBase")
              ...
0x71:   DW_TAG_subprogram
           DW_AT_linkage_name ("_ZN5CBase3fooEb")
           DW_AT_name         ("foo")

The extra call site information is available by default.

@CarlosAlbertoEnciso CarlosAlbertoEnciso self-assigned this Nov 12, 2025
@CarlosAlbertoEnciso CarlosAlbertoEnciso added the clang, lldb, clang:codegen, and debuginfo labels Nov 12, 2025
@llvmbot
Member

llvmbot commented Nov 12, 2025

@llvm/pr-subscribers-llvm-binary-utilities
@llvm/pr-subscribers-backend-aarch64
@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-backend-arm
@llvm/pr-subscribers-backend-x86
@llvm/pr-subscribers-llvm-ir
@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-debuginfo

Author: Carlos Alberto Enciso (CarlosAlbertoEnciso)

Changes

Given the test case (CBase is a structure or class):

int function(CBase *Input) {
  int Output = Input->foo();
  return Output;
}

and using '-emit-call-site-info' with llc, the following DWARF debug information is produced for the indirect call 'Input->foo()':

0x51: DW_TAG_call_site
        DW_AT_call_target    (DW_OP_reg0 RAX)
        DW_AT_call_return_pc (0x0e)

This patch generates an extra 'DW_AT_call_origin' to identify the target call 'CBase::foo'.

0x51: DW_TAG_call_site
        DW_AT_call_target    (DW_OP_reg0 RAX)
        DW_AT_call_return_pc (0x0e)
        -----------------------------------------------
        DW_AT_call_origin    (0x71 "_ZN5CBase3fooEb")
        -----------------------------------------------

0x61: DW_TAG_class_type
        DW_AT_name            ("CBase")
              ...
0x71:   DW_TAG_subprogram
           DW_AT_linkage_name ("_ZN5CBase3fooEb")
           DW_AT_name         ("foo")

The extra call site information is generated only for the SCE debugger: '-Xclang -debugger-tuning=sce'


Patch is 26.55 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/167666.diff

14 Files Affected:

  • (modified) clang/lib/CodeGen/CGCall.cpp (+3)
  • (modified) clang/lib/CodeGen/CGDebugInfo.cpp (+112-2)
  • (modified) clang/lib/CodeGen/CGDebugInfo.h (+28)
  • (modified) clang/lib/CodeGen/CodeGenFunction.cpp (+2)
  • (added) clang/test/DebugInfo/CXX/callsite-base.cpp (+42)
  • (added) clang/test/DebugInfo/CXX/callsite-derived.cpp (+85)
  • (added) clang/test/DebugInfo/CXX/callsite-edges.cpp (+125)
  • (added) cross-project-tests/debuginfo-tests/clang_llvm_roundtrip/callsite-dwarf.cpp (+71)
  • (modified) llvm/include/llvm/CodeGen/MachineFunction.h (+6)
  • (modified) llvm/include/llvm/IR/FixedMetadataKinds.def (+1)
  • (modified) llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp (+26-1)
  • (modified) llvm/lib/CodeGen/MIRPrinter.cpp (+1-1)
  • (modified) llvm/lib/CodeGen/MachineFunction.cpp (+3)
  • (modified) llvm/lib/Target/X86/X86ISelLoweringCall.cpp (+9-2)
diff --git a/clang/lib/CodeGen/CGCall.cpp b/clang/lib/CodeGen/CGCall.cpp
index efacb3cc04c01..5b5ed52e1d554 100644
--- a/clang/lib/CodeGen/CGCall.cpp
+++ b/clang/lib/CodeGen/CGCall.cpp
@@ -5958,6 +5958,9 @@ RValue CodeGenFunction::EmitCall(const CGFunctionInfo &CallInfo,
     }
   }
 
+  if (CGDebugInfo *DI = CGM.getModuleDebugInfo())
+    DI->addCallTarget(CI->getCalledFunction(), CalleeDecl, CI);
+
   // If this is within a function that has the guard(nocf) attribute and is an
   // indirect call, add the "guard_nocf" attribute to this call to indicate that
   // Control Flow Guard checks should not be added, even if the call is inlined.
diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index bda7b7487f59b..44d2bf5527c7f 100644
--- a/clang/lib/CodeGen/CGDebugInfo.cpp
+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -2430,6 +2430,9 @@ llvm::DISubprogram *CGDebugInfo::CreateCXXMemberFunction(
 
   SPCache[Method->getCanonicalDecl()].reset(SP);
 
+  // Add the method declaration as a call target.
+  addCallTarget(MethodLinkageName, SP, /*CI=*/nullptr);
+
   return SP;
 }
 
@@ -4955,6 +4958,99 @@ void CGDebugInfo::EmitFunctionDecl(GlobalDecl GD, SourceLocation Loc,
     Fn->setSubprogram(SP);
 }
 
+bool CGDebugInfo::generateCallSiteForPS() const {
+  // The added call target will be available only for SCE targets.
+  if (CGM.getCodeGenOpts().getDebuggerTuning() != llvm::DebuggerKind::SCE)
+    return false;
+
+  // Check general conditions for call site generation.
+  return (getCallSiteRelatedAttrs() != llvm::DINode::FlagZero);
+}
+
+// Set the 'call_target' metadata in the call instruction.
+void CGDebugInfo::addCallTargetMetadata(llvm::MDNode *MD, llvm::CallBase *CI) {
+  if (!MD || !CI)
+    return;
+  CI->setMetadata(llvm::LLVMContext::MD_call_target, MD);
+}
+
+// Finalize call_target generation.
+void CGDebugInfo::finalizeCallTarget() {
+  if (!generateCallSiteForPS())
+    return;
+
+  for (auto &E : CallTargetCache) {
+    for (const auto &WH : E.second.second) {
+      llvm::CallBase *CI = dyn_cast_or_null<llvm::CallBase>(WH);
+      addCallTargetMetadata(E.second.first, CI);
+    }
+  }
+}
+
+void CGDebugInfo::addCallTarget(StringRef Name, llvm::MDNode *MD,
+                                llvm::CallBase *CI) {
+  if (!generateCallSiteForPS())
+    return;
+
+  // Record only indirect calls.
+  if (CI && !CI->isIndirectCall())
+    return;
+
+  // Nothing to do.
+  if (Name.empty())
+    return;
+
+  auto It = CallTargetCache.find(Name);
+  if (It == CallTargetCache.end()) {
+    // First time we see 'Name'. Insert record for later finalize.
+    InstrList List;
+    if (CI)
+      List.push_back(CI);
+    CallTargetCache.try_emplace(Name, MD, std::move(List));
+  } else {
+    if (MD)
+      It->second.first.reset(MD);
+    if (CI) {
+      InstrList &List = It->second.second;
+      List.push_back(CI);
+    }
+  }
+}
+
+void CGDebugInfo::addCallTarget(llvm::Function *F, const FunctionDecl *FD,
+                                llvm::CallBase *CI) {
+  if (!generateCallSiteForPS())
+    return;
+
+  if (!F && !FD)
+    return;
+
+  // Ignore method types that never can be indirect calls.
+  if (!F && (isa<CXXConstructorDecl>(FD) || isa<CXXDestructorDecl>(FD) ||
+             FD->hasAttr<CUDAGlobalAttr>()))
+    return;
+
+  StringRef Name = (F && F->hasName()) ? F->getName() : CGM.getMangledName(FD);
+  addCallTarget(Name, /*MD=*/nullptr, CI);
+}
+
+void CGDebugInfo::removeCallTarget(StringRef Name) {
+  if (!generateCallSiteForPS())
+    return;
+
+  auto It = CallTargetCache.find(Name);
+  if (It != CallTargetCache.end())
+    CallTargetCache.erase(It);
+}
+
+void CGDebugInfo::removeCallTarget(llvm::Function *F) {
+  if (!generateCallSiteForPS())
+    return;
+
+  if (F && F->hasName())
+    removeCallTarget(F->getName());
+}
+
 void CGDebugInfo::EmitFuncDeclForCallSite(llvm::CallBase *CallOrInvoke,
                                           QualType CalleeType,
                                           GlobalDecl CalleeGlobalDecl) {
@@ -4978,9 +5074,15 @@ void CGDebugInfo::EmitFuncDeclForCallSite(llvm::CallBase *CallOrInvoke,
   // If there is no DISubprogram attached to the function being called,
   // create the one describing the function in order to have complete
   // call site debug info.
-  if (!CalleeDecl->isStatic() && !CalleeDecl->isInlined())
+  if (!CalleeDecl->isStatic() && !CalleeDecl->isInlined()) {
     EmitFunctionDecl(CalleeGlobalDecl, CalleeDecl->getLocation(), CalleeType,
                      Func);
+    if (Func->getSubprogram()) {
+      // For each call instruction emitted, add the call site target metadata.
+      llvm::DISubprogram *SP = Func->getSubprogram();
+      addCallTarget(SP->getLinkageName(), SP, /*CI=*/nullptr);
+    }
+  }
 }
 
 void CGDebugInfo::EmitInlineFunctionStart(CGBuilderTy &Builder, GlobalDecl GD) {
@@ -5082,8 +5184,13 @@ void CGDebugInfo::EmitFunctionEnd(CGBuilderTy &Builder, llvm::Function *Fn) {
   }
   FnBeginRegionCount.pop_back();
 
-  if (Fn && Fn->getSubprogram())
+  if (Fn && Fn->getSubprogram()) {
+    // For each call instruction emitted, add the call site target metadata.
+    llvm::DISubprogram *SP = Fn->getSubprogram();
+    addCallTarget(SP->getLinkageName(), SP, /*CI=*/nullptr);
+
     DBuilder.finalizeSubprogram(Fn->getSubprogram());
+  }
 }
 
 CGDebugInfo::BlockByRefType
@@ -6498,6 +6605,9 @@ void CGDebugInfo::finalize() {
     if (auto MD = TypeCache[RT])
       DBuilder.retainType(cast<llvm::DIType>(MD));
 
+  // Generate call_target information.
+  finalizeCallTarget();
+
   DBuilder.finalize();
 }
 
diff --git a/clang/lib/CodeGen/CGDebugInfo.h b/clang/lib/CodeGen/CGDebugInfo.h
index 2378bdd780b3b..fd71d958bc1f5 100644
--- a/clang/lib/CodeGen/CGDebugInfo.h
+++ b/clang/lib/CodeGen/CGDebugInfo.h
@@ -682,6 +682,15 @@ class CGDebugInfo {
   /// that it is supported and enabled.
   llvm::DINode::DIFlags getCallSiteRelatedAttrs() const;
 
+  /// Add call target information.
+  void addCallTarget(StringRef Name, llvm::MDNode *MD, llvm::CallBase *CI);
+  void addCallTarget(llvm::Function *F, const FunctionDecl *FD,
+                     llvm::CallBase *CI);
+
+  /// Remove a call target entry for the given name or function.
+  void removeCallTarget(StringRef Name);
+  void removeCallTarget(llvm::Function *F);
+
 private:
   /// Amend \p I's DebugLoc with \p Group (its source atom group) and \p
   /// Rank (lower nonzero rank is higher precedence). Does nothing if \p I
@@ -903,6 +912,25 @@ class CGDebugInfo {
   /// If one exists, returns the linkage name of the specified \
   /// (non-null) \c Method. Returns empty string otherwise.
   llvm::StringRef GetMethodLinkageName(const CXXMethodDecl *Method) const;
+
+  /// For each 'DISuprogram' we store a list of call instructions 'CallBase'
+  /// that indirectly call  such 'DISuprogram'. We use its linkage name to
+  /// update such list.
+  /// The 'CallTargetCache' is updated in the following scenarios:
+  /// - Both 'CallBase' and 'MDNode' are ready available.
+  /// - If only the 'CallBase' or 'MDNode' are are available, the partial
+  ///   information is added and later is completed when the missing item
+  ///   ('CallBase' or 'MDNode') is available.
+  using InstrList = llvm::SmallVector<llvm::WeakVH, 2>;
+  using CallTargetEntry = std::pair<llvm::TrackingMDNodeRef, InstrList>;
+  llvm::SmallDenseMap<StringRef, CallTargetEntry> CallTargetCache;
+
+  /// Generate call target information only for SIE debugger.
+  bool generateCallSiteForPS() const;
+
+  /// Add 'call_target' metadata to the 'call' instruction.
+  void addCallTargetMetadata(llvm::MDNode *MD, llvm::CallBase *CI);
+  void finalizeCallTarget();
 };
 
 /// A scoped helper to set the current debug location to the specified
diff --git a/clang/lib/CodeGen/CodeGenFunction.cpp b/clang/lib/CodeGen/CodeGenFunction.cpp
index 88628530cf66b..6d83b5fdd0e1b 100644
--- a/clang/lib/CodeGen/CodeGenFunction.cpp
+++ b/clang/lib/CodeGen/CodeGenFunction.cpp
@@ -1510,6 +1510,8 @@ void CodeGenFunction::GenerateCode(GlobalDecl GD, llvm::Function *Fn,
     // Clear non-distinct debug info that was possibly attached to the function
     // due to an earlier declaration without the nodebug attribute
     Fn->setSubprogram(nullptr);
+    if (CGDebugInfo *DI = getDebugInfo())
+      DI->removeCallTarget(Fn);
     // Disable debug info indefinitely for this function
     DebugInfo = nullptr;
   }
diff --git a/clang/test/DebugInfo/CXX/callsite-base.cpp b/clang/test/DebugInfo/CXX/callsite-base.cpp
new file mode 100644
index 0000000000000..ed7c455ced9d7
--- /dev/null
+++ b/clang/test/DebugInfo/CXX/callsite-base.cpp
@@ -0,0 +1,42 @@
+// Simple class with only virtual methods: inlined and not-inlined
+// We check for a generated 'call_target' for:
+// - 'one', 'two' and 'three'.
+
+class CBase {
+public:
+  virtual void one();
+  virtual void two();
+  virtual void three() {}
+};
+void CBase::one() {}
+
+void bar(CBase *Base) {
+  Base->one();
+  Base->two();
+  Base->three();
+
+  CBase B;
+  B.one();
+}
+
+// RUN: %clang_cc1 -debugger-tuning=sce -triple=x86_64-linux -disable-llvm-passes -emit-llvm -debug-info-kind=constructor -dwarf-version=5 -O1 %s -o - | FileCheck %s -check-prefix CHECK-BASE
+
+// CHECK-BASE: define {{.*}} @_Z3barP5CBase{{.*}} {
+// CHECK-BASE-DAG:   call void %1{{.*}} !dbg {{![0-9]+}}, !call_target [[BASE_ONE:![0-9]+]]
+// CHECK-BASE-DAG:   call void %3{{.*}} !dbg {{![0-9]+}}, !call_target [[BASE_TWO:![0-9]+]]
+// CHECK-BASE-DAG:   call void %5{{.*}} !dbg {{![0-9]+}}, !call_target [[BASE_THREE:![0-9]+]]
+// CHECK-BASE-DAG:   call void @_ZN5CBaseC2Ev{{.*}} !dbg {{![0-9]+}}
+// CHECK-BASE-DAG:   call void @_ZN5CBase3oneEv{{.*}} !dbg {{![0-9]+}}
+// CHECK-BASE: }
+
+// CHECK-BASE-DAG: [[BASE_ONE]] = {{.*}}!DISubprogram(name: "one", linkageName: "_ZN5CBase3oneEv"
+// CHECK-BASE-DAG: [[BASE_TWO]] = {{.*}}!DISubprogram(name: "two", linkageName: "_ZN5CBase3twoEv"
+// CHECK-BASE-DAG: [[BASE_THREE]] = {{.*}}!DISubprogram(name: "three", linkageName: "_ZN5CBase5threeEv"
+
+// RUN: %clang_cc1 -triple=x86_64-linux -disable-llvm-passes -emit-llvm -debug-info-kind=constructor -dwarf-version=5 -O1 %s -o - | FileCheck %s -check-prefix CHECK-BASE-NON
+
+// CHECK-BASE-NON: define {{.*}} @_Z3barP5CBase{{.*}} {
+// CHECK-BASE-NON-DAG:   call void %1{{.*}} !dbg {{![0-9]+}}
+// CHECK-BASE-NON-DAG:   call void %3{{.*}} !dbg {{![0-9]+}}
+// CHECK-BASE-NON-DAG:   call void %5{{.*}} !dbg {{![0-9]+}}
+// CHECK-BASE-NON: }
diff --git a/clang/test/DebugInfo/CXX/callsite-derived.cpp b/clang/test/DebugInfo/CXX/callsite-derived.cpp
new file mode 100644
index 0000000000000..b1019c4b252a4
--- /dev/null
+++ b/clang/test/DebugInfo/CXX/callsite-derived.cpp
@@ -0,0 +1,85 @@
+// Simple base and derived class with virtual and static methods:
+// We check for a generated 'call_target' for:
+// - 'one', 'two' and 'three'.
+
+class CBase {
+public:
+  virtual void one(bool Flag) {}
+  virtual void two(int P1, char P2) {}
+  static void three();
+};
+
+void CBase::three() {
+}
+void bar(CBase *Base);
+
+void foo(CBase *Base) {
+  CBase::three();
+}
+
+class CDerived : public CBase {
+public:
+  void one(bool Flag) {}
+  void two(int P1, char P2) {}
+};
+void foo(CDerived *Derived);
+
+int main() {
+  CBase B;
+  bar(&B);
+
+  CDerived D;
+  foo(&D);
+
+  return 0;
+}
+
+void bar(CBase *Base) {
+  Base->two(77, 'a');
+}
+
+void foo(CDerived *Derived) {
+  Derived->one(true);
+}
+
+// RUN: %clang_cc1 -debugger-tuning=sce -triple=x86_64-linux -disable-llvm-passes -emit-llvm -debug-info-kind=constructor -dwarf-version=5 -O1 %s -o - | FileCheck %s -check-prefix CHECK-DERIVED
+
+// CHECK-DERIVED: define {{.*}} @_Z3fooP5CBase{{.*}} {
+// CHECK-DERIVED-DAG: call void @_ZN5CBase5threeEv{{.*}} !dbg {{![0-9]+}}
+// CHECK-DERIVED: }
+
+// CHECK-DERIVED: define {{.*}} @main{{.*}} {
+// CHECK-DERIVED-DAG:  call void @_ZN5CBaseC1Ev{{.*}} !dbg {{![0-9]+}}
+// CHECK-DERIVED-DAG:  call void @_Z3barP5CBase{{.*}} !dbg {{![0-9]+}}
+// CHECK-DERIVED-DAG:  call void @_ZN8CDerivedC1Ev{{.*}} !dbg {{![0-9]+}}
+// CHECK-DERIVED-DAG:  call void @_Z3fooP8CDerived{{.*}} !dbg {{![0-9]+}}
+// CHECK-DERIVED: }
+
+// CHECK-DERIVED: define {{.*}} @_ZN5CBaseC1Ev{{.*}} {
+// CHECK-DERIVED-DAG:  call void @_ZN5CBaseC2Ev{{.*}} !dbg {{![0-9]+}}
+// CHECK-DERIVED: }
+
+// CHECK-DERIVED: define {{.*}} @_Z3barP5CBase{{.*}} {
+// CHECK-DERIVED-DAG:  call void %1{{.*}} !dbg {{![0-9]+}}, !call_target [[BASE_TWO:![0-9]+]]
+// CHECK-DERIVED: }
+
+// CHECK-DERIVED: define {{.*}} @_ZN8CDerivedC1Ev{{.*}} {
+// CHECK-DERIVED-DAG:  call void @_ZN8CDerivedC2Ev{{.*}} !dbg {{![0-9]+}}
+// CHECK-DERIVED: }
+
+// CHECK-DERIVED: define {{.*}} @_Z3fooP8CDerived{{.*}} {
+// CHECK-DERIVED-DAG:  call void %1{{.*}} !dbg {{![0-9]+}}, !call_target [[DERIVED_ONE:![0-9]+]]
+// CHECK-DERIVED: }
+
+// CHECK-DERIVED-DAG: [[BASE_TWO]] = {{.*}}!DISubprogram(name: "two", linkageName: "_ZN5CBase3twoEic"
+// CHECK-DERIVED-DAG: [[DERIVED_ONE]] = {{.*}}!DISubprogram(name: "one", linkageName: "_ZN8CDerived3oneEb"
+
+// RUN: %clang_cc1 -triple=x86_64-linux -disable-llvm-passes -emit-llvm -debug-info-kind=constructor -dwarf-version=5 -O1 %s -o - | FileCheck %s -check-prefix CHECK-DERIVED-NON
+
+// CHECK-DERIVED-NON: define {{.*}} @_Z3barP5CBase{{.*}} {
+// CHECK-DERIVED-NON-DAG:  call void %1{{.*}} !dbg {{![0-9]+}}
+// CHECK-DERIVED-NON: }
+
+// CHECK-DERIVED-NON: define {{.*}} @_Z3fooP8CDerived{{.*}} {
+// CHECK-DERIVED-NON-DAG:  call void %1{{.*}} !dbg {{![0-9]+}}
+// CHECK-DERIVED-NON: }
diff --git a/clang/test/DebugInfo/CXX/callsite-edges.cpp b/clang/test/DebugInfo/CXX/callsite-edges.cpp
new file mode 100644
index 0000000000000..1d4ef29f4b357
--- /dev/null
+++ b/clang/test/DebugInfo/CXX/callsite-edges.cpp
@@ -0,0 +1,125 @@
+// Check edge cases:
+
+//---------------------------------------------------------------------
+// Method is declared but not defined in current CU - Fail.
+// No debug information entry is generated for 'one'.
+// Generate 'call_target' metadata only for 'two'.
+//---------------------------------------------------------------------
+class CEmpty {
+public:
+  virtual void one(bool Flag);
+  virtual void two(int P1, char P2);
+};
+
+void CEmpty::two(int P1, char P2) {
+}
+
+void edge_a(CEmpty *Empty) {
+  Empty->one(false);
+  Empty->two(77, 'a');
+}
+
+//---------------------------------------------------------------------
+// Pure virtual method but not defined in current CU - Pass.
+// Generate 'call_target' metadata for 'one' and 'two'.
+//---------------------------------------------------------------------
+class CBase {
+public:
+  virtual void one(bool Flag) = 0;
+  virtual void two(int P1, char P2);
+};
+
+void CBase::two(int P1, char P2) {
+}
+
+void edge_b(CBase *Base) {
+  Base->one(false);
+  Base->two(77, 'a');
+}
+
+//---------------------------------------------------------------------
+// Virtual method defined very deeply - Pass.
+// Generate 'call_target' metadata for 'd0', 'd1', 'd2' and 'd3'.
+//---------------------------------------------------------------------
+struct CDeep {
+  struct CD1 {
+    struct CD2 {
+      struct CD3 {
+        virtual void d3(int P3);
+      };
+
+      CD3 D3;
+      virtual void d2(int P2);
+    };
+
+    CD2 D2;
+    virtual void d1(int P1);
+  };
+
+  CD1 D1;
+  virtual void d0(int P);
+};
+
+void CDeep::d0(int P) {}
+void CDeep::CD1::d1(int P1) {}
+void CDeep::CD1::CD2::d2(int P2) {}
+void CDeep::CD1::CD2::CD3::d3(int P3) {}
+
+void edge_c(CDeep *Deep) {
+  Deep->d0(0);
+
+  CDeep::CD1 *D1 = &Deep->D1;
+  D1->d1(1);
+
+  CDeep::CD1::CD2 *D2 = &D1->D2;
+  D2->d2(2);
+
+  CDeep::CD1::CD2::CD3 *D3 = &D2->D3;
+  D3->d3(3);
+}
+
+// RUN: %clang -Xclang -debugger-tuning=sce --target=x86_64-linux -Xclang -disable-llvm-passes -fno-discard-value-names -emit-llvm -S -g -O1 %s -o - | FileCheck %s -check-prefix CHECK-EDGES
+
+// CHECK-EDGES: define {{.*}} @_Z6edge_aP6CEmpty{{.*}} {
+// CHECK-EDGES-DAG:  call void %1{{.*}} !dbg {{![0-9]+}}
+// CHECK-EDGES-DAG:  call void %3{{.*}} !dbg {{![0-9]+}}, !call_target [[CEMPTY_TWO:![0-9]+]]
+// CHECK-EDGES: }
+
+// CHECK-EDGES: define {{.*}} @_Z6edge_bP5CBase{{.*}} {
+// CHECK-EDGES-DAG:  call void %1{{.*}} !dbg {{![0-9]+}}, !call_target [[CBASE_ONE:![0-9]+]]
+// CHECK-EDGES-DAG:  call void %3{{.*}} !dbg {{![0-9]+}}, !call_target [[CBASE_TWO:![0-9]+]]
+// CHECK-EDGES: }
+
+// CHECK-EDGES: define {{.*}} @_Z6edge_cP5CDeep{{.*}} {
+// CHECK-EDGES-DAG:  call void %1{{.*}} !dbg {{![0-9]+}}, !call_target [[CDEEP_D0:![0-9]+]]
+// CHECK-EDGES-DAG:  call void %4{{.*}} !dbg {{![0-9]+}}, !call_target [[CDEEP_D1:![0-9]+]]
+// CHECK-EDGES-DAG:  call void %7{{.*}} !dbg {{![0-9]+}}, !call_target [[CDEEP_D2:![0-9]+]]
+// CHECK-EDGES-DAG:  call void %10{{.*}} !dbg {{![0-9]+}}, !call_target [[CDEEP_D3:![0-9]+]]
+// CHECK-EDGES: }
+
+// CHECK-EDGES-DAG:  [[CEMPTY_TWO]] = {{.*}}!DISubprogram(name: "two", linkageName: "_ZN6CEmpty3twoEic"
+// CHECK-EDGES-DAG:  [[CBASE_ONE]] = {{.*}}!DISubprogram(name: "one", linkageName: "_ZN5CBase3oneEb"
+// CHECK-EDGES-DAG:  [[CBASE_TWO]] = {{.*}}!DISubprogram(name: "two", linkageName: "_ZN5CBase3twoEic"
+
+// CHECK-EDGES-DAG:  [[CDEEP_D0]] = {{.*}}!DISubprogram(name: "d0", linkageName: "_ZN5CDeep2d0Ei"
+// CHECK-EDGES-DAG:  [[CDEEP_D1]] = {{.*}}!DISubprogram(name: "d1", linkageName: "_ZN5CDeep3CD12d1Ei"
+// CHECK-EDGES-DAG:  [[CDEEP_D2]] = {{.*}}!DISubprogram(name: "d2", linkageName: "_ZN5CDeep3CD13CD22d2Ei"
+// CHECK-EDGES-DAG:  [[CDEEP_D3]] = {{.*}}!DISubprogram(name: "d3", linkageName: "_ZN5CDeep3CD13CD23CD32d3Ei"
+
+// RUN: %clang --target=x86_64-linux -Xclang -disable-llvm-passes -fno-discard-value-names -emit-llvm -S -g -O1 %s -o - | FileCheck %s -check-prefix CHECK-EDGES-NON
+
+// CHECK-EDGES-NON: define {{.*}} @_Z6edge_aP6CEmpty{{.*}} {
+// CHECK-EDGES-NON-DAG:  call void %3{{.*}} !dbg {{![0-9]+}}
+// CHECK-EDGES-NON: }
+
+// CHECK-EDGES-NON: define {{.*}} @_Z6edge_bP5CBase{{.*}} {
+// CHECK-EDGES-NON-DAG:  call void %1{{.*}} !dbg {{![0-9]+}}
+// CHECK-EDGES-NON-DAG:  call void %3{{.*}} !dbg {{![0-9]+}}
+// CHECK-EDGES-NON: }
+
+// CHECK-EDGES-NON: define {{.*}} @_Z6edge_cP5CDeep{{.*}} {
+// CHECK-EDGES-NON-DAG:  call void %1{{.*}} !dbg {{![0-9]+}}
+// CHECK-EDGES-NON-DAG:  call void %4{{.*}} !dbg {{![0-9]+}}
+// CHECK-EDGES-NON-DAG:  call void %7{{.*}} !dbg {{![0-9]+}}
+// CHECK-EDGES-NON-DAG:  call void %10{{.*}} !dbg {{![0-9]+}}
+// CHECK-EDGES-NON: }
diff --git a/cross-project-tests/debuginfo-tests/clang_llvm_roundtrip/callsite-dwarf.cpp b/cross-project-tests/debuginfo-tests/clang_llvm_roundtrip/callsite-dwarf.cpp
new file mode 100644
index 0000000000000..7374a355da549
--- /dev/null
+++ b/cross-project-tests/debuginfo-tests/clang_llvm_roundtrip/callsite-dwarf.cpp
@@ -0,0 +1,71 @@
+// Simple base and derived class with virtual:
+// We check for a generated 'DW_AT_call_origin' for 'foo', that corresponds
+// to the 'call_target' metadata added to the indirect call instruction.
+
+class CBaseOne {
+  virtual void foo(int &);
+};
+
+struct CDerivedOne : CBaseOne {
+  void foo(int &);
+};
+
+void CDerivedOne::foo(int &) {
+}
+
+struct CBaseTwo {
+  CDerivedOne *DerivedOne;
+};
+
+struct CDerivedTwo : CBaseTwo {
+  void bar(int &);
+};
+
+void CDerivedTwo::bar(int &j) {
+  DerivedOne->foo(j);
+}
+
+// The IR generated looks like:
+//
+// define dso_local void @_ZN11CDerivedTwo3barERi(...) !dbg !40 {
+// entry:
+//   ..
+//   %vtable = load ptr, ptr %0, align 8
+//   %vfn = getelementptr inbounds ptr, ptr %vtable, i64 0
+//   %2 = load ptr, ptr %vfn, align 8
+//   call void %2(...), !dbg !65, !call_target !25
+//   ret void
+// }
+//
+// !25 = !DISubprogram(name: "foo", linkageName: "_ZN11CDerivedOne3fooERi", ...)
+// !40 = !DISubprogram(name: "bar", linkageName: "_ZN11CDerivedTwo3barERi", ...)
+// !65 = !DILocation(line: 25, column: 15, scope: !40)
+
+// RUN: %clang --target=x86_64-unknown-linux -c -g -O1    \
+// RUN:        -Xclang -debugger-tuning=sce %s -o -     | \
+// RUN: llvm-dwarfdump --debug-info - | FileCheck %s --check-prefix=CHECK
+
+// CHECK: DW_TAG_compile_unit
+// CHECK:   DW_TAG_structure_type
+// CHECK:     DW_AT_name	("CDerivedOne")
+// CHECK: [[FOO_DCL:0x[a-f0-9]+]]:    DW_TAG_subprogram
+// CHECK:       DW_AT_name	("foo")
+// CHECK:   DW_TAG_class_type
+// CHECK:     DW_AT_name	("CBaseOne")
+// CHECK: [[FOO_DEF:0x[a-f0-9]+]]:  DW_TAG_subprogram
+// CHECK:     DW_AT_call_all_calls	(true)
+// CHECK:     DW_AT_specification	([[FOO_DCL]] "foo")
+// CHECK:  ...
[truncated]
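
The bookkeeping described in the CGDebugInfo.h comment (a metadata node and a list of indirect call instructions, keyed by linkage name, that may arrive in either order and are joined at finalize time) can be sketched without LLVM as below. Node, Call, and CallTargetCacheSketch are hypothetical stand-ins, not the patch's types; in particular the patch uses llvm::WeakVH and llvm::TrackingMDNodeRef so that deleted or replaced IR is handled, which this sketch does not model.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for llvm::MDNode and llvm::CallBase.
struct Node {
  std::string Name;
};
struct Call {
  const Node *Target = nullptr; // plays the role of !call_target
  bool Indirect = true;
};

class CallTargetCacheSketch {
  struct Entry {
    const Node *MD = nullptr;
    std::vector<Call *> Calls;
  };
  std::unordered_map<std::string, Entry> Cache;

public:
  // Either MD or CI may be null; the entry is completed by later calls.
  void add(const std::string &Name, const Node *MD, Call *CI) {
    if (CI && !CI->Indirect) // record only indirect calls
      return;
    if (Name.empty())
      return;
    Entry &E = Cache[Name];
    if (MD)
      E.MD = MD;
    if (CI)
      E.Calls.push_back(CI);
  }

  void remove(const std::string &Name) { Cache.erase(Name); }

  // Attach the recorded node to every recorded call that has one.
  void finalize() {
    for (auto &KV : Cache) {
      Entry &E = KV.second;
      if (!E.MD)
        continue;
      for (Call *CI : E.Calls)
        CI->Target = E.MD;
    }
  }
};
```

A call that is recorded before its declaration node exists still ends up annotated, because nothing is attached until finalize() runs.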

@github-actions

github-actions bot commented Nov 12, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@CarlosAlbertoEnciso
Member Author

CarlosAlbertoEnciso commented Nov 12, 2025

@tromey For some reason, your name does not appear in the reviewers list. I would appreciate it if you could add yourself as a reviewer. Thanks

@CarlosAlbertoEnciso CarlosAlbertoEnciso changed the title [clang][DebugInfo] Add call site target information in DWARF. [clang][DebugInfo] Add virtual call-site target information in DWARF. Nov 12, 2025
Given the test case (CBase is a structure or class):

int function(CBase *Input) {
  int Output = Input->foo();
  return Output;
}

and using '-emit-call-site-info' with llc, the following DWARF
debug information is produced for the indirect call 'Input->foo()':

0x51: DW_TAG_call_site
        DW_AT_call_target    (DW_OP_reg0 RAX)
        DW_AT_call_return_pc (0x0e)

This patch generates an extra 'DW_AT_call_origin' to identify
the target call 'CBase::foo'.

0x51: DW_TAG_call_site
        DW_AT_call_target    (DW_OP_reg0 RAX)
        DW_AT_call_return_pc (0x0e)
        -----------------------------------------------
        DW_AT_call_origin    (0x71 "_ZN5CBase3fooEb")
        -----------------------------------------------

0x61: DW_TAG_class_type
        DW_AT_name            ("CBase")
              ...
0x71:   DW_TAG_subprogram
           DW_AT_linkage_name ("_ZN5CBase3fooEb")
           DW_AT_name         ("foo")

The extra call site information is generated only for the SCE debugger:
'-Xclang -debugger-tuning=sce'
@jryans
Member

jryans commented Nov 12, 2025

The extra call site information is generated only for the SCE debugger: '-Xclang -debugger-tuning=sce'

Do you have any data on how much this adds to debug info size for some known codebase (like Clang itself)? (See for example this comparison using bloaty on before and after Clang builds from a recent PR.)

I am curious if it would be possible to emit this data by default, but I assume people would want to see debug info size data before considering this.

@jryans jryans self-requested a review November 12, 2025 13:01
@CarlosAlbertoEnciso
Member Author

@jryans This is the internal data that we collected:

The callsite changes are guarded by the option -Xclang -debugger-tuning=sce

[..]/bloaty ./callsite-dbg/bin/clang++ -- ./reference-dbg/bin/clang++

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.0% +9.14Ki  [ = ]       0    .debug_info
  +0.0% +2.94Ki  [ = ]       0    .debug_abbrev
  +5.9%    +512  [ = ]       0    [Unmapped]
  +0.0%     +37  [ = ]       0    .debug_str_offsets
  +0.0%      +5  [ = ]       0    .debug_line
  +1.3%      +2  [ = ]       0    .comment
  -0.0%      -6  [ = ]       0    .debug_line_str
  -0.0%    -512  -0.0%    -512    .rodata
  -0.0% -1.73Ki  [ = ]       0    .debug_str
  +0.0% +10.4Ki  -0.0%    -512    TOTAL

[...] 1512681192 ./reference-dbg/bin/clang++
[...] 1512691832 ./callsite-dbg/bin/clang++

[..]/bloaty ./callsite-dbg/bin/clang -- ./reference-dbg/bin/clang

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.0% +9.14Ki  [ = ]       0    .debug_info
  +0.0% +2.94Ki  [ = ]       0    .debug_abbrev
  +5.9%    +512  [ = ]       0    [Unmapped]
  +0.0%     +37  [ = ]       0    .debug_str_offsets
  +0.0%      +5  [ = ]       0    .debug_line
  +1.3%      +2  [ = ]       0    .comment
  -0.0%      -6  [ = ]       0    .debug_line_str
  -0.0%    -512  -0.0%    -512    .rodata
  -0.0% -1.73Ki  [ = ]       0    .debug_str
  +0.0% +10.4Ki  -0.0%    -512    TOTAL

[...] 1512681192 ./reference-dbg/bin/clang
[...] 1512691832 ./callsite-dbg/bin/clang

[..]/bloaty ./callsite-dbg/bin/llc -- ./reference-dbg/bin/llc

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.0% +3.05Ki  [ = ]       0    .debug_info
  +0.0% +2.12Ki  [ = ]       0    .debug_abbrev
  +6.6%    +384  [ = ]       0    [Unmapped]
  +0.0%      +9  [ = ]       0    .debug_str_offsets
  +1.3%      +2  [ = ]       0    .comment
  -0.0%      -3  [ = ]       0    .debug_line_str
  -0.0%    -384  -0.0%    -384    .rodata
  -0.0% -1.63Ki  [ = ]       0    .debug_str
  +0.0% +3.55Ki  -0.0%    -384    TOTAL

[...] 986245752 ./reference-dbg/bin/llc
[...] 986249392 ./callsite-dbg/bin/llc

[..]/bloaty ./callsite-dbg/bin/opt -- ./reference-dbg/bin/opt
    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.0% +3.05Ki  [ = ]       0    .debug_info
  +0.0% +2.12Ki  [ = ]       0    .debug_abbrev
   +17%    +320  [ = ]       0    [Unmapped]
  +0.0%     +17  [ = ]       0    .debug_str_offsets
  +1.3%      +2  [ = ]       0    .comment
  -0.0%      -3  [ = ]       0    .debug_line_str
  -0.0%    -320  -0.0%    -320    .rodata
  -0.0% -1.63Ki  [ = ]       0    .debug_str
  +0.0% +3.56Ki  -0.0%    -320    TOTAL

[...] 985916024 ./reference-dbg/bin/opt
[...] 985919672 ./callsite-dbg/bin/opt

[..]/bloaty ./callsite-dbg/bin/llvm-as -- ./reference-dbg/bin/llvm-as

    FILE SIZE        VM SIZE    
 --------------  -------------- 
  +0.0% +1.28Ki  [ = ]       0    .debug_info
  +0.0%    +284  [ = ]       0    .debug_abbrev
  +0.0%     +65  [ = ]       0    .debug_str
  +0.0%     +10  [ = ]       0    .debug_str_offsets
  +1.3%      +2  [ = ]       0    .comment
  -0.0%      -3  [ = ]       0    .debug_line_str
  +0.0% +1.62Ki  [ = ]       0    TOTAL

[...] 142183856 ./reference-dbg/bin/llvm-as
[...] 142185520 ./callsite-dbg/bin/llvm-as

@jryans
Member

jryans commented Nov 12, 2025

@jryans This is the internal data that we collected:

The callsite changes are guarded by the option -Xclang -debugger-tuning=sce

Thanks for sharing that data. It looks like, e.g., clang++ shows total growth of only about 10 KiB, which is quite small indeed compared to the total binary size.

Given this data, I would personally recommend we make this additional data available by default. Of course, we should also see what @dwblaikie and other debug-info-size-sensitive downstream users think as well.
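As a quick sanity check, the TOTAL file-size deltas reported by bloaty can be recomputed from the raw byte counts quoted in the listings above. This is only a minimal sketch; the helper name `deltaKiB` and the constant names are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: file-size growth in KiB between two builds.
constexpr double deltaKiB(std::uint64_t ReferenceBytes,
                          std::uint64_t PatchedBytes) {
  return static_cast<double>(PatchedBytes - ReferenceBytes) / 1024.0;
}

// Byte counts copied from the listings above (reference build, callsite build).
constexpr double ClangGrowthKiB  = deltaKiB(1512681192ULL, 1512691832ULL);
constexpr double LlcGrowthKiB    = deltaKiB(986245752ULL, 986249392ULL);
constexpr double OptGrowthKiB    = deltaKiB(985916024ULL, 985919672ULL);
constexpr double LlvmAsGrowthKiB = deltaKiB(142183856ULL, 142185520ULL);
```

The results agree with the TOTAL rows printed by bloaty (roughly 10.4 KiB for clang and 3.55 KiB for llc).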

@CarlosAlbertoEnciso
Member Author

CarlosAlbertoEnciso commented Nov 13, 2025

@jryans @dwblaikie I updated the patch to make this additional data available by default.

Make the extra call site information available by default.
Member

@jryans jryans left a comment

Overall, looking good to me with a few small things to fix mentioned inline. Thanks for working on this!

Since this touches on a few areas I am not as confident in, I think it would be good for one more person to also take a look.

}
}

if (CGDebugInfo *DI = CGM.getModuleDebugInfo())
Member

As of #166202, there's now also a block a bit further down (just before the end, near line 6283) in this EmitCall function that tests for debug info. Maybe move your addition into that block?

The block I am referring to also happens to call the slightly different getDebugInfo() function, which checks whether this specific function has debug info enabled, which seems more correct. (It looks like function-level control of debug info is only changed for lambdas at the moment, so perhaps not a large difference, but good to be consistent I think.)

Member Author

Moving the change into that block makes sense, as they are related.

llvm::StringRef GetMethodLinkageName(const CXXMethodDecl *Method) const;

/// For each 'DISuprogram' we store a list of call instructions 'CallBase'
/// that indirectly call such 'DISuprogram'. We use its linkage name to
Member

Suggested change
/// that indirectly call such 'DISuprogram'. We use its linkage name to
/// that indirectly call such 'DISuprogram'. We use its linkage name to

@tromey
Contributor

tromey commented Nov 13, 2025

@tromey For some reason, your name does not appear in the reviewers list. I would appreciate it if you could add yourself as a reviewer. Thanks

I don't have any permissions in llvm and so I think I can't be added as a reviewer.

Anyway, I didn't want your request to go unanswered, but I don't actually know that much about the code that this patch touches. My main question was why it was conditional on a particular debugger but that's been addressed.

As the extra call site information is available by default, update
other targets that support call site debug information.

Move the 'CallSiteInfo' set to the block introduced by:
  llvm#166202
@CarlosAlbertoEnciso
Member Author

@jryans Thanks for your feedback. I have updated the patch to address your comments.

@CarlosAlbertoEnciso
Member Author

Anyway, I didn't want your request to go unanswered, but I don't actually know that much about the code that this patch touches. My main question was why it was conditional on a particular debugger but that's been addressed.

Thanks for your reply. The patch has been updated to be general.

Contributor

@OCHyams OCHyams left a comment

I feel that this is large enough that it would benefit reviewers to split it into LLVM changes followed by Clang changes. If it's not too much work to reshuffle, I think that would be helpful, but I won't insist on it! I suppose doing so now risks losing comments already made.

Please can you explain the intended use case a bit more, both for others in future in the commit message, but also for me now as I'm a little confused -

In your example you said that:

int function(CBase *Input) {
  int Output = Input->foo(); // This call gets annotated with call_origin (CBase::foo)
  return Output;
}

And CBase::foo is virtual. DWARF says:

The call site entry may have a DW_AT_call_origin attribute which is a reference. For direct calls or jumps where the called subprogram is known it is a reference to the called subprogram’s debugging information entry. For indirect calls it may be a reference to a DW_TAG_variable, DW_TAG_formal_parameter or DW_TAG_member entry representing the subroutine pointer that is called.

Is it not bending the spec a little to have DW_AT_call_origin be a subprogram reference for a virtual call (which is indirect)? Possibly naively, would it make more sense for this to point to the global _vtable$ variable DIE? I might have completely the wrong end of the stick here.

@dwblaikie
Collaborator

Yeah, seems problematic if the DWARF for a devirtualized or non-virtual call (derived->Base::func() for instance) is indistinguishable from a virtual call. Is that what's being proposed?

Anyone have suggestions on how we can avoid that ^ property? Ideally without having consumers need to know about something extra to detect this case (eg: putting an extra extension attribute on the call site die that says "this is a virtual call" is probably insufficient - because consumers that don't understand the extension would be left in the "unable to differentiate" case) - so maybe requires an extension attribute to describe the target?

@jmorse
Member

jmorse commented Nov 20, 2025

Yeah, seems problematic if the DWARF for a devirtualized or non-virtual call (derived->Base::func() for instance) is indistinguishable from a virtual call. Is that what's being proposed?

The interpretation I've been taking here is that we're providing more information where there was an absence of it, rather than changing a meaning. As far as I'm aware, we haven't been distinguishing virtual/non-virtual call sites in the past, just not recording extra information about indirect calls. (85% confidence here -- I think we've been recording "the call target is this register" in call-site info, but nothing more?). This patch would be providing new information that's narrowly true about the source code, following the example that Orlando has pulled out, the source-code is a call to CBase::foo, but it also happens to be indirect due to it being a virtual call.

I suppose my question is -- have consumers really been using the absence of a subprogram reference as a proxy for "this is a virtual call", or can we get away with not adding another attribute? If there's a serious risk that consumers are doing that, an attribute is fine; presumably in the number space we should place it immediately after DW_AT_LLVM_alloc_type?

@jmorse
Member

jmorse commented Nov 20, 2025

Hmmm, I see the wording in the spec nearby, "For indirect calls or jumps where it is unknown at compile time which subprogram will be called [...]". I suppose this shows that the thinking in the spec is that an indirect call is something where you don't know the specific subprogram; and that means an address; and that means consumers would legitimately assume if we put a subprogram reference in then there's a direct call. Which moves me towards thinking we would need an extra/new attribute to properly describe this.

@tromey
Contributor

tromey commented Nov 23, 2025

Hmmm, I see the wording in the spec nearby, "For indirect calls or jumps where it is unknown at compile time which subprogram will be called [...]". I suppose this shows that the thinking in the spec is that an indirect call is something where you don't know the specific subprogram; and that means an address; and that means consumers would legitimately assume if we put a subprogram reference in then there's a direct call. Which moves me towards thinking we would need an extra/new attribute to properly describe this.

The full text of that paragraph is:

The call site may have a DW_AT_call_target attribute which is a DWARF
expression. For indirect calls or jumps where it is unknown at compile time
which subprogram will be called the expression computes the address of the
subprogram that will be called.

I think this means that a virtual call should be represented by a DW_AT_call_target with a DWARF expression that finds the call target, i.e., by finding the correct slot in the vtable.

A non-virtual call to a virtual method, like the devirtualized or upcall case, would instead use DW_AT_call_origin referencing the DIE for the specific method.

@CarlosAlbertoEnciso
Member Author

First of all, thanks very much to all reviewers for the feedback, suggestions and clarifications.

@CarlosAlbertoEnciso
Member Author

@OCHyams mentioned:

Is it not bending the spec a little to have DW_AT_call_origin be a subprogram reference for a virtual call (which is indirect)? Possibly naively, would it make more sense for this to point to the global _vtable$ variable DIE? I might have completely the wrong end of the stick here.

@tromey mentioned:

I think this means that a virtual call should be represented by a DW_AT_call_target with a DWARF expression that finds the call target, i.e., by finding the correct slot in the vtable.

From these comments, I am sensing that for virtual calls, instead of adding a new attribute, an alternative solution could be to use the DW_AT_call_target with an expression involving the global _vtable$ variable and extra information to find the correct slot?

@dwblaikie
Collaborator

From these comments, I am sensing that for virtual calls, instead of adding a new attribute, an alternative solution could be to use the DW_AT_call_target with an expression involving the global _vtable$ variable and extra information to find the correct slot?

The expression wouldn't involve the global _vtable$ variables - that would only be possible if we had devirtualized the call (& so knew which vtable to use) - and if we had done that, we'd describe the direct call, no vtable needed.

In the non-devirtualized case, it'd be an expression involving the location of the variable, navigating that location to find the vtable pointer, then dereferencing that to find the vtable, then navigating that to find the specific vtable slot, etc.

Let's see if I can write an example:

struct base { virtual void f1(); };
void f1(base* b) {
  b->base::f1();
  b->f1();
}

Current clang-generated non-virtual DW_TAG_call_site

DW_TAG_call_site
  DW_AT_call_origin	("_ZN4base2f1Ev")
  DW_AT_call_return_pc	(XXXX)

  DW_TAG_call_site_parameter
    DW_AT_location	(DW_OP_reg5 RDI)
    DW_AT_call_value	(DW_OP_breg3 RBX+0)

Hypothetical virtual DW_TAG_call_site

DW_TAG_call_site
  DW_AT_call_target	(DW_OP_breg3 RBX+0, DW_OP_deref, DW_OP_deref)
  DW_AT_call_return_pc	(XXXX)

  DW_TAG_call_site_parameter
    DW_AT_location	(DW_OP_reg5 RDI)
    DW_AT_call_value	(DW_OP_breg3 RBX+0)

(In this case it's super easy (deref, deref) because the vtable pointer is at the start of the object (pretty much always true) and the function we're calling is at the start of the vtable (not always true: for non-initial vtable slots we'd need an addition, e.g. DW_OP_plus_uconst, between the derefs).)
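To make the "deref, deref" expression concrete, here is a toy evaluator for that kind of location expression over simulated memory. It is a sketch under stated assumptions, not how LLVM or any debugger implements it: the `Op`, `Step`, `Memory`, `evaluate` and `callTargetForSlot` names, the addresses, and the 8-byte slot size are all invented for illustration.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Toy subset of DWARF stack operations (illustrative names only).
enum class Op { BregRBX, Deref, PlusUConst };

struct Step {
  Op Kind;
  std::uint64_t Operand; // Offset for BregRBX/PlusUConst; unused for Deref.
};

// Simulated memory: address -> 8-byte value stored at that address.
using Memory = std::map<std::uint64_t, std::uint64_t>;

// Evaluate a well-formed expression; the result is the top of the stack.
std::uint64_t evaluate(const std::vector<Step> &Expr, std::uint64_t RBX,
                       const Memory &Mem) {
  std::vector<std::uint64_t> Stack;
  for (const Step &S : Expr) {
    switch (S.Kind) {
    case Op::BregRBX:    Stack.push_back(RBX + S.Operand); break;
    case Op::Deref:      Stack.back() = Mem.at(Stack.back()); break;
    case Op::PlusUConst: Stack.back() += S.Operand; break;
    }
  }
  return Stack.back();
}

// Build a sample object with a two-slot vtable and compute the call target
// for the given slot (assumed 8 bytes per slot, vtable pointer at offset 0).
std::uint64_t callTargetForSlot(unsigned Slot) {
  const std::uint64_t Object = 0x1000, VTable = 0x2000;
  Memory Mem = {{Object, VTable}, {VTable, 0x400100}, {VTable + 8, 0x400200}};
  std::vector<Step> Expr = {{Op::BregRBX, 0}, {Op::Deref, 0}};
  if (Slot != 0) // Non-initial slots need an offset between the two derefs.
    Expr.push_back({Op::PlusUConst, std::uint64_t{8} * Slot});
  Expr.push_back({Op::Deref, 0});
  return evaluate(Expr, /*RBX=*/Object, Mem);
}
```

With this sample layout, `callTargetForSlot(0)` follows the object to its vtable and reads the first entry, while `callTargetForSlot(1)` adds 8 before the second dereference.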

@CarlosAlbertoEnciso
Member Author

@OCHyams I have added to the main description extra information for the intended use case.

@CarlosAlbertoEnciso
Member Author

Given the test case:

class CBaseOne {
  virtual void foo(int &);
};

struct CDerivedOne : CBaseOne {
  void foo(int &);
};

void CDerivedOne::foo(int &) {
}

struct CBaseTwo {
  CDerivedOne *DerivedOne;
};

struct CDerivedTwo : CBaseTwo {
  void bar(int &);
};

void CDerivedTwo::bar(int &j) {
  DerivedOne->foo(j);
}

This is the generated DWARF

0x00000072:   DW_TAG_subprogram
                DW_AT_specification	(0x00000041 "_ZN11CDerivedOne3fooERi")
                ...

0x00000091:   DW_TAG_structure_type
                DW_AT_name	("CDerivedTwo")
                ...

0x0000009d:     DW_TAG_subprogram
                  DW_AT_linkage_name	("_ZN11CDerivedTwo3barERi")
                  DW_AT_name	("bar")

0x000000c8:   DW_TAG_subprogram
                ...
                DW_AT_specification	(0x0000009d "_ZN11CDerivedTwo3barERi")

0x000000ea:     DW_TAG_call_site
                  DW_AT_call_target	(DW_OP_reg0 RAX)
                  DW_AT_call_tail_call	(true)
                  DW_AT_call_pc	(0x0000000000000019)
                  DW_AT_call_origin	(0x00000072 "_ZN11CDerivedOne3fooERi")

@OCHyams
Contributor

OCHyams commented Nov 25, 2025

Thanks @CarlosAlbertoEnciso, with the new commit description I think I get what this patch is aiming for.

To summarise, the SCE debugger wants to look at DW_AT_call_origin in parent call frames to determine "did the parent frame call this function in source code?". Answering "no" then indicates the presence of code folding or tail calls. And this proposed change adds coverage for that feature (because with the patch, that feature works for virtual calls too). Is that right @CarlosAlbertoEnciso?

I think my initial questions were going into the weeds trying to think about virtual call sites generally rather than what seems to be a targeted and specific use case.

I think this means that a virtual call should be represented by a DW_AT_call_target with a DWARF expression that finds the call target, i.e., by finding the correct slot in the vtable.

Having now read Carlos' new commit description I wonder if this is orthogonal to the proposed change. Especially as we do already emit DW_AT_call_target for indirect (including virtual) calls, it just uses a more direct location. Hypothetically, we could have both that call_target location change (which IIUC does not serve the SCE debugger use-case) and the DW_AT_call_origin change.

Yeah, seems problematic if the DWARF for a devirtualized or non-virtual call (derived->Base::func() for instance) is indistinguishable from a virtual call. Is that what's being proposed?

With the patch, IIUC, for a devirtualized or non-virtual call (to a virtual function) there would be no DW_AT_call_target because the call is direct. So that would be one way to disambiguate these cases (as the virtual call would have both DW_AT_call_target and DW_AT_call_origin).
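The disambiguation described here, as the patch stood at this point in the discussion, can be sketched as a classification on which attributes a call site carries. The `CallSite` struct, `CallKind` enum and `classify` function are hypothetical and only model the idea; they are not an LLVM or debugger API.

```cpp
#include <optional>
#include <string>

// Illustrative model of a DW_TAG_call_site entry.
struct CallSite {
  std::optional<std::string> CallOrigin; // Name referenced by DW_AT_call_origin.
  bool HasCallTarget = false;            // Whether DW_AT_call_target is present.
};

enum class CallKind { Direct, VirtualWithOrigin, OtherIndirect, Unknown };

CallKind classify(const CallSite &CS) {
  if (CS.CallOrigin && CS.HasCallTarget)
    return CallKind::VirtualWithOrigin; // Both attributes: virtual call.
  if (CS.CallOrigin)
    return CallKind::Direct;            // Includes devirtualized calls.
  if (CS.HasCallTarget)
    return CallKind::OtherIndirect;     // E.g. a call through a function pointer.
  return CallKind::Unknown;
}
```

A consumer that does not know about this convention would still see two attributes on one call site with no guidance on which one to trust.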

@dwblaikie
Collaborator

Sorry, yeah, I think we did get a bit sidetracked/confused - maybe due to earlier descriptions, maybe due to being a bit context-free/looking only at an immediately preceding comment (in my case, I know I can have that problem - context switching between PRs/issues and not paging in the whole context again - sorry about that).

OK, so, sounds like we already do produce correct DWARF for an indirect call.

Problem is that DWARF isn't enough to identify ICF sites for indirect calls - because it doesn't record the originally intended destination of the call, only the actual call - so there's nothing to compare to to detect the difference.

I know this is way off in a different direction, but...

Have you considered using a NOP-sled at the start of ICF'd functions (this would be a linker change - when ICF'ing, rather than resolving every address to the ICF'd version - resolve each unique original function to a unique NOP at the start of/before the start of the chosen preserved copy - that way each original function would still have a unique address at the call sites)? This would solve any cases of indirect calls, not just virtual ones, and would work in a debugger, would preserve uniqueness of addresses for C++ function address comparison semantics, etc.

But, that aside: I'd still argue that while a consumer aware of this approach could know which attribute (between DW_AT_call_target and DW_AT_call_origin) is authoritative, the spec doesn't really give them enough info to go on & a consumer ignorant of this handshake could pick somewhat arbitrarily between the two and get confused.

It seems like a misuse of DW_AT_call_origin & I think it'd be more suitable to use a distinct attribute for this since it has different semantics from the standard attribute.

(but, yeah, I'd consider NOP sleds as a more general solution - they have some code size impact, but maybe it's small enough to keep them even in fully optimized builds)

@CarlosAlbertoEnciso
Member Author

@dwblaikie Thanks very much for your great input.
After some discussions with our debugger team, adding a new attribute to eliminate the current ambiguity works well for them.
A DW_AT_LLVM_virtual_call_origin has been suggested.

- Added a new attribute: DW_AT_LLVM_virtual_call_origin
  The usage of 'DW_AT_call_origin' could be misinterpreted by
  any consumer relying on the DWARF specification.
  The only place that takes into consideration the new attribute is
  the function that dumps an attribute (DWARFDie::dumpAttribute), to
  keep consistency in the generated output.

- Addressed issues with comments and test case 'RUN' lines position.
@CarlosAlbertoEnciso
Member Author

A new patch has been uploaded.

The main change is the introduction of a new attribute, DW_AT_LLVM_virtual_call_origin, to remove the ambiguity that using DW_AT_call_origin would create for consumers.

The only place that takes into consideration the new attribute is the function that dumps an attribute (DWARFDie::dumpAttribute), to keep consistency in the generated output.

@dwblaikie
Collaborator

@dwblaikie Thanks very much for your great input. After some discussions with our debugger team, adding a new attribute to eliminate the current ambiguity works well for them. A DW_AT_LLVM_virtual_call_origin has been suggested.

Any thoughts on the NOP-sled as a way to get unique addresses?

@OCHyams
Contributor

OCHyams commented Dec 4, 2025

Carlos is away at the moment so I'm happy to take the nop-sled question. Sorry for the slight wait, I've also been in and out of the office this week.

Problem is that DWARF isn't enough to identify ICF sites for indirect calls - because it doesn't record the originally intended destination of the call, only the actual call - so there's nothing to compare to to detect the difference.

That makes sense. However, I think there's more than just ICF detection at play here as I believe it's also the ability to disambiguate between tail calls and ICF that is valuable. The SCE debugger recovers entry values using call site parameter DIEs, so it's important to know whether or not there has been a tail call between that call site info (in the next frame up/parent frame) and the current call frame, as that would mean the call site information does not describe the call to the current frame.

Where there is a mismatch between the parent frame call_origin and current frame it could be due to either ICF or a tail call. Today, we can disambiguate those cases for direct calls using the call_origin as already mentioned.

However, I don't think there's a foolproof way to determine whether there's been a tail call for indirect calls currently. Right now the debugger defensively hides the entry_values if the parent frame doesn't have a call_origin at the call site (e.g., because it's indirect). Adding virtual_call_origin for virtual calls lets us perform that check over some indirect call sites, albeit imperfectly (a tail call through to another implementation of the virtual function would be missed, I think).
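The check described above could look roughly like the sketch below. It is not the SCE debugger's implementation: `ParentCallSite`, `FrameVerdict`, `checkFrame` and `canShowEntryValues` are hypothetical names, and the set of overriders is assumed to be available from elsewhere in the debug info.

```cpp
#include <optional>
#include <set>
#include <string>

// What the parent frame's call site says about the intended callee.
struct ParentCallSite {
  std::optional<std::string> Origin; // From call_origin or virtual_call_origin.
  std::set<std::string> Overriders;  // Known implementations if Origin is virtual.
};

enum class FrameVerdict { Matches, FoldedOrTailCalled, Unknown };

// Compare the intended callee with the function the current frame is in.
FrameVerdict checkFrame(const ParentCallSite &Parent,
                        const std::string &CurrentFunction) {
  if (!Parent.Origin)
    return FrameVerdict::Unknown; // No origin recorded: nothing to compare.
  if (*Parent.Origin == CurrentFunction ||
      Parent.Overriders.count(CurrentFunction) != 0)
    return FrameVerdict::Matches;
  return FrameVerdict::FoldedOrTailCalled;
}

// Defensive policy: only show entry values when the frames provably match.
bool canShowEntryValues(const ParentCallSite &Parent,
                        const std::string &CurrentFunction) {
  return checkFrame(Parent, CurrentFunction) == FrameVerdict::Matches;
}
```

Note that the sketch shares the imperfection mentioned above: a tail call from one implementation of the virtual function into another still reports `Matches`.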

I know this is way off in a different direction, but...

Have you considered using a NOP-sled at the start of ICF'd functions (this would be a linker change - when ICF'ing, rather than resolving every address to the ICF'd version - resolve each unique original function to a unique NOP at the start of/before the start of the chosen preserved copy - that way each original function would still have a unique address at the call sites)? This would solve any cases of indirect calls, not just virtual ones, and would work in a debugger, would preserve uniqueness of addresses for C++ function address comparison semantics, etc.

That sounds interesting, I had not seen this idea before. I wonder, does this only help you detect ICF specifically on the first step into a call? I can't picture how this would work if you paused execution in the middle of the code-folded function.

I'm not sure it helps us with the tail call detection for entry value resolution for indirect calls. I appreciate that isn't what you were suggesting or responding to, but I think we'd still like to solve that.

But, that aside: I'd still argue that while a consumer aware of this approach could know which attribute (between DW_AT_call_target and DW_AT_call_origin) is authoritative, the spec doesn't really give them enough info to go on & a consumer ignorant of this handshake could pick somewhat arbitrarily between the two and get confused.

It seems like a misuse of DW_AT_call_origin & I think it'd be more suitable to use a distinct attribute for this since it has different semantics from the standard attribute.

That seems fair to me, and it looks like Carlos has taken this on board and added a new attribute.

(but, yeah, I'd consider NOP sleds as a more general solution - they have some code size impact, but maybe it's small enough to keep them even in fully optimized builds)

Some of our users are quite constrained by code size (and it seems reasonable to assume it's likely to be those users that are using ICF), so that could potentially be an issue. Are there performance implications? Assuming that the call sites also get patched to the unique nops (otherwise the linker needs to potentially do special things for debug info, which sounds like it's not ideal, unless I'm not seeing the full picture?).

(cc @jmorse in case I've missed anything or you'd like to jump in etc)

@dwblaikie
Collaborator

Carlos is away at the moment so I'm happy to take the nop-sled question. Sorry for the slight wait, I've also been in and out of the office this week.

Problem is that DWARF isn't enough to identify ICF sites for indirect calls - because it doesn't record the originally intended destination of the call, only the actual call - so there's nothing to compare to to detect the difference.

That makes sense. However, I think there's more than just ICF detection at play here as I believe it's also the ability to disambiguate between tail calls and ICF that is valuable. The SCE debugger recovers entry values using call site parameter DIEs, so it's important to know whether or not there has been a tail call between that call site info (in the next frame up/parent frame) and the current call frame, as that would mean the call site information does not describe the call to the current frame.

Where there is a mismatch between the parent frame call_origin and current frame it could be due to either ICF or a tail call. Today, we can disambiguate those cases for direct calls using the call_origin as already mentioned.

However, I don't think there's a foolproof way to determine whether there's been a tail call for indirect calls currently. Right now the debugger defensively hides the entry_values if the parent frame doesn't have a call_origin at the call site (e.g., because it's indirect). Adding virtual_call_origin for virtual calls lets us perform that check over some indirect call sites, albeit imperfectly (a tail call through to another implementation of the virtual function would be missed, I think).

tail calls have DW_AT_call_tail_call on them, right?

I know this is way off in a different direction, but...

Have you considered using a NOP-sled at the start of ICF'd functions (this would be a linker change - when ICF'ing, rather than resolving every address to the ICF'd version - resolve each unique original function to a unique NOP at the start of/before the start of the chosen preserved copy - that way each original function would still have a unique address at the call sites)? This would solve any cases of indirect calls, not just virtual ones, and would work in a debugger, would preserve uniqueness of addresses for C++ function address comparison semantics, etc.

That sounds interesting, I had not seen this idea before. I wonder, does this only help you detect ICF specifically on the first step into a call? I can't picture how this would work if you paused execution in the middle of the code-folded function.

So you already do this for statically known call sites by comparing the function you're in to the caller's call_site's target function - if they're different, it's ICF (and/or a tail call - though the caller's call site should tell you if it's a tail call, so you can know if you're ICF-that's-not-a-tail-call, but if it's a tail call you could be in tail call or tail call+ICF)

With a NOP sled you could do this for dynamic calls too - because you could differentiate between the start of the function you're in and the caller's call_site's call_target, if you can still evaluate that (if registers haven't been clobbered/not saved/etc). Which you couldn't do without the nop sled, because the caller's call_target would already be resolved to point to the ICF'd function.
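As a rough illustration of the NOP-sled idea, the sketch below assumes a layout in which each folded function gets its own one-byte NOP entry point placed immediately before the shared body, so a call target that lands in the sled identifies the function the call site intended. The layout, the one-byte NOP size, and the names `FoldedGroup`, `entryAddress` and `intendedFunction` are all assumptions for illustration; no linker is known to implement exactly this.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// One group of functions folded by ICF into a single kept body.
struct FoldedGroup {
  std::uint64_t BodyStart;            // First real instruction of the kept copy.
  std::vector<std::string> Originals; // Functions folded into this body.
};

// Unique entry address assigned to Originals[Index] (one NOP byte each).
std::uint64_t entryAddress(const FoldedGroup &G, std::size_t Index) {
  return G.BodyStart - G.Originals.size() + Index;
}

// Map an evaluated call target back to the originally intended function.
std::optional<std::string> intendedFunction(const FoldedGroup &G,
                                            std::uint64_t Target) {
  const std::uint64_t SledStart = G.BodyStart - G.Originals.size();
  if (Target < SledStart || Target >= G.BodyStart)
    return std::nullopt; // Target is not one of the sled entry points.
  return G.Originals[Target - SledStart];
}
```

This only helps when the caller's call_target expression can still be evaluated, as noted above; once execution is past the sled, the PC alone no longer says which entry point was used.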

I'm not sure it helps us with the tail call detection for entry value resolution for indirect calls. I appreciate that isn't what you were suggesting or responding to, but I think we'd still like to solve that.

Rather than trying to detect tail calls - can't you rely on the call_site to tell you if it's a tail call? The compiler knows and should be able to emit this information reliably (& seems to at least in some cases - if there are gaps, hopefully those are fixable)

& yeah, if you see a tail call - if you can get to the function the tail call is for (even if it's a dynamic dispatch - you can then lookup the function it's dispatching to in the DWARF because you know at runtime/post-runtime what it is) and if it isn't the callee, don't use the call_site_parameters, and if it is the callee you have to analyze all that direct call's calls (recursively), and up from the callee, until you have a graph from caller to callee following only tail call edges and check that the callee appears nowhere else in that graph (eg: in the simple case if the callee is tail recursive - you don't know whether you're in the first call or the Nth call, so you can't reliably use the call site parameters).
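One conservative reading of the graph check described above can be sketched as follows. This is a simplified interpretation, not an existing debugger algorithm: it trusts the call-site parameters only when the call resolves to the callee itself and the callee cannot reach itself again through tail-call edges. `TailCallGraph`, `reachableViaTailCalls` and `canUseCallSiteParameters` are hypothetical names.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Function name -> functions it may tail call (from DW_AT_call_tail_call sites).
using TailCallGraph = std::map<std::string, std::vector<std::string>>;

// True if To is reachable from From by following one or more tail-call edges.
bool reachableViaTailCalls(const TailCallGraph &G, const std::string &From,
                           const std::string &To) {
  std::set<std::string> Seen;
  std::vector<std::string> Work = {From};
  while (!Work.empty()) {
    std::string F = Work.back();
    Work.pop_back();
    auto It = G.find(F);
    if (It == G.end())
      continue; // F makes no tail calls.
    for (const std::string &Next : It->second) {
      if (Next == To)
        return true;
      if (Seen.insert(Next).second)
        Work.push_back(Next);
    }
  }
  return false;
}

// Use the caller's call-site parameters only if the call went to Callee and
// Callee is not on a tail-call cycle (otherwise we may be in the Nth call).
bool canUseCallSiteParameters(const TailCallGraph &G,
                              const std::string &ResolvedTarget,
                              const std::string &Callee) {
  if (ResolvedTarget != Callee)
    return false;
  return !reachableViaTailCalls(G, Callee, Callee);
}
```

The tail-recursive case mentioned above corresponds to a self-edge in the graph, which this check rejects.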

It seems like a misuse of DW_AT_call_origin & I think it'd be more suitable to use a distinct attribute for this since it has different semantics from the standard attribute.

That seems fair to me, and it looks like Carlos has taken this on board and added a new attribute.

Yep yep - wondering if we can avoid that/have some more general solution that solves some other problems too.

(but, yeah, I'd consider NOP sleds as a more general solution - they have some code size impact, but maybe it's small enough to keep them even in fully optimized builds)

Some of our users are quite constrained by code size (and it seems reasonable to assume it's likely to be those users that are using ICF), so that could potentially be an issue. Are there performance implications? Assuming that the call sites also get patched to the unique nops (otherwise the linker needs to potentially do special things for debug info, which sounds like it's not ideal, unless I'm not seeing the full picture?).

Yep, it'd have non-zero size and performance implications - I don't know how small the epsilon is & whether it's below noise/below some acceptable cost threshold.
