-
Notifications
You must be signed in to change notification settings - Fork 14.8k
[RFC] Emit dwarf data for signature-changed or new functions #157349
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-pgo @llvm/pr-subscribers-debuginfo Author: None (yonghong-song) ChangesAdd a new pass EmitChangedFuncDebugInfo which will add dwarf for The previous approach in [1] tries to add debuginfo for those The ultimate goal is to add new information to dwarf like below:
The parent tag of above DW_TAG_inlined_subroutine is A special CompileUnit with file name "<artificial>" is created The below are some discussions with not handled cases and I have tested this patch set by building latest bpf-next linux kernel.
For thin-lto case:
The following are some examples with thinlto with generated dwarf:
[1] #127855 Patch is 25.17 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/157349.diff 14 Files Affected:
diff --git a/llvm/include/llvm/Transforms/Utils/EmitChangedFuncDebugInfo.h b/llvm/include/llvm/Transforms/Utils/EmitChangedFuncDebugInfo.h
new file mode 100644
index 0000000000000..8d569cd95d7f7
--- /dev/null
+++ b/llvm/include/llvm/Transforms/Utils/EmitChangedFuncDebugInfo.h
@@ -0,0 +1,33 @@
+//===- EmitChangedFuncDebugInfo.h - Emit Additional Debug Info -*- C++ --*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+/// \file
+/// Emit debug info for changed or new funcs.
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_UTILS_EMITCHANGEDFUNCDEBUGINFO_H
+#define LLVM_TRANSFORMS_UTILS_EMITCHANGEDFUNCDEBUGINFO_H
+
+#include "llvm/IR/PassManager.h"
+
+namespace llvm {
+
+class Module;
+
+// Pass that emits late dwarf.
+class EmitChangedFuncDebugInfoPass
+ : public PassInfoMixin<EmitChangedFuncDebugInfoPass> {
+public:
+ EmitChangedFuncDebugInfoPass() = default;
+
+ PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
+};
+
+} // end namespace llvm
+
+#endif // LLVM_TRANSFORMS_UTILS_EMITCHANGEDFUNCDEBUGINFO_H
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
index c27f100775625..3245d486feb77 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
@@ -1266,11 +1266,83 @@ void DwarfDebug::finishSubprogramDefinitions() {
}
}
+void DwarfDebug::addChangedSubprograms() {
+ // Generate additional dwarf for functions with signature changed.
+ NamedMDNode *NMD = MMI->getModule()->getNamedMetadata("llvm.dbg.cu");
+ DICompileUnit *ExtraCU = nullptr;
+ for (MDNode *N : NMD->operands()) {
+ auto *CU = cast<DICompileUnit>(N);
+ if (CU->getFile()->getFilename() == "<artificial>") {
+ ExtraCU = CU;
+ break;
+ }
+ }
+ if (!ExtraCU)
+ return;
+
+ llvm::DebugInfoFinder DIF;
+ DIF.processModule(*MMI->getModule());
+ for (auto *ExtraSP : DIF.subprograms()) {
+ if (ExtraSP->getUnit() != ExtraCU)
+ continue;
+
+ DISubprogram *SP = cast<DISubprogram>(ExtraSP->getScope());
+ DwarfCompileUnit &Cu = getOrCreateDwarfCompileUnit(SP->getUnit());
+ DIE *ScopeDIE =
+ DIE::get(DIEValueAllocator, dwarf::DW_TAG_inlined_subroutine);
+ Cu.getUnitDie().addChild(ScopeDIE);
+
+ Cu.addString(*ScopeDIE, dwarf::DW_AT_name, ExtraSP->getName());
+
+ DITypeRefArray Args = ExtraSP->getType()->getTypeArray();
+
+ if (Args[0])
+ Cu.addType(*ScopeDIE, Args[0]);
+
+ if (ExtraSP->getType()->getCC() == llvm::dwarf::DW_CC_nocall) {
+ Cu.addUInt(*ScopeDIE, dwarf::DW_AT_calling_convention,
+ dwarf::DW_FORM_data1, llvm::dwarf::DW_CC_nocall);
+ }
+
+ Cu.addFlag(*ScopeDIE, dwarf::DW_AT_artificial);
+
+ // dereference the DIE* for DIEEntry
+ DIE *OriginDIE = Cu.getOrCreateSubprogramDIE(SP);
+ Cu.addDIEEntry(*ScopeDIE, dwarf::DW_AT_specification, DIEEntry(*OriginDIE));
+
+ SmallVector<const DILocalVariable *> ArgVars(Args.size());
+ for (const DINode *DN : ExtraSP->getRetainedNodes()) {
+ if (const auto *DV = dyn_cast<DILocalVariable>(DN)) {
+ uint32_t Arg = DV->getArg();
+ if (Arg)
+ ArgVars[Arg - 1] = DV;
+ }
+ }
+
+ for (unsigned i = 1, N = Args.size(); i < N; ++i) {
+ const DIType *Ty = Args[i];
+ if (!Ty) {
+ assert(i == N-1 && "Unspecified parameter must be the last argument");
+ Cu.createAndAddDIE(dwarf::DW_TAG_unspecified_parameters, *ScopeDIE);
+ } else {
+ DIE &Arg =
+ Cu.createAndAddDIE(dwarf::DW_TAG_formal_parameter, *ScopeDIE);
+ const DILocalVariable *DV = ArgVars[i - 1];
+ if (DV)
+ Cu.addString(Arg, dwarf::DW_AT_name, DV->getName());
+ Cu.addType(Arg, Ty);
+ }
+ }
+ }
+}
+
void DwarfDebug::finalizeModuleInfo() {
const TargetLoweringObjectFile &TLOF = Asm->getObjFileLowering();
finishSubprogramDefinitions();
+ addChangedSubprograms();
+
finishEntityDefinitions();
bool HasEmittedSplitCU = false;
diff --git a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h
index 89813dcf0fdab..417ffb19633c3 100644
--- a/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h
+++ b/llvm/lib/CodeGen/AsmPrinter/DwarfDebug.h
@@ -565,6 +565,8 @@ class DwarfDebug : public DebugHandlerBase {
void finishSubprogramDefinitions();
+ void addChangedSubprograms();
+
/// Finish off debug information after all functions have been
/// processed.
void finalizeModuleInfo();
diff --git a/llvm/lib/Passes/PassBuilder.cpp b/llvm/lib/Passes/PassBuilder.cpp
index 587f0ece0859b..fa937a9a317be 100644
--- a/llvm/lib/Passes/PassBuilder.cpp
+++ b/llvm/lib/Passes/PassBuilder.cpp
@@ -344,6 +344,7 @@
#include "llvm/Transforms/Utils/CanonicalizeAliases.h"
#include "llvm/Transforms/Utils/CanonicalizeFreezeInLoops.h"
#include "llvm/Transforms/Utils/CountVisits.h"
+#include "llvm/Transforms/Utils/EmitChangedFuncDebugInfo.h"
#include "llvm/Transforms/Utils/DXILUpgrade.h"
#include "llvm/Transforms/Utils/Debugify.h"
#include "llvm/Transforms/Utils/DeclareRuntimeLibcalls.h"
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 98821bb1408a7..123041ea8cad8 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -133,6 +133,7 @@
#include "llvm/Transforms/Utils/AssumeBundleBuilder.h"
#include "llvm/Transforms/Utils/CanonicalizeAliases.h"
#include "llvm/Transforms/Utils/CountVisits.h"
+#include "llvm/Transforms/Utils/EmitChangedFuncDebugInfo.h"
#include "llvm/Transforms/Utils/EntryExitInstrumenter.h"
#include "llvm/Transforms/Utils/ExtraPassManager.h"
#include "llvm/Transforms/Utils/InjectTLIMappings.h"
@@ -1625,9 +1626,12 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
if (PTO.CallGraphProfile && !LTOPreLink)
MPM.addPass(CGProfilePass(isLTOPostLink(LTOPhase)));
- // RelLookupTableConverterPass runs later in LTO post-link pipeline.
- if (!LTOPreLink)
+ // RelLookupTableConverterPass and EmitChangedFuncDebugInfoPass run later in
+ // LTO post-link pipeline.
+ if (!LTOPreLink) {
MPM.addPass(RelLookupTableConverterPass());
+ MPM.addPass(EmitChangedFuncDebugInfoPass());
+ }
return MPM;
}
@@ -2355,4 +2359,4 @@ AAManager PassBuilder::buildDefaultAAPipeline() {
bool PassBuilder::isInstrumentedPGOUse() const {
return (PGOOpt && PGOOpt->Action == PGOOptions::IRUse) ||
!UseCtxProfile.empty();
-}
\ No newline at end of file
+}
diff --git a/llvm/lib/Passes/PassRegistry.def b/llvm/lib/Passes/PassRegistry.def
index 299aaa801439b..78ee4ca6f96a1 100644
--- a/llvm/lib/Passes/PassRegistry.def
+++ b/llvm/lib/Passes/PassRegistry.def
@@ -73,6 +73,7 @@ MODULE_PASS("debugify", NewPMDebugifyPass())
MODULE_PASS("declare-runtime-libcalls", DeclareRuntimeLibcallsPass())
MODULE_PASS("dfsan", DataFlowSanitizerPass())
MODULE_PASS("dot-callgraph", CallGraphDOTPrinterPass())
+MODULE_PASS("dwarf-emit-late", EmitChangedFuncDebugInfoPass())
MODULE_PASS("dxil-upgrade", DXILUpgradePass())
MODULE_PASS("elim-avail-extern", EliminateAvailableExternallyPass())
MODULE_PASS("extract-blocks", BlockExtractorPass({}, false))
diff --git a/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp b/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
index 262c902d40d2d..609e4f8e4d23a 100644
--- a/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
+++ b/llvm/lib/Transforms/IPO/ArgumentPromotion.cpp
@@ -50,6 +50,7 @@
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Constants.h"
+#include "llvm/IR/DIBuilder.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Dominators.h"
@@ -432,6 +433,14 @@ doPromotion(Function *F, FunctionAnalysisManager &FAM,
PromoteMemToReg(Allocas, DT, &AC);
}
+ // DW_CC_nocall to DISubroutineType to inform debugger that it may not be safe
+ // to call this function.
+ DISubprogram *SP = NF->getSubprogram();
+ if (SP) {
+ auto Temp = SP->getType()->cloneWithCC(llvm::dwarf::DW_CC_nocall);
+ SP->replaceType(MDNode::replaceWithPermanent(std::move(Temp)));
+ }
+
return NF;
}
diff --git a/llvm/lib/Transforms/Utils/CMakeLists.txt b/llvm/lib/Transforms/Utils/CMakeLists.txt
index e411d68570096..0b36693ce7975 100644
--- a/llvm/lib/Transforms/Utils/CMakeLists.txt
+++ b/llvm/lib/Transforms/Utils/CMakeLists.txt
@@ -22,6 +22,7 @@ add_llvm_component_library(LLVMTransformUtils
Debugify.cpp
DeclareRuntimeLibcalls.cpp
DemoteRegToStack.cpp
+ EmitChangedFuncDebugInfo.cpp
DXILUpgrade.cpp
EntryExitInstrumenter.cpp
EscapeEnumerator.cpp
diff --git a/llvm/lib/Transforms/Utils/EmitChangedFuncDebugInfo.cpp b/llvm/lib/Transforms/Utils/EmitChangedFuncDebugInfo.cpp
new file mode 100644
index 0000000000000..82acae3f0efeb
--- /dev/null
+++ b/llvm/lib/Transforms/Utils/EmitChangedFuncDebugInfo.cpp
@@ -0,0 +1,337 @@
+//==- EmitChangedFuncDebugInfoPass - Emit Additional Debug Info -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements emitting debug info for functions with changed
+// signatures or new functions.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Transforms/Utils/EmitChangedFuncDebugInfo.h"
+#include "llvm/IR/DIBuilder.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/IntrinsicInst.h"
+#include "llvm/IR/Module.h"
+
+using namespace llvm;
+
+static bool getArg(BasicBlock &FirstBB, unsigned Idx, DIBuilder &DIB,
+ DIFile *NewFile, Function *F, DISubprogram *OldSP,
+ SmallVector<Metadata *, 5> &TypeList,
+ SmallVector<Metadata *, 5> &ArgList) {
+ for (Instruction &I : FirstBB) {
+ for (const DbgRecord &DR : I.getDbgRecordRange()) {
+ auto *DVR = dyn_cast<DbgVariableRecord>(&DR);
+ if (!DVR)
+ continue;
+ // All of DbgVariableRecord::LocationType::{Value,Assign,Declare}
+ // are covered.
+ Metadata *Loc = DVR->getRawLocation();
+ auto *ValueMDN = dyn_cast<ValueAsMetadata>(Loc);
+ if (!ValueMDN)
+ continue;
+
+ // A poison value may correspond to a unused argument.
+ if (isa<PoisonValue>(ValueMDN->getValue())) {
+ Type *Ty = ValueMDN->getType();
+ auto *Var = cast<DILocalVariable>(DVR->getRawVariable());
+ if (!Var || Var->getArg() != (Idx + 1))
+ continue;
+
+ // Check for cases like below due to ArgumentPromotion
+ // define internal ... i32 @add42_byref(i32 %p.0.val) ... {
+ // #dbg_value(ptr poison, !17, !DIExpression(), !18)
+ // ...
+ // }
+ // TODO: one pointer expands to more than one argument is not
+ // supported yet. For example,
+ // define internal ... i32 @add42_byref(i32 %p.0.val, i32 %p.4.val)
+ // ...
+ if (Ty->isPointerTy() && F->getArg(Idx)->getType()->isIntegerTy()) {
+ // For such cases, a new argument is created.
+ auto *IntTy = cast<IntegerType>(F->getArg(Idx)->getType());
+ unsigned IntBitWidth = IntTy->getBitWidth();
+
+ DIType *IntDIType =
+ DIB.createBasicType("int" + std::to_string(IntBitWidth),
+ IntBitWidth, dwarf::DW_ATE_signed);
+ Var = DIB.createParameterVariable(OldSP, F->getArg(Idx)->getName(),
+ Idx + 1, NewFile, OldSP->getLine(),
+ IntDIType);
+ }
+
+ TypeList.push_back(Var->getType());
+ ArgList.push_back(Var);
+ return true;
+ }
+
+ // Handle the following pattern:
+ // ... @vgacon_do_font_op(..., i32 noundef, i1 noundef zeroext %ch512)
+ // ... {
+ // ...
+ // #dbg_value(i32 %set, !8568, !DIExpression(), !8589)
+ // %storedv = zext i1 %ch512 to i8
+ // #dbg_value(i8 %storedv, !8569, !DIExpression(), !8589)
+ // ...
+ // }
+ if (ValueMDN->getValue() != F->getArg(Idx)) {
+ Instruction *PrevI = I.getPrevNode();
+ if (!PrevI)
+ continue;
+ if (ValueMDN->getValue() != PrevI)
+ continue;
+ auto *ZExt = dyn_cast<ZExtInst>(PrevI);
+ if (!ZExt)
+ continue;
+ if (ZExt->getOperand(0) != F->getArg(Idx))
+ continue;
+ }
+
+ auto *Var = cast<DILocalVariable>(DVR->getRawVariable());
+
+ // Even we get dbg_*(...) for arguments, we still need to ensure
+ // compatible types between IR func argument types and debugInfo argument
+ // types.
+ Type *Ty = ValueMDN->getType();
+ DIType *DITy = Var->getType();
+ while (auto *DTy = dyn_cast<DIDerivedType>(DITy)) {
+ if (DTy->getTag() == dwarf::DW_TAG_pointer_type) {
+ DITy = DTy;
+ break;
+ }
+ DITy = DTy->getBaseType();
+ }
+
+ if (Ty->isIntegerTy()) {
+ if (auto *DTy = dyn_cast<DICompositeType>(DITy)) {
+ if (!Ty->isIntegerTy(DTy->getSizeInBits())) {
+ // TODO: A struct param breaks into two actual arguments like
+ // static int count(struct user_arg_ptr argv, int max)
+ // and the actual func signature:
+ // i32 @count(i8 range(i8 0, 2) %argv.coerce0, ptr %argv.coerce1)
+ // {
+ // #dbg_value(i8 %argv.coerce0, !14759,
+ // !DIExpression(DW_OP_LLVM_fragment, 0, 8), !14768)
+ // #dbg_value(ptr %argv.coerce1, !14759,
+ // !DIExpression(DW_OP_LLVM_fragment, 64, 64), !14768)
+ // ...
+ // }
+ return false;
+ }
+ }
+ } else if (Ty->isPointerTy()) {
+ // TODO: A struct turned into a pointer to struct.
+ // @rhashtable_lookup_fast(ptr noundef %key,
+ // ptr noundef readonly byval(%struct.rhashtable_params)
+ // align 8 captures(none) %params) {
+ // ...
+ // %MyAlloca = alloca [160 x i8], align 32
+ // %0 = ptrtoint ptr %MyAlloca to i64
+ // %1 = add i64 %0, 32
+ // %2 = inttoptr i64 %1 to ptr
+ // ...
+ // call void @llvm.memcpy.p0.p0.i64(ptr align 8 %2, ptr align 8
+ // %params, i64 40, i1 false)
+ // #dbg_value(ptr @offdevs, !15308, !DIExpression(), !15312)
+ // #dbg_value(ptr %key, !15309, !DIExpression(), !15312)
+ // #dbg_declare(ptr %MyAlloca, !15310,
+ // !DIExpression(DW_OP_plus_uconst, 32), !15313)
+ // tail call void @__rcu_read_lock() #14, !dbg !15314
+ // }
+ if (dyn_cast<DICompositeType>(DITy))
+ return false;
+
+ auto *DTy = dyn_cast<DIDerivedType>(DITy);
+ if (!DTy)
+ continue;
+ if (DTy->getTag() != dwarf::DW_TAG_pointer_type)
+ continue;
+ }
+
+ TypeList.push_back(Var->getType());
+ if (Var->getArg() != (Idx + 1) ||
+ Var->getName() != F->getArg(Idx)->getName()) {
+ Var = DIB.createParameterVariable(OldSP, F->getArg(Idx)->getName(),
+ Idx + 1, OldSP->getUnit()->getFile(),
+ OldSP->getLine(), Var->getType());
+ }
+ ArgList.push_back(Var);
+ return true;
+ }
+ }
+
+ return false;
+}
+
+static bool getTypeArgList(DIBuilder &DIB, DIFile *NewFile, Function *F,
+ FunctionType *FTy, DISubprogram *OldSP,
+ SmallVector<Metadata *, 5> &TypeList,
+ SmallVector<Metadata *, 5> &ArgList) {
+ Type *RetTy = FTy->getReturnType();
+ if (RetTy->isVoidTy()) {
+ // Void return type may be due to optimization.
+ TypeList.push_back(nullptr);
+ } else {
+ // Optimization does not change return type from one
+ // non-void type to another non-void type.
+ DITypeRefArray TyArray = OldSP->getType()->getTypeArray();
+ TypeList.push_back(TyArray[0]);
+ }
+
+ unsigned NumArgs = FTy->getNumParams();
+ BasicBlock &FirstBB = F->getEntryBlock();
+ for (unsigned i = 0; i < NumArgs; ++i) {
+ if (!getArg(FirstBB, i, DIB, NewFile, F, OldSP, TypeList, ArgList))
+ return false;
+ }
+
+ return true;
+}
+
+static void generateDebugInfo(Module &M, Function *F) {
+ // For this CU, we want generate the following three dwarf units:
+ // DW_TAG_compile_unit
+ // ...
+ // // New functions with suffix
+ // DW_TAG_inlined_subroutine
+ // DW_AT_name ("foo.1")
+ // DW_AT_type (0x0000000000000091 "int")
+ // DW_AT_artificial (true)
+ // DW_AT_specificiation (original DW_TAG_subprogram)
+ //
+ // DW_TAG_formal_parameter
+ // DW_AT_name ("b")
+ // DW_AT_type (0x0000000000000091 "int")
+ //
+ // DW_TAG_formal_parameter
+ // DW_AT_name ("c")
+ // DW_AT_type (0x0000000000000095 "long")
+ // ...
+ // // Functions with changed signatures
+ // DW_TAG_inlined_subroutine
+ // DW_AT_name ("bar")
+ // DW_AT_type (0x0000000000000091 "int")
+ // DW_AT_artificial (true)
+ // DW_AT_specificiation (original DW_TAG_subprogram)
+ //
+ // DW_TAG_formal_parameter
+ // DW_AT_name ("c")
+ // DW_AT_type (0x0000000000000095 "unsigned int")
+ // ...
+ // // Functions not obtained function changed signatures yet
+ // // The DW_CC_nocall presence indicates such cases.
+ // DW_TAG_inlined_subroutine
+ // DW_AT_name ("bar" or "bar.1")
+ // DW_AT_calling_convention (DW_CC_nocall)
+ // DW_AT_artificial (true)
+ // DW_AT_specificiation (original DW_TAG_subprogram)
+ // ...
+
+ // A new ComputeUnit is created with file name "<artificial>"
+ // to host newly-created DISubprogram's.
+ DICompileUnit *NewCU = nullptr;
+ NamedMDNode *CUs = M.getNamedMetadata("llvm.dbg.cu");
+ // Check whether the expected CU already there or not.
+ for (MDNode *Node : CUs->operands()) {
+ auto *CU = cast<DICompileUnit>(Node);
+ if (CU->getFile()->getFilename() == "<artificial>") {
+ NewCU = CU;
+ break;
+ }
+ }
+
+ DISubprogram *OldSP = F->getSubprogram();
+ DIBuilder DIB(M, /*AllowUnresolved=*/false, NewCU);
+ DIFile *NewFile;
+
+ if (NewCU) {
+ NewFile = NewCU->getFile();
+ } else {
+ DICompileUnit *OldCU = OldSP->getUnit();
+ DIFile *OldFile = OldCU->getFile();
+ NewFile = DIB.createFile("<artificial>", OldFile->getDirectory());
+ NewCU = DIB.createCompileUnit(
+ OldCU->getSourceLanguage(), NewFile, OldCU->getProducer(),
+ OldCU->isOptimized(), OldCU->getFlags(), OldCU->getRuntimeVersion());
+ }
+
+ SmallVector<Metadata *, 5> TypeList;
+ SmallVector<Metadata *, 5> ArgList;
+
+ FunctionType *FTy = F->getFunctionType();
+ bool Success = getTypeArgList(DIB, NewFile, F, FTy, OldSP, TypeList, ArgList);
+ if (!Success) {
+ TypeList.clear();
+ TypeList.push_back(nullptr);
+ ArgList.clear();
+ }
+
+ DITypeRefArray DITypeArray = DIB.getOrCreateTypeArray(TypeList);
+ auto *SubroutineType = DIB.createSubroutineType(DITypeArray);
+ DINodeArray ArgArray = DIB.getOrCreateArray(ArgList);
+
+ Function *DummyF =
+ Function::Create(FTy, GlobalValue::AvailableExternallyLinkage,
+ F->getName() + ".newsig", &M);
+
+ DISubprogram *NewSP =
+ DIB.createFunction(OldSP, // Scope
+ F->getName(), // Name
+ OldSP->getLinkageName(), // Linkage name
+ NewFile, // File
+ OldSP->getLine(), // Line
+ SubroutineType, // DISubroutineType
+ OldSP->getScopeLine(), // ScopeLine
+ ...
[truncated]
|
✅ With the latest revision this PR passed the C/C++ code formatter. |
cc @jemarch |
a864cf6
to
aea80d2
Compare
c9f140c
to
c995ada
Compare
@@ -1266,11 +1266,83 @@ void DwarfDebug::finishSubprogramDefinitions() { | |||
} | |||
} | |||
|
|||
void DwarfDebug::addChangedSubprograms() { | |||
// Generate additional dwarf for functions with signature changed. | |||
NamedMDNode *NMD = MMI->getModule()->getNamedMetadata("llvm.dbg.cu"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Module::debug_compile_units
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Module::debug_compile_units
Thanks! Will do as you suggested.
Let's illustrate that with an example.
And some LLVM passes transform it into something like:
Then, EmitChangedFuncDebugInfo should transform debug info metadata into something like:
Which should cause the following DWARF to be produced:
Would that work to solve the original issue? |
Thanks! So the idea is to the actual function itself has tag for the original subprogram, but the the code in the actual function has tags for the modified subprogram. This may work. Will give a try. |
To avoid misunderstanding, in my example, |
@dzhidzhoev For the following dwarf:
If the tool tries to find the original signature based on
the tool may get incorrect result.
WDYT? Could your proposed dwarf solution break some tools? Or this is something we can tolerate? |
Thank you for checking that! Yes, having the exapmle, I understand better, what we're trying to achieve.
With LLVM IR produced by the EmitChangedFuncDebugInfo pass like this:
Will the result meet the conditions: Btw, I believe this is what the colleagues proposed in the course of the earlier discussion on #127855:
We do not introduce any new LLVM IR functions (e.g. |
For
For C language, the linkage_name will likely be the same function name.
This will introduce new functions in symbol table. We will have to remove these functions at the end before final emitting the code. Otherwise users will be really confusing about why these new functions. Also, foo(1, {a, b}) may be inlined. |
What do you mean? From my understanding, this is not the code that should be produced in LLVM IR/MIR/assembly. This is just an illustration of how DWARF should look like.
|
Okay, indeed. mul.clone is not actually in IR. Just a conceptual idea from implementation point of view. I will try to generate the following dwarf:
Here, the top DW_TAG_subprogram will have final function types instead of the original one. The original one will be encoded in DW_TAG_inlined_subroutine. |
@dzhidzhoev I roughly finished a version which can roughly generate dwarf like
But building linux kernel caused failure with 'pahole' where 'pahole' intends to generate BTF types. The error message from pahole:
From pahole perspective, it founds two functions which has identical name 'bpf_lsm_socket_getpeersec_stream' with different number of parameters. Note that in the above, there are two DW_TAG_subprogram with the same name and at the same level. So in my opinion, having two DW_TAG_subprogram's at the same level with identical func name probably will break some tools. WDYT? |
I don't think this is a problem, especially when functions have different signatures. For example, DWARF in [0] has two DW_TAG_subprograms at the same level with the same DW_AT_names, and I think the majority of tools will easily process it.
Are you sure that the tool matches functions based on the name in DWARF? I'm not familiar with pahole, but briefly looking at it, I assume that the error is caused by having two ELF symbols with the same name replesenting functions with different signatures (in [1] [2] functions are matched based on [0] https://godbolt.org/z/9cW6xsr4z |
Okay, I think I found the pahole issue which prevents linux kernel build.
Basically if the 'fn' is inlined, we should not add this 'fn' into the elf function list. For this particular func, my unpublished RFC code generates the following dwarf (from vmlinux.o):
Could you help check whether the above format is expected or not? Running bpf selftests, I found additional failures. Mostly likely additional pahole change is needed. I am debugging those now. |
Yes, it looks kind of what I expected. Hope it solves the original problem and doesn't create the new ones. |
A quick note on that. I remembered the message from the discussion in #127855:
I think, having two functions with the same DW_AT_name but different formal parameters may indeed interfere with debugger name resolution: debugger users will be able to call the function with the signature not matching the signature from the source code. |
15c7a0e
to
d33661d
Compare
Just uploaded a new revision which used DW_TAG_inlined_subroutine to represent the function with original signature. The DW_TAG_inlined_subroutine will be the children of the DW_TAG_subprogram. With pahole change
Linux kernel can be built successfully and I am running bpf selftest. There are only a couple of selftest failures due to the kernel/module func signature change which can be resolved by modifying the code with proper actual signature. |
Could you please add more tests for that?
I think it will be easier to review this if we have examples of LLVM IR input + expected LLVM IR or DWARF output in form of tests. |
Thanks for the suggestions. Among other things I mentioned in the description, I will add test cases as well in the next revision. |
Hi, I've read over this thread but haven't managed to build a full picture of what's being introduced here. I see "RFC" in the PR title - has this change been discussed on discourse? It might be worth doing that if you haven't, to raise awareness, if this is going to affect many debug info consumers. As far as I understand it from the thread, this change intends to add an additional DWARF inline scope to functions with changed signatures, where the inline scope has the original signature. Is that right? If so, isn't that going to cause debuggers to spuriously show an extra "inline" frame in the call stack?
|
d33661d
to
e8844f5
Compare
I have an old discourse thread. I just made some update on it. See Initial format at (#127855) will not change debugger. But it seems not elegant from dwarf perspective. So the new mechanism is to directly encode actual signature in debuginfo/dwarf and use inlinedAt to refer to the original signature.
Yes. I am not familiar with lldb in this regard. Do we have an example to show how lldb will work in such cases?
Yes. I do. I suggest to support C language first which probably will have least impact compared to C++ or other languages. If there are non-resolvable issues (I hope not), we may not be able to enable by default. We can have a flag to enable it but when building linux kernel, we can enable this feature.
Good question. I don't have concrete cases for this. I guess the main issue probably with artificial inlining. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tbh, I'm not sure why upstream insists on this being a separate pass. From ad-hoc patterns in e.g. getArg()
it appears that original idea to handle signature changes in each individual optimization is simpler and more reliable way handle this.
OldSP->getLine(), // Line | ||
nullptr, // DISubroutineType | ||
OldSP->getScopeLine(), // ScopeLine | ||
DINode::FlagZero, DISubprogram::SPFlagDefinition); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think call DIB.createArtificialSubprogram(NewSP)
can be replaced with DINode::FlagArtificial
flag specified here.
Alternatively, will it make sense to just call DIB.createArtificialSubprogram(NewSP)
instead of DIB.createFunction(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding DINode::FlagArtificial sounds a better choice.
|
||
static bool getTypeArgList(Module &M, DIBuilder &DIB, Function *F, | ||
DISubprogram *OldSP, DISubprogram *NewSP, | ||
SmallVector<Metadata *, 5> &TypeList, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think canonical way to pass small vector param is SmallVectorImpl<Metadata> &
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ack. Will do.
|
||
// Go through the function itself to replace DILocations. | ||
LLVMContext &Ctx = M.getContext(); | ||
DILocation *DL2 = DILocation::get(Ctx, 0, 0, NewSP, nullptr, 0, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: only first three parameters are required, the rest defaults to zero.
DLlist.push_back(OldDL); | ||
} | ||
DILocation *PrevLoc = DL2; | ||
for (int i = DLlist.size() - 1; i >= 0; i--) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this computation is copy-pasted below for I.getDebugLoc()
, maybe extract it as a function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. A little bit repetition. I didn't polish the code a lot since this is a RFC and things could change again. But will make the change in next revision.
for (int i = DLlist.size() - 1; i >= 0; i--) { | ||
OldDL = DLlist[i]; | ||
PrevLoc = DILocation::get( | ||
Ctx, OldDL->getLine(), OldDL->getColumn(), OldDL->getScope(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OldDL->getScope()
do scopes need cloning alongside the containing subprogram?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The scope will be preserved as the original.
|
||
// Strip modifiers (const, volatile, etc.) | ||
DIType *DITy = Var->getType(); | ||
while (auto *DTy = dyn_cast<DIDerivedType>(DITy)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this derived type processing is copy-pasted multiple times, maybe extract a utility function?
// !DIExpression(DW_OP_LLVM_fragment, 64, 64), !14768) | ||
// ... | ||
// } | ||
static DIType *getTypeFromExpr(DIBuilder &DIB, DIExpression *Expr, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is used only for integer types, right?
Maybe rename it to getIntTypeFromExpr()
?
Thanks for suggestion. Since now we have an overall implementation, let us see whether changes in each individual optimization make sense or not vs. the current implementaiton. |
@OCHyams Since you are worried about debugger with this pull request, could you show me some examples how the change could affect debugger? If the debugger is not happy, we either need to tweak the current implementation or we might need to choose other alternative solution (e.g., yonghong-song@be0cdea). Thanks! |
In llvm pull request [1], the dwarf is changed to accommodate functions whose signatures are different from source level although they have the same name. Other non-source functions are also included in dwarf. The following is an example: The source: ==== $ cat test.c struct t { int a; }; char *tar(struct t *a, struct t *d); __attribute__((noinline)) static char * foo(struct t *a, struct t *d, int b) { return tar(a, d); } char *bar(struct t *a, struct t *d) { return foo(a, d, 1); } ==== Part of generated dwarf: ==== 0x0000005c: DW_TAG_subprogram DW_AT_low_pc (0x0000000000000010) DW_AT_high_pc (0x0000000000000015) DW_AT_frame_base (DW_OP_reg7 RSP) DW_AT_linkage_name ("foo") DW_AT_name ("foo") DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_decl_line (3) DW_AT_type (0x000000bb "char *") DW_AT_artificial (true) DW_AT_external (true) 0x0000006c: DW_TAG_formal_parameter DW_AT_location (DW_OP_reg5 RDI) DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_decl_line (3) DW_AT_type (0x000000c4 "t *") 0x00000075: DW_TAG_formal_parameter DW_AT_location (DW_OP_reg4 RSI) DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_decl_line (3) DW_AT_type (0x000000c4 "t *") 0x0000007e: DW_TAG_inlined_subroutine DW_AT_abstract_origin (0x0000009a "foo") DW_AT_low_pc (0x0000000000000010) DW_AT_high_pc (0x0000000000000015) DW_AT_call_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_call_line (0) 0x0000008a: DW_TAG_formal_parameter DW_AT_location (DW_OP_reg5 RDI) DW_AT_abstract_origin (0x000000a2 "a") 0x00000091: DW_TAG_formal_parameter DW_AT_location (DW_OP_reg4 RSI) DW_AT_abstract_origin (0x000000aa "d") 0x00000098: NULL 0x00000099: NULL 0x0000009a: DW_TAG_subprogram DW_AT_name ("foo") DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_decl_line (3) DW_AT_prototyped (true) DW_AT_type (0x000000bb "char *") DW_AT_inline (DW_INL_inlined) 0x000000a2: DW_TAG_formal_parameter DW_AT_name ("a") DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_decl_line (3) DW_AT_type (0x000000c4 "t *") 0x000000aa: DW_TAG_formal_parameter DW_AT_name ("d") DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_decl_line (3) DW_AT_type (0x000000c4 "t *") 0x000000b2: DW_TAG_formal_parameter DW_AT_name ("b") DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_decl_line (3) DW_AT_type (0x000000d8 "int") 0x000000ba: NULL ==== In the above, there are two subprograms with the same name 'foo'. Currently btf encoder will consider both functions as ELF functions. Since two subprograms have different signature, the funciton will be ignored. But actually, one of function 'foo' is marked as DW_INL_inlined which means we should not treat it as an elf funciton. The patch fixed this issue by filtering subprograms if the corresponding function__inlined() is true. This will fix the issue for [1]. But it should work fine without [1] too. [1] llvm/llvm-project#157349 Signed-off-by: Yonghong Song <[email protected]> Cc: Arnaldo Carvalho de Melo <[email protected]> Cc: Andrii Nakryiko <[email protected]> Cc: Alexei Starovoitov <[email protected]> Cc: Daniel Borkmann <[email protected]> Link: https://lore.kernel.org/dwarves/[email protected]/ Signed-off-by: Alan Maguire <[email protected]>
ArgumentPromotion pass may change function signatures. If this happens and debuginfo is enabled, let us add DW_CC_nocall to debuginfo so it is clear that the function signature has changed. DeadArgumentElimination ([1]) has similar implementation. Also fix an ArgumentPromotion test due to adding DW_CC_nocall to debuginfo. [1] llvm@340b0ca
During development of emitting dwarf data for signature-changed or new functions, I found two test failures llvm/test/Transforms/SampleProfile/ctxsplit.ll llvm/test/Transforms/SampleProfile/flattened.ll due to incorrect DISubroutineType(s). This patch fixed the issue with proper types.
Add a new pass EmitChangedFuncDebugInfo which will add dwarf for additional functions including functions with signature change and new functions. The previous approach in [1] tries to add debuginfo for those optimization passes which cause signature changes. Based on discussion in [1], it is preferred to have a specific pass to add debuginfo and later on dwarf generation can include those new debuginfo. The following is an example: Source: $ cat test.c struct t { int a; }; char *tar(struct t *a, struct t *d); __attribute__((noinline)) static char * foo(struct t *a, struct t *d, int b) { return tar(a, d); } char *bar(struct t *a, struct t *d) { return foo(a, d, 1); } Compiled and dump dwarf with: clang -O2 -c -g test.c llvm-dwarfdump test.o 0x0000005c: DW_TAG_subprogram DW_AT_low_pc (0x0000000000000010) DW_AT_high_pc (0x0000000000000015) DW_AT_frame_base (DW_OP_reg7 RSP) DW_AT_linkage_name ("foo") DW_AT_name ("foo") DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_decl_line (3) DW_AT_type (0x000000bb "char *") DW_AT_artificial (true) DW_AT_external (true) 0x0000006c: DW_TAG_formal_parameter DW_AT_location (DW_OP_reg5 RDI) DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_decl_line (3) DW_AT_type (0x000000c4 "t *") 0x00000075: DW_TAG_formal_parameter DW_AT_location (DW_OP_reg4 RSI) DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_decl_line (3) DW_AT_type (0x000000c4 "t *") 0x0000007e: DW_TAG_inlined_subroutine DW_AT_abstract_origin (0x0000009a "foo") DW_AT_low_pc (0x0000000000000010) DW_AT_high_pc (0x0000000000000015) DW_AT_call_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_call_line (0) 0x0000008a: DW_TAG_formal_parameter DW_AT_location (DW_OP_reg5 RDI) DW_AT_abstract_origin (0x000000a2 "a") 0x00000091: DW_TAG_formal_parameter DW_AT_location (DW_OP_reg4 RSI) DW_AT_abstract_origin (0x000000aa "d") 0x00000098: NULL 0x00000099: NULL 0x0000009a: DW_TAG_subprogram DW_AT_name ("foo") DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_decl_line (3) DW_AT_prototyped (true) DW_AT_type (0x000000bb "char *") DW_AT_inline (DW_INL_inlined) 0x000000a2: DW_TAG_formal_parameter DW_AT_name ("a") DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_decl_line (3) DW_AT_type (0x000000c4 "t *") 0x000000aa: DW_TAG_formal_parameter DW_AT_name ("d") DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_decl_line (3) DW_AT_type (0x000000c4 "t *") 0x000000b2: DW_TAG_formal_parameter DW_AT_name ("b") DW_AT_decl_file ("/home/yhs/tests/sig-change/deadarg/test.c") DW_AT_decl_line (3) DW_AT_type (0x000000d8 "int") 0x000000ba: NULL There are some restrictions in the current implementation: - Only C language is supported - BPF target is excluded as one of main goals for this pull request is to generate proper vmlinux BTF for arch's like x86_64/arm64 etc. - Function must not be a intrinsic, decl only, return value size more than arch register size and func with variable arguments. I have tested this patch set by building latest bpf-next linux kernel. For no-lto case: 65341 original number of functions 1085 new functions with this patch For thin-lto case: 65595 original number of functions 2492 new functions with this patch [1] llvm#127855
e8844f5
to
821d62a
Compare
@OCHyams I just uploaded another version which fixed some issues mentioned by @eddyz87. But the potential issue related to debugger is not investigated yet and I need your guidance how to test. As discussed in earlier this thread, the new implementation is very similar to code like I think there probably already some examples in debugger to track inlined subroutines, it would be great if you can give some example. |
Add a new pass EmitChangedFuncDebugInfo which will add dwarf for
additional functions including functions with signature change
and new functions.
The previous approach in [1] tries to add debuginfo for those
optimization passes which cause signature changes. Based on
discussion in [1], it is preferred to have a specific pass to
add debuginfo and later on dwarf generation can include those
new debuginfo.
The following is an example:
Source:
Compiled and dump dwarf with:
and related dwarf output
There are some restrictions in the current implementation:
is to generate proper vmlinux BTF for arch's like x86_64/arm64 etc.
than arch register size and func with variable arguments.
argument cannot be easily retrieved from dbg_value etc.).
signatures but the current implementation still marks them
as artificial. We might be able to avoid DW_TAG_inlined_subroutine
for these routines.
I have tested this patch set by building latest bpf-next linux kernel.
For no-lto case:
For thin-lto case:
[1] #127855