Skip to content
Open
5 changes: 5 additions & 0 deletions llvm/include/llvm/Object/ObjectFile.h
Original file line number Diff line number Diff line change
Expand Up @@ -198,6 +198,7 @@ class SymbolRef : public BasicSymbolRef {
/// Get the alignment of this symbol as the actual value (not log 2).
uint32_t getAlignment() const;
uint64_t getCommonSize() const;
uint64_t getSize() const;
Expected<SymbolRef::Type> getType() const;

/// Get section this symbol is defined in reference to. Result is
Expand Down Expand Up @@ -482,6 +483,10 @@ inline uint64_t SymbolRef::getCommonSize() const {
return getObject()->getCommonSymbolSize(getRawDataRefImpl());
}

inline uint64_t SymbolRef::getSize() const {
return getObject()->getCommonSymbolSizeImpl(getRawDataRefImpl());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For COFF. getCommonSymbolSizeImpl returns symbol value which is start address offset of this function.

uint64_t COFFObjectFile::getCommonSymbolSizeImpl(DataRefImpl Ref) const {
COFFSymbolRef Symb = getCOFFSymbol(Ref);
return Symb.getValue();
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, thanks for checking on this! Is this behavior intended for COFF? Because there's another SymbolRef::getCommonSymbolSize that does exactly the same thing but with a assertation check:

assert(*SymbolFlagsOrErr & SymbolRef::SF_Common);

Would it be better to fix the SymbolRef or the COFF getCommonSymbolSizeImpl to either get the actual symbol size or return 0?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getCommonSymbolSize is used for common symbol.
You can use -fcommon to explicitly generate them. e.g. for a simple global variable int cval;

ELF:
  Symbol {
    Name: cval (35)
    Value: 0x4
    Size: 4
    Binding: Global (0x1)
    Type: Object (0x1)
    Other: 0
    Section: Common (0xFFF2)
  }
COFF:
  Symbol {
    Name: cval
    Value: 4
    Section: IMAGE_SYM_UNDEFINED (0)
    BaseType: Null (0x0)
    ComplexType: Null (0x0)
    StorageClass: External (0x2)
    AuxSymbolCount: 0
  }

For ELF, there's size filed. size and Value of common symbol are all its size.
For COFF, there's no size field , the symbol's value is its size. I think COFF's implementation is correct.

Regarding function symbol's size.
For ELF, It is the function's size. I think you can use ELFSymbolRef::getSize() safely.
For COFF, it does not support it currently. I think we should use value 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think size for COFF can be supported by PE format, although it has not been supported yet.
https://learn.microsoft.com/en-us/windows/win32/debug/pe-format#auxiliary-format-2-bf-and-ef-symbols
May be we can file another PR to support it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha! For now I've added another virtual function ObjectFile::getSymbolSizeImpl that returns 0 by default. The ELFObjectFile overrides this and returns getSymbolSize.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should add a new virtual function in ObjectFile since the symbol size is only valid for part of symbol table. That's why there's getSize methond in ELFSymbolRef.
BTW, ObjectFile::getSymbolSizeImpl should return the size for common symbol just like getCommonSize.

How about define a function/lambda to get function symbol's size in ProfiledBinary::populateSymbolsFromBinary to avoid modifying ObjectFile interface?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense. let me change it to ELF-specific then. I did something like this:

// ProfiledBinary::populateSymbolsFromBinary()
uint64_t Size = 0;
if (isa<ELFObjectFileBase>(Symbol.getObject())) {
  ELFSymbolRef ElfSymbol(Symbol);
  Size = ElfSymbol.getSize();
}

}

inline Expected<section_iterator> SymbolRef::getSection() const {
return getObject()->getSymbolSection(getRawDataRefImpl());
}
Expand Down
11 changes: 8 additions & 3 deletions llvm/include/llvm/ProfileData/SampleProf.h
Original file line number Diff line number Diff line change
Expand Up @@ -1214,13 +1214,18 @@ class FunctionSamples {
// Note the sequence of the suffixes in the knownSuffixes array matters.
// If suffix "A" is appended after the suffix "B", "A" should be in front
// of "B" in knownSuffixes.
const char *KnownSuffixes[] = {LLVMSuffix, PartSuffix, UniqSuffix};
const char *KnownSuffixes[] = {LLVMSuffix, PartSuffix, UniqSuffix, nullptr};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NIT:

ArrayRef(const iterator_range<U *> &Range)

 getCanonicalFnName(FnName, KnownSuffixes, Attr);
 static StringRef getCanonicalFnName(StringRef FnName, ArrayRef<StringRef> Suffixes,
                                     StringRef Attr = "selected") {

return getCanonicalFnName(FnName, KnownSuffixes, Attr);
}

static StringRef getCanonicalFnName(StringRef FnName, const char *Suffixes[],
StringRef Attr = "selected") {
if (Attr == "" || Attr == "all")
return FnName.split('.').first;
if (Attr == "selected") {
StringRef Cand(FnName);
for (const auto &Suf : KnownSuffixes) {
StringRef Suffix(Suf);
for (const char **Suf = Suffixes; *Suf; Suf++) {
StringRef Suffix(*Suf);
// If the profile contains ".__uniq." suffix, don't strip the
// suffix for names in the IR.
if (Suffix == UniqSuffix && FunctionSamples::HasUniqSuffix)
Expand Down
33 changes: 33 additions & 0 deletions llvm/tools/llvm-profgen/ProfileGenerator.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -449,29 +449,62 @@ bool ProfileGeneratorBase::collectFunctionsFromRawProfile(
// Go through all the stacks, ranges and branches in sample counters, use
// the start of the range to look up the function it belongs and record the
// function.
uint64_t ErrStkAddr = 0, ErrFuncRange = 0, ErrSrc = 0, ErrTgt = 0;
uint64_t TotalStkAddr = 0, TotalFuncRange = 0, TotalSrc = 0, TotalTgt = 0;
for (const auto &CI : *SampleCounters) {
if (const auto *CtxKey = dyn_cast<AddrBasedCtxKey>(CI.first.getPtr())) {
for (auto StackAddr : CtxKey->Context) {
uint64_t Inc = Binary->addressIsCode(StackAddr) ? 1 : 0;
TotalStkAddr += Inc;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit for readability:

bool StackAddrIsCode = Binary->addressIsCode(StackAddr);
if (StackAddrIsCode)
  TotalStkAddr++;

if (...)
   ProfiledFunctions.insert(FRange->Func);
else if (StackAddrIsCode)
   ErrStkAddr++;

...

same for all other instances. 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a question, do we expect StackAddrIsCode == false, but findFuncRange has a valid function range?

if (FuncRange *FRange = Binary->findFuncRange(StackAddr))
ProfiledFunctions.insert(FRange->Func);
else
ErrStkAddr += Inc;
}
}

for (auto Item : CI.second.RangeCounter) {
uint64_t StartAddress = Item.first.first;
uint64_t Inc = Binary->addressIsCode(StartAddress) ? Item.second : 0;
TotalFuncRange += Inc;
if (FuncRange *FRange = Binary->findFuncRange(StartAddress))
ProfiledFunctions.insert(FRange->Func);
else
ErrFuncRange += Inc;
}

for (auto Item : CI.second.BranchCounter) {
uint64_t SourceAddress = Item.first.first;
uint64_t TargetAddress = Item.first.second;
uint64_t SrcInc = Binary->addressIsCode(SourceAddress) ? Item.second : 0;
uint64_t TgtInc = Binary->addressIsCode(TargetAddress) ? Item.second : 0;
TotalSrc += SrcInc;
if (FuncRange *FRange = Binary->findFuncRange(SourceAddress))
ProfiledFunctions.insert(FRange->Func);
else
ErrSrc += SrcInc;
TotalTgt += TgtInc;
if (FuncRange *FRange = Binary->findFuncRange(TargetAddress))
ProfiledFunctions.insert(FRange->Func);
else
ErrTgt += TgtInc;
}
}

if (ErrStkAddr)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: No need to check the value here.
There are checks in the emitWarningSummary (https://github.com/llvm/llvm-project/blob/main/llvm/tools/llvm-profgen/ErrorHandling.h#L50)

emitWarningSummary(
ErrStkAddr, TotalStkAddr,
"of stack address samples do not belong to any function");
if (ErrFuncRange)
emitWarningSummary(
ErrFuncRange, TotalFuncRange,
"of function range samples do not belong to any function");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally, we'd like to how the count that 1) does not belong to any function in Dwarf, and 2) the count that does not belong to any function in Dwarf and symbol table.

This tells us how much loading symbol table is helping, and how much dwarf is screwed (due to relo overflow).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me move this message to the ProfiledBinary::populateSymbolsFromBinary instead :)

if (ErrSrc)
emitWarningSummary(ErrSrc, TotalSrc,
"of LBR source samples do not belong to any function");
if (ErrTgt)
emitWarningSummary(ErrTgt, TotalTgt,
"of LBR target samples do not belong to any function");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need all of this? range samples, source, target are kind of duplicated info?

If we only track range, can we move the warning back into WarnInvalidRange?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmmm, that makes sense. It a bit too much logging here, and it was mostly just for debugging at the first place... let me move the warning back :)

return true;
}

Expand Down
48 changes: 46 additions & 2 deletions llvm/tools/llvm-profgen/ProfiledBinary.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -257,6 +257,8 @@ void ProfiledBinary::load() {
if (ShowDisassemblyOnly)
decodePseudoProbe(Obj);

populateSymbolsFromBinary(Obj);

// Disassemble the text sections.
disassemble(Obj);

Expand Down Expand Up @@ -604,13 +606,13 @@ bool ProfiledBinary::dissassembleSymbol(std::size_t SI, ArrayRef<uint8_t> Bytes,
// Record potential call targets for tail frame inference later-on.
if (InferMissingFrames && FRange) {
uint64_t Target = 0;
MIA->evaluateBranch(Inst, Address, Size, Target);
bool Err = MIA->evaluateBranch(Inst, Address, Size, Target);
if (MCDesc.isCall()) {
// Indirect call targets are unknown at this point. Recording the
// unknown target (zero) for further LBR-based refinement.
MissingContextInferrer->CallEdges[Address].insert(Target);
} else if (MCDesc.isUnconditionalBranch()) {
assert(Target &&
assert(Err &&
"target should be known for unconditional direct branch");
// Any inter-function unconditional jump is considered tail call at
// this point. This is not 100% accurate and could further be
Expand Down Expand Up @@ -820,6 +822,48 @@ void ProfiledBinary::populateSymbolAddressList(const ObjectFile *Obj) {
}
}

void ProfiledBinary::populateSymbolsFromBinary(const ObjectFile *Obj) {
// Load binary functions from symbol table when Debug info is incomplete
StringRef FileName = Obj->getFileName();
for (const SymbolRef &Symbol : Obj->symbols()) {
const SymbolRef::Type Type = unwrapOrError(Symbol.getType(), FileName);
const uint64_t Addr = unwrapOrError(Symbol.getAddress(), FileName);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: maybe Addr --> StartAddr

const StringRef Name = unwrapOrError(Symbol.getName(), FileName);
const uint64_t Size = Symbol.getSize();

if (Size == 0 || Type != SymbolRef::ST_Function)
continue;

const char *Suffixes[] = {".destroy", ".resume", ".llvm.",
".cold", ".warm", nullptr};
const StringRef SymName =
FunctionSamples::getCanonicalFnName(Name, Suffixes);

auto Ret = BinaryFunctions.emplace(SymName, BinaryFunction());
auto &Func = Ret.first->second;
if (Ret.second)
Func.FuncName = Ret.first->first;

if (auto Range = findFuncRange(Addr)) {
if (Ret.second && ShowDetailedWarning)
WithColor::warning()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this result in a large number of warnings since most of the symbols can be obtained from DWARF?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Ret.second check shall skip all the symbols that already obtained from DWARF. (the symbol name will exists in the BinaryFunctions and the emplace shall prevent the insertation and Ret.second will be set to "false")

auto Ret = BinaryFunctions.emplace(SymName, BinaryFunction());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Symbol table finding entry already existing in dwarf should not be a warning.

When that happens, we should check if they both point to same ranges, and only warn when

  1. symbol table and dwarf conflicts for the actual range.
  2. something is in symbol table, but not in dwarf (suggesting dwarf corrupted, e.g. due to relo overflow)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated this part and the warning shall now capture case 1. And I've merged (2) into the PerfScriptReader::warnInvalidRange :)

<< "Symbol " << Name << " start address "
<< format("%8" PRIx64, Addr) << " already exists in DWARF at "
<< format("%8" PRIx64, Range->StartAddress) << " in function "
<< Range->getFuncName() << "\n";
} else {
// Store/Update Function Range from SymTab
Func.Ranges.emplace_back(Addr, Addr + Size);

auto R = StartAddrToFuncRangeMap.emplace(Addr, FuncRange());
FuncRange &FRange = R.first->second;
FRange.Func = &Func;
FRange.StartAddress = Addr;
FRange.EndAddress = Addr + Size;
}
}
}

void ProfiledBinary::loadSymbolsFromDWARFUnit(DWARFUnit &CompilationUnit) {
for (const auto &DieInfo : CompilationUnit.dies()) {
llvm::DWARFDie Die(&CompilationUnit, &DieInfo);
Expand Down
3 changes: 3 additions & 0 deletions llvm/tools/llvm-profgen/ProfiledBinary.h
Original file line number Diff line number Diff line change
Expand Up @@ -356,6 +356,9 @@ class ProfiledBinary {
// Create symbol to its start address mapping.
void populateSymbolAddressList(const object::ObjectFile *O);

// Load functions from its symbol table (when DWARF info is missing).
void populateSymbolsFromBinary(const object::ObjectFile *O);

// A function may be spilt into multiple non-continuous address ranges. We use
// this to set whether start a function range is the real entry of the
// function and also set false to the non-function label.
Expand Down
Loading