55 commits
d98ed02
big archive recognition implementation
midhuncodes7 Jul 21, 2025
64639e1
ELF test not supported on AIX
midhuncodes7 Jul 22, 2025
f5a357e
update yaml script
Jul 24, 2025
305ef99
target specific changes
midhuncodes7 Jul 24, 2025
66d5d11
Merge branch 'midhun7/big-archive-recognition' of github.com:midhunco…
midhuncodes7 Jul 24, 2025
27b4f10
Review comments addressed
midhuncodes7 Jul 30, 2025
647f98e
review comments addressed
midhuncodes7 Aug 1, 2025
629e7a5
review comments addressed
midhuncodes7 Aug 1, 2025
8c71818
Merge branch 'llvm:main' into midhun7/big-archive-recognition
midhuncodes7 Aug 1, 2025
f0be9a8
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 1, 2025
bfa845e
Merge branch 'midhun7/big-archive-recognition' of github.com:midhunco…
midhuncodes7 Aug 1, 2025
9c14b38
format correction
midhuncodes7 Aug 1, 2025
f8aeb3e
added test support
midhuncodes7 Aug 1, 2025
0952fcf
refactor of code to remove duplicates
midhuncodes7 Aug 12, 2025
6be8d1a
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 12, 2025
12dd6e5
code formatting
midhuncodes7 Aug 12, 2025
68b886f
code formatting
midhuncodes7 Aug 12, 2025
d0f0a1d
test fail fix
midhuncodes7 Aug 12, 2025
fe5a9c2
code formatting
midhuncodes7 Aug 12, 2025
e2083eb
make tests available on other platforms
midhuncodes7 Aug 14, 2025
12bf16a
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 14, 2025
0595460
test changes for windows
midhuncodes7 Aug 18, 2025
4d46767
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 18, 2025
315b98c
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 19, 2025
b37409b
fix review comments
midhuncodes7 Aug 21, 2025
30e4387
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 21, 2025
08f38ae
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 29, 2025
ca85990
review comments addressed
midhuncodes7 Aug 29, 2025
844906c
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 6, 2025
74d4e23
code format
midhuncodes7 Sep 6, 2025
183197a
code format
midhuncodes7 Sep 8, 2025
c68b4d9
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 8, 2025
7407769
review comments fix
midhuncodes7 Sep 8, 2025
4da5ebd
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 8, 2025
b0f0705
added tests
midhuncodes7 Sep 15, 2025
86f2023
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 15, 2025
f034d90
review comments fix
midhuncodes7 Sep 15, 2025
c6d8b90
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 19, 2025
e4d6fed
remove sys call for path
midhuncodes7 Sep 19, 2025
f57d66a
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 25, 2025
c3acb24
review comment fix
midhuncodes7 Sep 29, 2025
136bde7
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 29, 2025
b59b5f7
revamp code for better search of archive binary
midhuncodes7 Oct 28, 2025
8a2a05d
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Oct 28, 2025
59f4b64
code format fix
midhuncodes7 Oct 28, 2025
93e5a7c
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Oct 28, 2025
af06657
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Nov 13, 2025
b12e445
review comments fix
midhuncodes7 Nov 13, 2025
307fb74
review comments fix
midhuncodes7 Nov 18, 2025
d91dcd8
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Nov 18, 2025
181ee67
review comments fix
Nov 18, 2025
fdcfdc0
review comments fix
midhuncodes7 Nov 18, 2025
2ad39b3
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Nov 21, 2025
d16b8d4
address review comments
midhuncodes7 Nov 21, 2025
0b06059
Merge branch 'main' into midhun7/big-archive-recognition
midhuncodes7 Nov 21, 2025
18 changes: 14 additions & 4 deletions llvm/docs/CommandGuide/llvm-symbolizer.rst
@@ -535,16 +535,20 @@ MACH-O SPECIFIC OPTIONS
 .. option:: --default-arch <arch>

   If a binary contains object files for multiple architectures (e.g. it is a
-  Mach-O universal binary), symbolize the object file for a given architecture.
-  You can also specify the architecture by writing ``binary_name:arch_name`` in
-  the input (see example below). If the architecture is not specified in either
-  way, the address will not be symbolized. Defaults to empty string.
+  Mach-O universal binary or an AIX archive with architecture variants),
+  symbolize the object file for a given architecture. You can also specify
+  the architecture by writing ``binary_name:arch_name`` in the input (see
+  example below). For AIX archives, the format ``archive.a(member.o):arch``
Collaborator

The docs here still refer to AIX archives specifically, yet the behaviour change is generic.

Contributor Author

Shared objects are normally stored in (and loaded from) archives on AIX. By convention, the same (big format) archive contains both 32-bit and 64-bit objects.

This behaviour is AIX-specific, hence the documentation is specific to AIX.

Contributor Author

Adding objects that share a name but differ in architecture to the same archive is behavior specific to AIX.

+  is also supported. If the architecture is not specified,
+  the address will not be symbolized. Defaults to empty string.

   .. code-block:: console

     $ cat addr.txt
     /tmp/mach_universal_binary:i386 0x1f84
     /tmp/mach_universal_binary:x86_64 0x100000f24
+    /tmp/archive.a(member.o):ppc 0x1000
+    /tmp/archive.a(member.o):ppc64 0x2000

     $ llvm-symbolizer < addr.txt
     _main
@@ -553,6 +557,12 @@ MACH-O SPECIFIC OPTIONS
     _main
     /tmp/source_x86_64.cc:8

+    _foo
+    /tmp/source_ppc.cc:12
+
+    _foo
+    /tmp/source_ppc64.cc:12
+
 .. option:: --dsym-hint <path/to/file.dSYM>

   If the debug info for a binary isn't present in the default location, look for
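The `archive.a(member.o):arch` input form documented above comes down to splitting the path at its last `(` once the `:arch` suffix has been separated. Below is a minimal sketch of that split in standard C++. It assumes the architecture suffix has already been stripped. `ArchiveMemberRef` and `splitArchiveMember` are hypothetical names, and the "must end in `)`" check mirrors the one the patch adds to `getOrCreateObject`:

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type: the two parts of "archive.a(member.o)".
struct ArchiveMemberRef {
  std::string ArchivePath;
  std::string MemberName;
};

// Hypothetical helper mirroring the patch's check in getOrCreateObject:
// only inputs that end in ')' and contain an earlier '(' are treated as
// archive members. The ":arch" suffix is assumed to be stripped already.
std::optional<ArchiveMemberRef> splitArchiveMember(std::string_view Path) {
  if (Path.empty() || Path.back() != ')')
    return std::nullopt;
  std::size_t OpenParen = Path.rfind('(');
  if (OpenParen == std::string_view::npos)
    return std::nullopt; // a plain path that happens to end in ')'
  // Member name: everything between '(' and the trailing ')'.
  return ArchiveMemberRef{
      std::string(Path.substr(0, OpenParen)),
      std::string(Path.substr(OpenParen + 1, Path.size() - OpenParen - 2))};
}
```

For `/tmp/archive.a(member.o)` the sketch yields `/tmp/archive.a` and `member.o`. For `/tmp/mach_universal_binary` it yields no value, so that input would take the ordinary (non-archive) path.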
54 changes: 44 additions & 10 deletions llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
@@ -1,4 +1,5 @@
-//===- Symbolize.h ----------------------------------------------*- C++ -*-===//
+//===- Symbolize.h ----------------------------------------------*- C++
+//-*-===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
@@ -13,6 +14,7 @@
 #ifndef LLVM_DEBUGINFO_SYMBOLIZE_SYMBOLIZE_H
 #define LLVM_DEBUGINFO_SYMBOLIZE_SYMBOLIZE_H

+#include "llvm/ADT/DenseMap.h"
 #include "llvm/ADT/StringMap.h"
 #include "llvm/ADT/ilist_node.h"
 #include "llvm/ADT/simple_ilist.h"
@@ -25,6 +27,7 @@
 #include <cstdint>
 #include <map>
 #include <memory>
+#include <optional>
 #include <string>
 #include <utility>
 #include <vector>
@@ -196,11 +199,18 @@ class LLVMSymbolizer {
   Expected<ObjectPair> getOrCreateObjectPair(const std::string &Path,
                                              const std::string &ArchName);

-  /// Return a pointer to object file at specified path, for a specified
-  /// architecture (e.g. if path refers to a Mach-O universal binary, only one
-  /// object file from it will be returned).
-  Expected<ObjectFile *> getOrCreateObject(const std::string &Path,
-                                           const std::string &ArchName);
+  /// Return a pointer to the object file with the specified name, for a
+  /// specified architecture (e.g. if path refers to a Mach-O universal
+  /// binary, only one object file from it will be returned).
+  Expected<ObjectFile *> getOrCreateObject(const std::string &InputPath,
+                                           const std::string &DefaultArchName);

+  /// Return a pointer to the object file with the specified name, for a
+  /// specified architecture that is present inside an archive file.
+  Expected<ObjectFile *> getOrCreateObjectFromArchive(StringRef ArchivePath,
+                                                      StringRef MemberName,
+                                                      StringRef ArchName,
+                                                      StringRef FullPath);
+
   /// Update the LRU cache order when a binary is accessed.
   void recordAccess(CachedBinary &Bin);
@@ -216,15 +226,39 @@ class LLVMSymbolizer {
   /// Contains parsed binary for each path, or parsing error.
   std::map<std::string, CachedBinary, std::less<>> BinaryForPath;

+  /// Store the archive path for the object file.
+  DenseMap<const object::ObjectFile *, std::string> ObjectToArchivePath;
+
   /// A list of cached binaries in LRU order.
   simple_ilist<CachedBinary> LRUBinaries;
   /// Sum of the sizes of the cached binaries.
   size_t CacheSize = 0;

-  /// Parsed object file for path/architecture pair, where "path" refers
-  /// to Mach-O universal binary.
-  std::map<std::pair<std::string, std::string>, std::unique_ptr<ObjectFile>>
-      ObjectForUBPathAndArch;
+  struct ContainerCacheKey {
+    std::string Path;
+    std::string MemberName;
+    std::string ArchName;
+
+    // Required for map comparison.
+    bool operator<(const ContainerCacheKey &Other) const {
+      return std::tie(Path, MemberName, ArchName) <
+             std::tie(Other.Path, Other.MemberName, Other.ArchName);
+    }
+  };
+
+  /// Parsed object file for each path/member/architecture triple.
+  /// Used to cache objects extracted from containers (Mach-O universal
+  /// binaries, archives).
+  std::map<ContainerCacheKey, std::unique_ptr<ObjectFile>> ObjectFileCache;
+
+  Expected<object::Binary *>
+  loadOrGetBinary(const std::string &ArchivePathKey,
+                  std::optional<StringRef> FullPathKey = std::nullopt);
+
+  Expected<ObjectFile *> findOrCacheObject(
+      const ContainerCacheKey &Key,
+      llvm::function_ref<Expected<std::unique_ptr<ObjectFile>>()> Loader,
+      const std::string &PathForBinaryCache);

   Options Opts;

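The patch replaces the `std::pair<std::string, std::string>` key of `ObjectForUBPathAndArch` with `ContainerCacheKey`, so that one map can hold both Mach-O universal binary slices (empty member name) and archive members. Ordering via `std::tie` compares the three fields lexicographically, which is all `std::map` needs. Here is a stand-alone sketch of that key outside LLVM. `DemoCache` and its `int` payload are hypothetical stand-ins for the `std::unique_ptr<ObjectFile>` values:

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Same shape as the patch's key: one entry per
// (container path, member name, architecture) triple.
struct ContainerCacheKey {
  std::string Path;
  std::string MemberName; // empty for Mach-O universal binaries
  std::string ArchName;

  // std::tie builds tuples of references, and tuple operator< compares
  // element by element, giving the strict weak ordering std::map requires.
  bool operator<(const ContainerCacheKey &Other) const {
    return std::tie(Path, MemberName, ArchName) <
           std::tie(Other.Path, Other.MemberName, Other.ArchName);
  }
};

// Hypothetical stand-in for the patch's ObjectFileCache.
using DemoCache = std::map<ContainerCacheKey, int>;
```

Two `member.o` entries that differ only in architecture still occupy separate slots. AIX big archives need exactly this.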
184 changes: 143 additions & 41 deletions llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
@@ -1,4 +1,5 @@
-//===-- LLVMSymbolize.cpp -------------------------------------------------===//
+//===-- LLVMSymbolize.cpp
+//-------------------------------------------------===//
 //
 // Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
 // See https://llvm.org/LICENSE.txt for license information.
@@ -21,6 +22,7 @@
 #include "llvm/DebugInfo/PDB/PDBContext.h"
 #include "llvm/DebugInfo/Symbolize/SymbolizableObjectFile.h"
 #include "llvm/Demangle/Demangle.h"
+#include "llvm/Object/Archive.h"
 #include "llvm/Object/BuildID.h"
 #include "llvm/Object/COFF.h"
 #include "llvm/Object/ELFObjectFile.h"
@@ -285,7 +287,7 @@ LLVMSymbolizer::findSymbol(ArrayRef<uint8_t> BuildID, StringRef Symbol,
 }

 void LLVMSymbolizer::flush() {
-  ObjectForUBPathAndArch.clear();
+  ObjectFileCache.clear();
   LRUBinaries.clear();
   CacheSize = 0;
   BinaryForPath.clear();
@@ -557,57 +559,157 @@ LLVMSymbolizer::getOrCreateObjectPair(const std::string &Path,
   if (!DbgObj)
     DbgObj = Obj;
   ObjectPair Res = std::make_pair(Obj, DbgObj);
-  std::string DbgObjPath = DbgObj->getFileName().str();
   auto Pair =
       ObjectPairForPathArch.emplace(std::make_pair(Path, ArchName), Res);
-  BinaryForPath.find(DbgObjPath)->second.pushEvictor([this, I = Pair.first]() {
-    ObjectPairForPathArch.erase(I);
-  });
+  std::string FullDbgObjKey;
+  auto It = ObjectToArchivePath.find(DbgObj);
+  if (It != ObjectToArchivePath.end()) {
+    StringRef ArchivePath = It->second;
+    StringRef MemberName = sys::path::filename(DbgObj->getFileName());
+    FullDbgObjKey = (ArchivePath + "(" + MemberName + ")").str();
+  } else {
+    FullDbgObjKey = DbgObj->getFileName().str();
+  }
+  BinaryForPath.find(FullDbgObjKey)
+      ->second.pushEvictor(
+          [this, I = Pair.first]() { ObjectPairForPathArch.erase(I); });
   return Res;
 }

-Expected<ObjectFile *>
-LLVMSymbolizer::getOrCreateObject(const std::string &Path,
-                                  const std::string &ArchName) {
-  Binary *Bin;
-  auto Pair = BinaryForPath.emplace(Path, OwningBinary<Binary>());
+Expected<object::Binary *>
+LLVMSymbolizer::loadOrGetBinary(const std::string &ArchivePathKey,
+                                std::optional<StringRef> FullPathKey) {
+  // If no separate cache key is provided, use the archive path itself.
+  std::string FullPathKeyStr =
+      FullPathKey ? FullPathKey->str() : ArchivePathKey;
+  auto Pair = BinaryForPath.emplace(FullPathKeyStr, OwningBinary<Binary>());
   if (!Pair.second) {
-    Bin = Pair.first->second->getBinary();
     recordAccess(Pair.first->second);
-  } else {
-    Expected<OwningBinary<Binary>> BinOrErr = createBinary(Path);
-    if (!BinOrErr)
-      return BinOrErr.takeError();
+    return Pair.first->second->getBinary();
+  }
+
+  Expected<OwningBinary<Binary>> BinOrErr = createBinary(ArchivePathKey);
+  if (!BinOrErr)
+    return BinOrErr.takeError();

-    CachedBinary &CachedBin = Pair.first->second;
-    CachedBin = std::move(BinOrErr.get());
-    CachedBin.pushEvictor([this, I = Pair.first]() { BinaryForPath.erase(I); });
-    LRUBinaries.push_back(CachedBin);
-    CacheSize += CachedBin.size();
-    Bin = CachedBin->getBinary();
+  CachedBinary &CachedBin = Pair.first->second;
+  CachedBin = std::move(*BinOrErr);
+  CachedBin.pushEvictor([this, I = Pair.first]() { BinaryForPath.erase(I); });
+  LRUBinaries.push_back(CachedBin);
+  CacheSize += CachedBin.size();
+  return CachedBin->getBinary();
+}
+
+Expected<ObjectFile *> LLVMSymbolizer::findOrCacheObject(
+    const ContainerCacheKey &Key,
+    llvm::function_ref<Expected<std::unique_ptr<ObjectFile>>()> Loader,
+    const std::string &PathForBinaryCache) {
+  auto It = ObjectFileCache.find(Key);
+  if (It != ObjectFileCache.end())
+    return It->second.get();
+
+  Expected<std::unique_ptr<ObjectFile>> ObjOrErr = Loader();
+  if (!ObjOrErr) {
+    ObjectFileCache.emplace(Key, std::unique_ptr<ObjectFile>());
+    return ObjOrErr.takeError();
   }

-  if (!Bin)
-    return static_cast<ObjectFile *>(nullptr);
+  ObjectFile *Res = ObjOrErr->get();
+  auto NewEntry = ObjectFileCache.emplace(Key, std::move(*ObjOrErr));
+  auto CacheIter = BinaryForPath.find(PathForBinaryCache);
+  if (CacheIter != BinaryForPath.end())
+    CacheIter->second.pushEvictor(
+        [this, Iter = NewEntry.first]() { ObjectFileCache.erase(Iter); });
+  return Res;
+}

-  if (MachOUniversalBinary *UB = dyn_cast_or_null<MachOUniversalBinary>(Bin)) {
-    auto I = ObjectForUBPathAndArch.find(std::make_pair(Path, ArchName));
-    if (I != ObjectForUBPathAndArch.end())
-      return I->second.get();
-
-    Expected<std::unique_ptr<ObjectFile>> ObjOrErr =
-        UB->getMachOObjectForArch(ArchName);
-    if (!ObjOrErr) {
-      ObjectForUBPathAndArch.emplace(std::make_pair(Path, ArchName),
-                                     std::unique_ptr<ObjectFile>());
-      return ObjOrErr.takeError();
+Expected<ObjectFile *> LLVMSymbolizer::getOrCreateObjectFromArchive(
+    StringRef ArchivePath, StringRef MemberName, StringRef ArchName,
+    StringRef FullPath) {
+  Expected<object::Binary *> BinOrErr =
+      loadOrGetBinary(ArchivePath.str(), FullPath);
+  if (!BinOrErr)
+    return BinOrErr.takeError();
+  object::Binary *Bin = *BinOrErr;
+
+  object::Archive *Archive = dyn_cast_if_present<object::Archive>(Bin);
+  if (!Archive)
+    return createStringError(std::errc::invalid_argument,
+                             "'%s' is not a valid archive",
+                             ArchivePath.str().c_str());
+
+  Error Err = Error::success();
+  for (auto &Child : Archive->children(Err, /*SkipInternal=*/true)) {
+    Expected<StringRef> NameOrErr = Child.getName();
+    if (!NameOrErr)
+      continue;
+    if (*NameOrErr == MemberName) {
+      Expected<std::unique_ptr<object::Binary>> MemberOrErr =
+          Child.getAsBinary();
+      if (!MemberOrErr)
+        continue;
+
+      std::unique_ptr<object::Binary> Binary = std::move(*MemberOrErr);
+      if (auto *Obj = dyn_cast<object::ObjectFile>(Binary.get())) {
+        ObjectToArchivePath[Obj] = ArchivePath.str();
+        Triple::ArchType ObjArch = Obj->makeTriple().getArch();
+        Triple RequestedTriple;
+        RequestedTriple.setArch(Triple::getArchTypeForLLVMName(ArchName));
+        if (ObjArch != RequestedTriple.getArch())
+          continue;
+
+        ContainerCacheKey CacheKey{ArchivePath.str(), MemberName.str(),
+                                   ArchName.str()};
+        Expected<ObjectFile *> Res = findOrCacheObject(
+            CacheKey,
+            [O = std::unique_ptr<ObjectFile>(
+                 Obj)]() mutable -> Expected<std::unique_ptr<ObjectFile>> {
+              return std::move(O);
+            },
+            ArchivePath.str());
+        Binary.release();
+        return Res;
+      }
+    }
+  }
+  if (Err)
+    return std::move(Err);
+  return createStringError(std::errc::invalid_argument,
+                           "no matching member '%s' with arch '%s' in '%s'",
+                           MemberName.str().c_str(), ArchName.str().c_str(),
+                           ArchivePath.str().c_str());
+}
+
+Expected<ObjectFile *>
+LLVMSymbolizer::getOrCreateObject(const std::string &Path,
+                                  const std::string &ArchName) {
+  // First check for archive(member) format - more efficient to check closing
+  // paren first.
+  if (!Path.empty() && Path.back() == ')') {
+    size_t OpenParen = Path.rfind('(', Path.size() - 1);
+    if (OpenParen != std::string::npos) {
+      StringRef ArchivePath = StringRef(Path).substr(0, OpenParen);
+      StringRef MemberName =
+          StringRef(Path).substr(OpenParen + 1, Path.size() - OpenParen - 2);
+      StringRef FullPath = Path;
+      return getOrCreateObjectFromArchive(ArchivePath, MemberName, ArchName,
+                                          FullPath);
     }
-    ObjectFile *Res = ObjOrErr->get();
-    auto Pair = ObjectForUBPathAndArch.emplace(std::make_pair(Path, ArchName),
-                                               std::move(ObjOrErr.get()));
-    BinaryForPath.find(Path)->second.pushEvictor(
-        [this, Iter = Pair.first]() { ObjectForUBPathAndArch.erase(Iter); });
-    return Res;
   }
+
+  Expected<object::Binary *> BinOrErr = loadOrGetBinary(Path);
+  if (!BinOrErr)
+    return BinOrErr.takeError();
+  object::Binary *Bin = *BinOrErr;
+
+  if (MachOUniversalBinary *UB = dyn_cast_or_null<MachOUniversalBinary>(Bin)) {
+    ContainerCacheKey CacheKey{Path, "", ArchName};
+    return findOrCacheObject(
+        CacheKey,
+        [UB, ArchName]() -> Expected<std::unique_ptr<ObjectFile>> {
+          return UB->getMachOObjectForArch(ArchName);
+        },
+        Path);
+  }
   if (Bin->isObject()) {
     return cast<ObjectFile>(Bin);
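`findOrCacheObject` centralizes a lookup-or-load step that the old code spelled out only for Mach-O universal binaries. A cache hit returns the stored pointer, a miss calls the `Loader` callback once, and a load failure is remembered as a null entry. The sketch below reproduces that control flow with standard C++ only. `Object`, `LoadResult` (standing in for `llvm::Expected`) and `ObjectCache` are hypothetical names, and the evictor registration on the parent binary is omitted:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <variant>

// Hypothetical stand-in for object::ObjectFile.
struct Object {
  std::string Name;
};

// Hypothetical stand-in for Expected<std::unique_ptr<ObjectFile>>:
// either a loaded object or an error message.
using LoadResult = std::variant<std::unique_ptr<Object>, std::string>;

class ObjectCache {
public:
  // Hit: returns the cached pointer (null for a previously failed key).
  // Miss: runs Loader once; success is cached and returned, failure caches
  // a null entry, copies the message to ErrorOut and returns nullptr.
  Object *findOrLoad(const std::string &Key,
                     const std::function<LoadResult()> &Loader,
                     std::string &ErrorOut) {
    auto It = Cache.find(Key);
    if (It != Cache.end())
      return It->second.get();

    ++LoaderCalls;
    LoadResult R = Loader();
    if (auto *Msg = std::get_if<std::string>(&R)) {
      Cache.emplace(Key, nullptr); // negative entry: do not retry later
      ErrorOut = *Msg;
      return nullptr;
    }
    auto Obj = std::get<std::unique_ptr<Object>>(std::move(R));
    Object *Raw = Obj.get();
    Cache.emplace(Key, std::move(Obj));
    return Raw;
  }

  int LoaderCalls = 0; // how many times a Loader was actually invoked

private:
  std::map<std::string, std::unique_ptr<Object>> Cache;
};
```

As in the patch, a second lookup of a key whose load failed returns a null pointer without an error, so callers have to treat null as "not available" rather than expect the original error again.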
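Two places in the patch must agree on the `BinaryForPath` key for an archive member. `loadOrGetBinary` stores the archive under the full `archive(member)` string passed as `FullPathKey`. `getOrCreateObjectPair` rebuilds that string from `ObjectToArchivePath` and then calls `BinaryForPath.find(FullDbgObjKey)->second` without checking for `end()`. The sketch below models this wiring, including the evictor callbacks that drop dependent entries when a binary is evicted. `BinaryEntry`, `Caches` and `binaryCacheKey` are hypothetical, and the model is a simplification rather than LLVM's `CachedBinary`:

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of a cached binary: callbacks registered by dependent
// caches run when the binary is evicted.
struct BinaryEntry {
  std::vector<std::function<void()>> Evictors;
  void pushEvictor(std::function<void()> F) { Evictors.push_back(std::move(F)); }
};

// Hypothetical helper mirroring how getOrCreateObjectPair builds
// FullDbgObjKey: "archive(member)" for archive members, else the file name.
std::string binaryCacheKey(const std::string &ArchivePath,
                           const std::string &FileName) {
  if (ArchivePath.empty())
    return FileName;
  return ArchivePath + "(" + FileName + ")";
}

struct Caches {
  std::map<std::string, BinaryEntry> Binaries;
  std::map<std::string, int> Objects; // stand-in for dependent objects

  // Runs the binary's evictors (dropping dependent entries from other
  // maps), then removes the binary entry itself.
  void evictBinary(const std::string &Key) {
    auto It = Binaries.find(Key);
    if (It == Binaries.end())
      return;
    for (auto &F : It->second.Evictors)
      F();
    Binaries.erase(It);
  }
};
```

If the two key-building sites ever disagree (for example, if a member name contained a directory component that `sys::path::filename` strips), the lookup would miss. The sketch makes that dependency explicit by funnelling both sides through one helper.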
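The member loop in `getOrCreateObjectFromArchive` accepts a child only when both its name and its architecture match. This is what lets one AIX big archive carry a 32-bit and a 64-bit member under the same name, as discussed in the review thread. The model below is simplified: it compares architectures as plain strings, whereas the patch compares `Triple::ArchType` values obtained from `makeTriple()` and `Triple::getArchTypeForLLVMName`. `Member` and `findMember` are hypothetical names:

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of an archive member: AIX big archives can hold two
// members with the same name that differ only in architecture.
struct Member {
  std::string Name;
  std::string Arch; // e.g. "ppc" or "ppc64"
  int Payload;      // stand-in for the member's object file
};

// Mirrors the selection rule of the patch's loop: skip members whose name
// differs, skip same-named members of another architecture, and return the
// first member matching both.
std::optional<int> findMember(const std::vector<Member> &Members,
                              const std::string &Name,
                              const std::string &Arch) {
  for (const Member &M : Members) {
    if (M.Name != Name)
      continue; // different member
    if (M.Arch != Arch)
      continue; // same name, other architecture: keep scanning
    return M.Payload;
  }
  return std::nullopt; // no member with this name and architecture
}
```

Requesting an architecture that no member provides returns no value. This corresponds to the patch's "no matching member ... with arch ..." error.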