Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
d98ed02
big archive recognition implementation
midhuncodes7 Jul 21, 2025
64639e1
ELF test not supported on AIX
midhuncodes7 Jul 22, 2025
f5a357e
update yaml script
Jul 24, 2025
305ef99
target specific changes
midhuncodes7 Jul 24, 2025
66d5d11
Merge branch 'midhun7/big-archive-recognition' of github.com:midhunco…
midhuncodes7 Jul 24, 2025
27b4f10
Review comments addressed
midhuncodes7 Jul 30, 2025
647f98e
review comments addressed
midhuncodes7 Aug 1, 2025
629e7a5
review comments addressed
midhuncodes7 Aug 1, 2025
8c71818
Merge branch 'llvm:main' into midhun7/big-archive-recognition
midhuncodes7 Aug 1, 2025
f0be9a8
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 1, 2025
bfa845e
Merge branch 'midhun7/big-archive-recognition' of github.com:midhunco…
midhuncodes7 Aug 1, 2025
9c14b38
format correction
midhuncodes7 Aug 1, 2025
f8aeb3e
added test support
midhuncodes7 Aug 1, 2025
0952fcf
refactor of code to remove duplicates
midhuncodes7 Aug 12, 2025
6be8d1a
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 12, 2025
12dd6e5
code formatting
midhuncodes7 Aug 12, 2025
68b886f
code formatting
midhuncodes7 Aug 12, 2025
d0f0a1d
test fail fix
midhuncodes7 Aug 12, 2025
fe5a9c2
code formatting
midhuncodes7 Aug 12, 2025
e2083eb
make tests available on other platforms
midhuncodes7 Aug 14, 2025
12bf16a
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 14, 2025
0595460
test changes for windows
midhuncodes7 Aug 18, 2025
4d46767
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 18, 2025
315b98c
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 19, 2025
b37409b
fix review comments
midhuncodes7 Aug 21, 2025
30e4387
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 21, 2025
08f38ae
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 29, 2025
ca85990
review comments addressed
midhuncodes7 Aug 29, 2025
844906c
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 6, 2025
74d4e23
code format
midhuncodes7 Sep 6, 2025
183197a
code format
midhuncodes7 Sep 8, 2025
c68b4d9
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 8, 2025
7407769
review comments fix
midhuncodes7 Sep 8, 2025
4da5ebd
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 8, 2025
b0f0705
added tests
midhuncodes7 Sep 15, 2025
86f2023
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 15, 2025
f034d90
review comments fix
midhuncodes7 Sep 15, 2025
c6d8b90
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 19, 2025
e4d6fed
remove sys call for path
midhuncodes7 Sep 19, 2025
f57d66a
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 25, 2025
c3acb24
review comment fix
midhuncodes7 Sep 29, 2025
136bde7
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 29, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 14 additions & 4 deletions llvm/docs/CommandGuide/llvm-symbolizer.rst
Original file line number Diff line number Diff line change
Expand Up @@ -535,16 +535,20 @@ MACH-O SPECIFIC OPTIONS
.. option:: --default-arch <arch>

If a binary contains object files for multiple architectures (e.g. it is a
Mach-O universal binary), symbolize the object file for a given architecture.
You can also specify the architecture by writing ``binary_name:arch_name`` in
the input (see example below). If the architecture is not specified in either
way, the address will not be symbolized. Defaults to empty string.
Mach-O universal binary or an AIX archive with architecture variants),
symbolize the object file for a given architecture. You can also specify
the architecture by writing ``binary_name:arch_name`` in the input (see
example below). For AIX archives, the format ``archive.a(member.o):arch``
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs here still refer to AIX archives specifically, yet the behaviour change is generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shared objects are normally stored in (and loaded from) archives on AIX. By convention, the same (big format) archive contains both 32-bit and 64-bit objects

This behaviour is AIX specific, hence the documentation is more specific to AIX

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior such as having same object name which vary in architecture getting added to archive is specific to AIX

is also supported. If the architecture is not specified,
the address will not be symbolized. Defaults to empty string.

.. code-block:: console

$ cat addr.txt
/tmp/mach_universal_binary:i386 0x1f84
/tmp/mach_universal_binary:x86_64 0x100000f24
/tmp/archive.a(member.o):ppc 0x1000
/tmp/archive.a(member.o):ppc64 0x2000

$ llvm-symbolizer < addr.txt
_main
Expand All @@ -553,6 +557,12 @@ MACH-O SPECIFIC OPTIONS
_main
/tmp/source_x86_64.cc:8

_foo
/tmp/source_ppc.cc:12

_foo
/tmp/source_ppc64.cc:12

.. option:: --dsym-hint <path/to/file.dSYM>

If the debug info for a binary isn't present in the default location, look for
Expand Down
44 changes: 35 additions & 9 deletions llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
Original file line number Diff line number Diff line change
Expand Up @@ -196,11 +196,17 @@ class LLVMSymbolizer {
Expected<ObjectPair> getOrCreateObjectPair(const std::string &Path,
const std::string &ArchName);

/// Return a pointer to object file at specified path, for a specified
/// architecture (e.g. if path refers to a Mach-O universal binary, only one
/// object file from it will be returned).
Expected<ObjectFile *> getOrCreateObject(const std::string &Path,
const std::string &ArchName);
/// Return a pointer to the object file with the specified name, for a
/// specified architecture (e.g. if path refers to a Mach-O universal
/// binary, only one object file from it will be returned).
Expected<ObjectFile *> getOrCreateObject(const std::string &InputPath,
const std::string &DefaultArchName);

/// Return a pointer to the object file with the specified name, for a
/// specified architecture that is present inside an archive file.
Expected<ObjectFile *> getOrCreateObjectFromArchive(StringRef ArchivePath,
StringRef MemberName,
StringRef ArchName);

/// Update the LRU cache order when a binary is accessed.
void recordAccess(CachedBinary &Bin);
Expand All @@ -221,10 +227,30 @@ class LLVMSymbolizer {
/// Sum of the sizes of the cached binaries.
size_t CacheSize = 0;

/// Parsed object file for path/architecture pair, where "path" refers
/// to Mach-O universal binary.
std::map<std::pair<std::string, std::string>, std::unique_ptr<ObjectFile>>
ObjectForUBPathAndArch;
struct ArchiveCacheKey {
std::string ArchivePath;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: references to Archive in the names here are misleading, because they might be referring to a Mach-O universal binary, if I'm not mistaken.

std::string MemberName;
std::string ArchName;

// Required for map comparison.
bool operator<(const ArchiveCacheKey &Other) const {
return std::tie(ArchivePath, MemberName, ArchName) <
std::tie(Other.ArchivePath, Other.MemberName, Other.ArchName);
}
};

/// Parsed object file for path/object/architecture pair, where
/// "path" refers to Mach-O universal binary.
std::map<ArchiveCacheKey, std::unique_ptr<ObjectFile>> ObjectFileCache;

/// Helper function to load binary.
Expected<object::Binary *> loadOrGetBinary(const std::string &Path);

/// Helper function to find and get object.
Expected<ObjectFile *> findOrCacheObject(
const ArchiveCacheKey &Key,
llvm::function_ref<Expected<std::unique_ptr<ObjectFile>>()> Loader,
const std::string &PathForBinaryCache);

Options Opts;

Expand Down
178 changes: 135 additions & 43 deletions llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
#include "llvm/DebugInfo/PDB/PDBContext.h"
#include "llvm/DebugInfo/Symbolize/SymbolizableObjectFile.h"
#include "llvm/Demangle/Demangle.h"
#include "llvm/Object/Archive.h"
#include "llvm/Object/BuildID.h"
#include "llvm/Object/COFF.h"
#include "llvm/Object/ELFObjectFile.h"
Expand Down Expand Up @@ -285,7 +286,7 @@ LLVMSymbolizer::findSymbol(ArrayRef<uint8_t> BuildID, StringRef Symbol,
}

void LLVMSymbolizer::flush() {
ObjectForUBPathAndArch.clear();
ObjectFileCache.clear();
LRUBinaries.clear();
CacheSize = 0;
BinaryForPath.clear();
Expand Down Expand Up @@ -557,57 +558,146 @@ LLVMSymbolizer::getOrCreateObjectPair(const std::string &Path,
if (!DbgObj)
DbgObj = Obj;
ObjectPair Res = std::make_pair(Obj, DbgObj);
std::string DbgObjPath = DbgObj->getFileName().str();
auto Pair =
ObjectPairForPathArch.emplace(std::make_pair(Path, ArchName), Res);
BinaryForPath.find(DbgObjPath)->second.pushEvictor([this, I = Pair.first]() {
ObjectPairForPathArch.erase(I);
});
std::string DbgObjPath = DbgObj->getFileName().str();
auto BinIter = BinaryForPath.find(DbgObjPath);
if (BinIter != BinaryForPath.end()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could I get an explanation for why this behaviour has changed, please. Also, I assume you have test coverage for it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not answered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some iterations, the BinaryForPath.find(BinaryName) fails and returns BinaryForPath.end() but the previous code doesn't check for it, so it ends up calling pushEvictor on invalid memory. This results in a crash, hence a check is added. The behaviour is not changed its just split into multiple steps having the same logic

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "in some iterations"? Is this actually related to your addition of big archive support?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BinaryForPath.find(BinaryName) can return BinaryForPath.end() when the Binary is not yet cached.
On such instances, pushEvictor might get called on an invalid memory. To prevent that the check is introduced.
This issue started occuring after big archive format implementation - t.a(t.o). It did not occur when we tried to symbolize objects directly(t.o).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in the other comments, please explain in detail from the entry point what the code path is that could hit here without the binary being cached, please, because I don't see it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a plain object file (t.o) the ModuleName is "t.o" and BinaryForPath also stores an entry keyed by "t.o". BinaryForPath.find(BinaryName) succeeds and so is ->second.pushEvictor(...).
For an archive member "(t.a(t.o))", loadOrGetBinary (and BinaryForPath.emplace) is called with the archive path "t.a" (not the "(t.o)" suffix).
But getOrCreateModuleInfo still uses ModuleName (which is the full "t.a(t.o)") as BinaryName when it later does BinaryForPath.find(BinaryName).
That find("t.a(t.o)") returns end() (no such key) and when ->second on that end() result — undefined behavior that manifests as a segfault during destructor call.

Updating BinaryForPath as "t.a(t.o)" will result in invalid path.
Updating BinaryName as "t.a" will result in breaking per-member resolution

Hope this helps

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It's going to take me some time to go over this and determine if there's a better approach. I suspect there is, but I'm not sure yet what that is.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given it some more thought and I'm fairly convinced the problem is that you're storing the wrong thing in BinaryForPath when dealing with archive members. Specifically, this should store the final binary (i.e. the archive child binary) and should be keyed on t.a(t.o) (or equivalent).

This will require some reorganisation of the code from what you've done, so that loadOrGetBinary knows that it's expecting an archive and fetches the child out of that.

Does this make sense?

BinIter->second.pushEvictor(
[this, I = Pair.first]() { ObjectPairForPathArch.erase(I); });
}
return Res;
}

Expected<ObjectFile *>
LLVMSymbolizer::getOrCreateObject(const std::string &Path,
const std::string &ArchName) {
Binary *Bin;
Expected<object::Binary *>
LLVMSymbolizer::loadOrGetBinary(const std::string &Path) {
auto Pair = BinaryForPath.emplace(Path, OwningBinary<Binary>());
if (!Pair.second) {
Bin = Pair.first->second->getBinary();
recordAccess(Pair.first->second);
} else {
Expected<OwningBinary<Binary>> BinOrErr = createBinary(Path);
if (!BinOrErr)
return BinOrErr.takeError();
return Pair.first->second->getBinary();
}

CachedBinary &CachedBin = Pair.first->second;
CachedBin = std::move(BinOrErr.get());
CachedBin.pushEvictor([this, I = Pair.first]() { BinaryForPath.erase(I); });
LRUBinaries.push_back(CachedBin);
CacheSize += CachedBin.size();
Bin = CachedBin->getBinary();
Expected<OwningBinary<Binary>> BinOrErr = createBinary(Path);
if (!BinOrErr) {
return BinOrErr.takeError();
}

if (!Bin)
return static_cast<ObjectFile *>(nullptr);
CachedBinary &CachedBin = Pair.first->second;
CachedBin = std::move(*BinOrErr);
CachedBin.pushEvictor([this, I = Pair.first]() { BinaryForPath.erase(I); });
LRUBinaries.push_back(CachedBin);
CacheSize += CachedBin.size();
return CachedBin->getBinary();
}

if (MachOUniversalBinary *UB = dyn_cast_or_null<MachOUniversalBinary>(Bin)) {
auto I = ObjectForUBPathAndArch.find(std::make_pair(Path, ArchName));
if (I != ObjectForUBPathAndArch.end())
return I->second.get();

Expected<std::unique_ptr<ObjectFile>> ObjOrErr =
UB->getMachOObjectForArch(ArchName);
if (!ObjOrErr) {
ObjectForUBPathAndArch.emplace(std::make_pair(Path, ArchName),
std::unique_ptr<ObjectFile>());
return ObjOrErr.takeError();
Expected<ObjectFile *> LLVMSymbolizer::findOrCacheObject(
const ArchiveCacheKey &Key,
llvm::function_ref<Expected<std::unique_ptr<ObjectFile>>()> Loader,
const std::string &PathForBinaryCache) {
auto It = ObjectFileCache.find(Key);
if (It != ObjectFileCache.end())
return It->second.get();

Expected<std::unique_ptr<ObjectFile>> ObjOrErr = Loader();
if (!ObjOrErr) {
ObjectFileCache.emplace(Key, std::unique_ptr<ObjectFile>());
return ObjOrErr.takeError();
}

ObjectFile *Res = ObjOrErr->get();
auto NewEntry = ObjectFileCache.emplace(Key, std::move(*ObjOrErr));
auto CacheIter = BinaryForPath.find(PathForBinaryCache);
if (CacheIter != BinaryForPath.end())
CacheIter->second.pushEvictor(
[this, Iter = NewEntry.first]() { ObjectFileCache.erase(Iter); });
return Res;
}

Expected<ObjectFile *> LLVMSymbolizer::getOrCreateObjectFromArchive(
StringRef ArchivePath, StringRef MemberName, StringRef ArchName) {
Expected<object::Binary *> BinOrErr = loadOrGetBinary(ArchivePath.str());
if (!BinOrErr)
return BinOrErr.takeError();
object::Binary *Bin = *BinOrErr;

object::Archive *Archive = dyn_cast_if_present<object::Archive>(Bin);
if (!Archive)
return createStringError(std::errc::invalid_argument,
"'%s' is not a valid archive",
ArchivePath.str().c_str());

Error Err = Error::success();
// On AIX, archives can contain multiple members with the same name but
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor point: it's quite straightforward to create an archive with multiple members of the same name, using llvm-ar q. I don't think this impacts things here specifically though.

// different types. We need to check all matches and find one that matches
// both name and architecture.
for (auto &Child : Archive->children(Err, /*SkipInternal=*/true)) {
Expected<StringRef> NameOrErr = Child.getName();
if (!NameOrErr)
continue;
if (*NameOrErr == MemberName) {
Expected<std::unique_ptr<object::Binary>> MemberOrErr =
Child.getAsBinary();
if (!MemberOrErr)
continue;

std::unique_ptr<object::Binary> Binary = std::move(*MemberOrErr);
if (auto *Obj = dyn_cast<object::ObjectFile>(Binary.get())) {
Triple::ArchType ObjArch = Obj->makeTriple().getArch();
Triple RequestedTriple;
RequestedTriple.setArch(Triple::getArchTypeForLLVMName(ArchName));
if (ObjArch != RequestedTriple.getArch())
continue;

ArchiveCacheKey CacheKey{ArchivePath.str(), MemberName.str(),
ArchName.str()};
Expected<ObjectFile *> Res = findOrCacheObject(
CacheKey,
[O = std::unique_ptr<ObjectFile>(
Obj)]() mutable -> Expected<std::unique_ptr<ObjectFile>> {
return std::move(O);
},
ArchivePath.str());
Binary.release();
return Res;
}
}
ObjectFile *Res = ObjOrErr->get();
auto Pair = ObjectForUBPathAndArch.emplace(std::make_pair(Path, ArchName),
std::move(ObjOrErr.get()));
BinaryForPath.find(Path)->second.pushEvictor(
[this, Iter = Pair.first]() { ObjectForUBPathAndArch.erase(Iter); });
return Res;
}
if (Err)
return std::move(Err);
return createStringError(std::errc::invalid_argument,
"no matching member '%s' with arch '%s' in '%s'",
MemberName.str().c_str(), ArchName.str().c_str(),
ArchivePath.str().c_str());
}

Expected<ObjectFile *>
LLVMSymbolizer::getOrCreateObject(const std::string &Path,
const std::string &ArchName) {
// First check for archive(member) format - more efficient to check closing
// paren first.
if (!Path.empty() && Path.back() == ')') {
size_t OpenParen = Path.rfind('(', Path.size() - 1);
if (OpenParen != std::string::npos) {
StringRef ArchivePath = StringRef(Path).substr(0, OpenParen);
StringRef MemberName =
StringRef(Path).substr(OpenParen + 1, Path.size() - OpenParen - 2);
return getOrCreateObjectFromArchive(ArchivePath, MemberName, ArchName);
}
}

Expected<object::Binary *> BinOrErr = loadOrGetBinary(Path);
if (!BinOrErr)
return BinOrErr.takeError();
object::Binary *Bin = *BinOrErr;

if (MachOUniversalBinary *UB = dyn_cast_or_null<MachOUniversalBinary>(Bin)) {
ArchiveCacheKey CacheKey{Path, "", ArchName};
return findOrCacheObject(
CacheKey,
[UB, ArchName]() -> Expected<std::unique_ptr<ObjectFile>> {
return UB->getMachOObjectForArch(ArchName);
},
Path);
}
if (Bin->isObject()) {
return cast<ObjectFile>(Bin);
Expand Down Expand Up @@ -648,7 +738,9 @@ LLVMSymbolizer::getOrCreateModuleInfo(StringRef ModuleName) {

auto I = Modules.find(ModuleName);
if (I != Modules.end()) {
recordAccess(BinaryForPath.find(BinaryName)->second);
auto BinIter = BinaryForPath.find(BinaryName);
if (BinIter != BinaryForPath.end())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment: what's the purpose of the logic change and what test coverage is there for it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not answered.

Copy link
Contributor Author

@midhuncodes7 midhuncodes7 Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some iteration the BinaryForPath.find(BinaryName) fails and returns BinaryForPath.end() but the previous code doesn't check for it, so it ends up calling pushEvictor on invalid memory. This results in a crash, hence a check is added. The behaviour is not changed its just split into multiple steps having the same logic

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the new code path that triggers the crash that you're talking about? In other words, why wasn't this a problem before and now is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a plain object file (t.o) the ModuleName is "t.o" and BinaryForPath also stores an entry keyed by "t.o". BinaryForPath.find(BinaryName) succeeds and so is ->second.pushEvictor(...).
For an archive member "(t.a(t.o))", loadOrGetBinary (and BinaryForPath.emplace) is called with the archive path "t.a" (not the "(t.o)" suffix).
But getOrCreateModuleInfo still uses ModuleName (which is the full "t.a(t.o)") as BinaryName when it later does BinaryForPath.find(BinaryName).
That find("t.a(t.o)") returns end() (no such key) and when ->second on that end() result — undefined behavior that manifests as a segfault during destructor call.

Updating BinaryForPath as "t.a(t.o)" will result in invalid path.
Updating BinaryName as "t.a" will result in breaking per-member resolution

Hope this helps

recordAccess(BinIter->second);
return I->second.get();
}

Expand Down Expand Up @@ -716,9 +808,9 @@ LLVMSymbolizer::getOrCreateModuleInfo(StringRef ModuleName) {
createModuleInfo(Objects.first, std::move(Context), ModuleName);
if (ModuleOrErr) {
auto I = Modules.find(ModuleName);
BinaryForPath.find(BinaryName)->second.pushEvictor([this, I]() {
Modules.erase(I);
});
auto BinIter = BinaryForPath.find(BinaryName);
if (BinIter != BinaryForPath.end())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment: what's the purpose of the logic change and what test coverage is there for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some iterations the BinaryForPath.find(BinaryName) fails and returns BinaryForPath.end() but the previous code doesn't check for it, so it ends up calling pushEvictor on invalid memory. This results in a crash, hence a check is added.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, what is the new code path that causes the crash that you've discussed? I've looked at the code and I can't see it: BinaryForPath is populated within getOrCreateObjectPair, which is called earlier in this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a plain object file (t.o) the ModuleName is "t.o" and BinaryForPath also stores an entry keyed by "t.o". BinaryForPath.find(BinaryName) succeeds and so is ->second.pushEvictor(...).
For an archive member "(t.a(t.o))", loadOrGetBinary (and BinaryForPath.emplace) is called with the archive path "t.a" (not the "(t.o)" suffix).
But getOrCreateModuleInfo still uses ModuleName (which is the full "t.a(t.o)") as BinaryName when it later does BinaryForPath.find(BinaryName).
That find("t.a(t.o)") returns end() (no such key) and when ->second on that end() result — undefined behavior that manifests as a segfault during destructor call.

Updating BinaryForPath as "t.a(t.o)" will result in invalid path.
Updating BinaryName as "t.a" will result in breaking per-member resolution

Hope this helps

BinIter->second.pushEvictor([this, I]() { Modules.erase(I); });
}
return ModuleOrErr;
}
Expand Down
Loading