Skip to content
Open
Show file tree
Hide file tree
Changes from 13 commits
Commits
Show all changes
42 commits
Select commit Hold shift + click to select a range
d98ed02
big archive recognition implementation
midhuncodes7 Jul 21, 2025
64639e1
ELF test not supported on AIX
midhuncodes7 Jul 22, 2025
f5a357e
update yaml script
Jul 24, 2025
305ef99
target specific changes
midhuncodes7 Jul 24, 2025
66d5d11
Merge branch 'midhun7/big-archive-recognition' of github.com:midhunco…
midhuncodes7 Jul 24, 2025
27b4f10
Review comments addressed
midhuncodes7 Jul 30, 2025
647f98e
review comments addressed
midhuncodes7 Aug 1, 2025
629e7a5
review comments addressed
midhuncodes7 Aug 1, 2025
8c71818
Merge branch 'llvm:main' into midhun7/big-archive-recognition
midhuncodes7 Aug 1, 2025
f0be9a8
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 1, 2025
bfa845e
Merge branch 'midhun7/big-archive-recognition' of github.com:midhunco…
midhuncodes7 Aug 1, 2025
9c14b38
format correction
midhuncodes7 Aug 1, 2025
f8aeb3e
added test support
midhuncodes7 Aug 1, 2025
0952fcf
refactor of code to remove duplicates
midhuncodes7 Aug 12, 2025
6be8d1a
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 12, 2025
12dd6e5
code formatting
midhuncodes7 Aug 12, 2025
68b886f
code formatting
midhuncodes7 Aug 12, 2025
d0f0a1d
test fail fix
midhuncodes7 Aug 12, 2025
fe5a9c2
code formatting
midhuncodes7 Aug 12, 2025
e2083eb
make tests available on other platforms
midhuncodes7 Aug 14, 2025
12bf16a
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 14, 2025
0595460
test changes for windows
midhuncodes7 Aug 18, 2025
4d46767
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 18, 2025
315b98c
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 19, 2025
b37409b
fix review comments
midhuncodes7 Aug 21, 2025
30e4387
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 21, 2025
08f38ae
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Aug 29, 2025
ca85990
review comments addressed
midhuncodes7 Aug 29, 2025
844906c
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 6, 2025
74d4e23
code format
midhuncodes7 Sep 6, 2025
183197a
code format
midhuncodes7 Sep 8, 2025
c68b4d9
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 8, 2025
7407769
review comments fix
midhuncodes7 Sep 8, 2025
4da5ebd
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 8, 2025
b0f0705
added tests
midhuncodes7 Sep 15, 2025
86f2023
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 15, 2025
f034d90
review comments fix
midhuncodes7 Sep 15, 2025
c6d8b90
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 19, 2025
e4d6fed
remove sys call for path
midhuncodes7 Sep 19, 2025
f57d66a
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 25, 2025
c3acb24
review comment fix
midhuncodes7 Sep 29, 2025
136bde7
Merge remote-tracking branch 'origin/main' into midhun7/big-archive-r…
midhuncodes7 Sep 29, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
18 changes: 14 additions & 4 deletions llvm/docs/CommandGuide/llvm-symbolizer.rst
Original file line number Diff line number Diff line change
Expand Up @@ -535,16 +535,20 @@ MACH-O SPECIFIC OPTIONS
.. option:: --default-arch <arch>

If a binary contains object files for multiple architectures (e.g. it is a
Mach-O universal binary), symbolize the object file for a given architecture.
You can also specify the architecture by writing ``binary_name:arch_name`` in
the input (see example below). If the architecture is not specified in either
way, the address will not be symbolized. Defaults to empty string.
Mach-O universal binary or an AIX archive with architecture variants),
symbolize the object file for a given architecture. You can also specify
the architecture by writing ``binary_name:arch_name`` in the input (see
example below). For AIX archives, the format ``archive.a(member.o):arch``
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs here still refer to AIX archives specifically, yet the behaviour change is generic.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shared objects are normally stored in (and loaded from) archives on AIX. By convention, the same (big format) archive contains both 32-bit and 64-bit objects

This behaviour is AIX specific, hence the documentation is more specific to AIX

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Behavior such as having same object name which vary in architecture getting added to archive is specific to AIX

is also supported. If the architecture is not specified in either way,
the address will not be symbolized. Defaults to empty string.

.. code-block:: console

$ cat addr.txt
/tmp/mach_universal_binary:i386 0x1f84
/tmp/mach_universal_binary:x86_64 0x100000f24
/tmp/big-archive.a(member.o):ppc 0x1000
/tmp/big-archive.a(member.o):ppc64 0x2000

$ llvm-symbolizer < addr.txt
_main
Expand All @@ -553,6 +557,12 @@ MACH-O SPECIFIC OPTIONS
_main
/tmp/source_x86_64.cc:8

_foo
/tmp/source_ppc.cc:12

_foo
/tmp/source_ppc64.cc:12

.. option:: --dsym-hint <path/to/file.dSYM>

If the debug info for a binary isn't present in the default location, look for
Expand Down
25 changes: 23 additions & 2 deletions llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
Original file line number Diff line number Diff line change
Expand Up @@ -199,8 +199,14 @@ class LLVMSymbolizer {
/// Return a pointer to object file at specified path, for a specified
/// architecture (e.g. if path refers to a Mach-O universal binary, only one
/// object file from it will be returned).
Expected<ObjectFile *> getOrCreateObject(const std::string &Path,
const std::string &ArchName);
Expected<ObjectFile *> getOrCreateObject(const std::string &InputPath,
const std::string &DefaultArchName);

/// Return a pointer to object file at specified path, for a specified
/// architecture that is present inside an archive file
Expected<ObjectFile *> getOrCreateObjectFromArchive(StringRef ArchivePath,
StringRef MemberName,
StringRef ArchName);

/// Update the LRU cache order when a binary is accessed.
void recordAccess(CachedBinary &Bin);
Expand All @@ -226,6 +232,21 @@ class LLVMSymbolizer {
std::map<std::pair<std::string, std::string>, std::unique_ptr<ObjectFile>>
ObjectForUBPathAndArch;

struct ArchiveCacheKey {
std::string ArchivePath; // Storage for StringRef
std::string MemberName; // Storage for StringRef
std::string ArchName; // Storage for StringRef

// Required for map comparison
bool operator<(const ArchiveCacheKey &Other) const {
return std::tie(ArchivePath, MemberName, ArchName) <
std::tie(Other.ArchivePath, Other.MemberName, Other.ArchName);
}
};

std::map<ArchiveCacheKey, std::unique_ptr<ObjectFile>>
ObjectForArchivePathAndArch;

Options Opts;

std::unique_ptr<BuildIDFetcher> BIDFetcher;
Expand Down
106 changes: 97 additions & 9 deletions llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -21,6 +21,7 @@
#include "llvm/DebugInfo/PDB/PDBContext.h"
#include "llvm/DebugInfo/Symbolize/SymbolizableObjectFile.h"
#include "llvm/Demangle/Demangle.h"
#include "llvm/Object/Archive.h"
#include "llvm/Object/BuildID.h"
#include "llvm/Object/COFF.h"
#include "llvm/Object/ELFObjectFile.h"
Expand Down Expand Up @@ -286,6 +287,7 @@ LLVMSymbolizer::findSymbol(ArrayRef<uint8_t> BuildID, StringRef Symbol,

void LLVMSymbolizer::flush() {
ObjectForUBPathAndArch.clear();
ObjectForArchivePathAndArch.clear();
LRUBinaries.clear();
CacheSize = 0;
BinaryForPath.clear();
Expand Down Expand Up @@ -557,19 +559,103 @@ LLVMSymbolizer::getOrCreateObjectPair(const std::string &Path,
if (!DbgObj)
DbgObj = Obj;
ObjectPair Res = std::make_pair(Obj, DbgObj);
std::string DbgObjPath = DbgObj->getFileName().str();
auto Pair =
ObjectPairForPathArch.emplace(std::make_pair(Path, ArchName), Res);
BinaryForPath.find(DbgObjPath)->second.pushEvictor([this, I = Pair.first]() {
ObjectPairForPathArch.erase(I);
});
std::string DbgObjPath = DbgObj->getFileName().str();
auto BinIter = BinaryForPath.find(DbgObjPath);
if (BinIter != BinaryForPath.end()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could I get an explanation for why this behaviour has changed, please. Also, I assume you have test coverage for it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not answered.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some iterations, the BinaryForPath.find(BinaryName) fails and returns BinaryForPath.end() but the previous code doesn't check for it, so it ends up calling pushEvictor on invalid memory. This results in a crash, hence a check is added. The behaviour is not changed its just split into multiple steps having the same logic

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you mean by "in some iterations"? Is this actually related to your addition of big archive support?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BinaryForPath.find(BinaryName) can return BinaryForPath.end() when the Binary is not yet cached.
On such instances, pushEvictor might get called on an invalid memory. To prevent that the check is introduced.
This issue started occuring after big archive format implementation - t.a(t.o). It did not occur when we tried to symbolize objects directly(t.o).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As in the other comments, please explain in detail from the entry point what the code path is that could hit here without the binary being cached, please, because I don't see it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a plain object file (t.o) the ModuleName is "t.o" and BinaryForPath also stores an entry keyed by "t.o". BinaryForPath.find(BinaryName) succeeds and so is ->second.pushEvictor(...).
For an archive member "(t.a(t.o))", loadOrGetBinary (and BinaryForPath.emplace) is called with the archive path "t.a" (not the "(t.o)" suffix).
But getOrCreateModuleInfo still uses ModuleName (which is the full "t.a(t.o)") as BinaryName when it later does BinaryForPath.find(BinaryName).
That find("t.a(t.o)") returns end() (no such key) and when ->second on that end() result — undefined behavior that manifests as a segfault during destructor call.

Updating BinaryForPath as "t.a(t.o)" will result in invalid path.
Updating BinaryName as "t.a" will result in breaking per-member resolution

Hope this helps

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. It's going to take me some time to go over this and determine if there's a better approach. I suspect there is, but I'm not sure yet what that is.

BinIter->second.pushEvictor(
[this, I = Pair.first]() { ObjectPairForPathArch.erase(I); });
}
return Res;
}

Expected<ObjectFile *> LLVMSymbolizer::getOrCreateObjectFromArchive(
StringRef ArchivePath, StringRef MemberName, StringRef ArchName) {
Binary *Bin = nullptr;
auto Pair = BinaryForPath.emplace(ArchivePath.str(), OwningBinary<Binary>());
if (!Pair.second) {
Bin = Pair.first->second->getBinary();
recordAccess(Pair.first->second);
} else {
Expected<OwningBinary<Binary>> ArchiveOrErr = createBinary(ArchivePath);
if (!ArchiveOrErr)
return ArchiveOrErr.takeError();

CachedBinary &CachedBin = Pair.first->second;
CachedBin = std::move(ArchiveOrErr.get());
CachedBin.pushEvictor([this, I = Pair.first]() { BinaryForPath.erase(I); });
LRUBinaries.push_back(CachedBin);
CacheSize += CachedBin.size();
Bin = CachedBin->getBinary();
}

object::Archive *Archive = dyn_cast_if_present<object::Archive>(Bin);
if (!Archive)
return errorCodeToError(object_error::invalid_file_type);

Error Err = Error::success();

// On AIX, archives can contain multiple members with same name but different
// types. We need to check all matches and find one that matches both name and
// architecture.
for (auto &Child : Archive->children(Err, /*SkipInternal=*/true)) {
Expected<StringRef> NameOrErr = Child.getName();
if (!NameOrErr)
continue;
if (*NameOrErr == sys::path::filename(MemberName)) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the sys::path::filename call?

Copy link
Contributor Author

@midhuncodes7 midhuncodes7 Aug 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MemberName may sometimes contain the entire path to the member. In those cases this condition will fail as NameOrErr will contain only the member name. In order to deal with such cases as well, sys::path::filename call is used

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I'm not mistaken, MemberName will be a full path when using thin archives. Otherwise, it should be just the file name. Can those be used with big archives?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not answered...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we're trying to have a logic that is common for any archives

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That doesn't actually answer my question. Do big archives contain full paths?

Putting it another way, what test would fail if you didn't have this call? If you haven't currently got a test for it, you should add one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test where:
// RUN: llvm-symbolizer --default-arch=ppc --obj="%t.a(%t.o)" 0x0 | FileCheck %s --check-prefix=CHECK-32 would fail as %t.o would have the entire path to the object file. Hence this call is needed for such inputs.
Added tests accordingly.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That, to me, isn't valid usage, since %t.o is a full path. It doesn't make any more sense than trying to extract an archive member by name, but using a full path.

This logic implies that the following two cases would result in the same member being used:

llvm-symbolizer ... foo.a(/foo/bar/wobble.o)
llvm-symbolizer ... foo.a(/flob/flab/wobble.o)

Yet the object file might even have come from a completely unrelated path, e.g.

llvm-symbolizer ... foo.a(/baz/wibble/wobble.o)

This doesn't make sense to me. So, why do you want to do it?

Expected<std::unique_ptr<object::Binary>> MemberOrErr =
Child.getAsBinary();
if (!MemberOrErr)
continue;

std::unique_ptr<object::Binary> Binary = std::move(*MemberOrErr);
if (auto *Obj = dyn_cast<object::ObjectFile>(Binary.get())) {
Triple::ArchType ObjArch = Obj->makeTriple().getArch();
Triple RequestedTriple;
RequestedTriple.setArch(Triple::getArchTypeForLLVMName(ArchName));
if (ObjArch != RequestedTriple.getArch())
continue;
ArchiveCacheKey CacheKey{ArchivePath.str(), MemberName.str(),
ArchName.str()};
auto I = ObjectForArchivePathAndArch.find(CacheKey);
if (I != ObjectForArchivePathAndArch.end())
return I->second.get();

auto CachedObj = std::unique_ptr<ObjectFile>(Obj);
auto NewEntry =
ObjectForArchivePathAndArch.emplace(CacheKey, std::move(CachedObj));
Binary.release();
BinaryForPath.find(ArchivePath.str())
->second.pushEvictor([this, Iter = NewEntry.first]() {
ObjectForArchivePathAndArch.erase(Iter);
});
return NewEntry.first->second.get();
}
}
}
if (Err)
return std::move(Err);
return errorCodeToError(object_error::arch_not_found);
}

Expected<ObjectFile *>
LLVMSymbolizer::getOrCreateObject(const std::string &Path,
const std::string &ArchName) {
Binary *Bin;
// First check for archive(member) format - more efficient to check closing
// paren first.
size_t CloseParen = Path.rfind(')');
if (CloseParen != std::string::npos && CloseParen == Path.length() - 1) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't efficient at all, especially for the case where there is no closing paren (likely the common case), but even for the case where you have one. The only case you're interested in is if CloseParen is the last character in Path, so why not just do if (Path.back() == ')') (but be careful you don't have an empty string)?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not addressed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though the suggestion would efficiently check for ')', upon having an empty string it would need an explicit check for UB whereas current code returns npos safely.
For valid case, we still end up using rfind to get the matching '(' . Effectively we'd just reduce 2 * O(N) to 1 * O(N) and 1 * O(1).

The comment here mentioning about efficiency is not about code efficiency instead it is about checking for the closing paran first instead of opening paren or some other logic.

If you still insist on making this change, I'll go ahead and do it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is a comparison like if (!Path.empty()) any more complicated than if (CloseParen != std::string::npos)?

Big O notation doesn't really tell the full story - you need to consider the typical case. For all inputs without ')', the entire string has to be scanned with the current behaviour, which will require a minimum of two checks per character (one to check if it matches and one to see if the end has been reached). This will be the most common case, so should be the case we give greatest weight to. For the cases where ')' is present but not the last character, you still have to query at least 2 characters (the last character and the preceding character, if it is ')'), plus associated end checks, then do an additional check (i.e. is it the last character). Even if ')' is the last character, rfind will still have to do 2 checks (the "are we at the end check", then the character comparison), plus 1 additional "is it the last character check". So, to summarise, we have the following number of comparisons in the 3 cases:

  1. No ')': 2 * + 1
  2. ')' is present, but not last: 2 * <relative position of ')'> + 1
  3. ')' is last: 3 (+ cost of looking up the '(')

My proposal has a maximum of 2 comparisons for ALL cases (not counting the additional rfind call needed if it the case we are interested in).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed. Updated the code as per suggestion.

size_t OpenParen = Path.rfind('(', CloseParen);
if (OpenParen != std::string::npos) {
StringRef ArchivePath = StringRef(Path).substr(0, OpenParen);
StringRef MemberName =
StringRef(Path).substr(OpenParen + 1, CloseParen - OpenParen - 1);
return getOrCreateObjectFromArchive(ArchivePath, MemberName, ArchName);
}
}

Binary *Bin = nullptr;
auto Pair = BinaryForPath.emplace(Path, OwningBinary<Binary>());
if (!Pair.second) {
Bin = Pair.first->second->getBinary();
Expand Down Expand Up @@ -648,7 +734,9 @@ LLVMSymbolizer::getOrCreateModuleInfo(StringRef ModuleName) {

auto I = Modules.find(ModuleName);
if (I != Modules.end()) {
recordAccess(BinaryForPath.find(BinaryName)->second);
auto BinIter = BinaryForPath.find(BinaryName);
if (BinIter != BinaryForPath.end())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment: what's the purpose of the logic change and what test coverage is there for it?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not answered.

Copy link
Contributor Author

@midhuncodes7 midhuncodes7 Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some iteration the BinaryForPath.find(BinaryName) fails and returns BinaryForPath.end() but the previous code doesn't check for it, so it ends up calling pushEvictor on invalid memory. This results in a crash, hence a check is added. The behaviour is not changed its just split into multiple steps having the same logic

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the new code path that triggers the crash that you're talking about? In other words, why wasn't this a problem before and now is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a plain object file (t.o) the ModuleName is "t.o" and BinaryForPath also stores an entry keyed by "t.o". BinaryForPath.find(BinaryName) succeeds and so is ->second.pushEvictor(...).
For an archive member "(t.a(t.o))", loadOrGetBinary (and BinaryForPath.emplace) is called with the archive path "t.a" (not the "(t.o)" suffix).
But getOrCreateModuleInfo still uses ModuleName (which is the full "t.a(t.o)") as BinaryName when it later does BinaryForPath.find(BinaryName).
That find("t.a(t.o)") returns end() (no such key) and when ->second on that end() result — undefined behavior that manifests as a segfault during destructor call.

Updating BinaryForPath as "t.a(t.o)" will result in invalid path.
Updating BinaryName as "t.a" will result in breaking per-member resolution

Hope this helps

recordAccess(BinIter->second);
return I->second.get();
}

Expand Down Expand Up @@ -716,9 +804,9 @@ LLVMSymbolizer::getOrCreateModuleInfo(StringRef ModuleName) {
createModuleInfo(Objects.first, std::move(Context), ModuleName);
if (ModuleOrErr) {
auto I = Modules.find(ModuleName);
BinaryForPath.find(BinaryName)->second.pushEvictor([this, I]() {
Modules.erase(I);
});
auto BinIter = BinaryForPath.find(BinaryName);
if (BinIter != BinaryForPath.end())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment: what's the purpose of the logic change and what test coverage is there for it?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In some iterations the BinaryForPath.find(BinaryName) fails and returns BinaryForPath.end() but the previous code doesn't check for it, so it ends up calling pushEvictor on invalid memory. This results in a crash, hence a check is added.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, what is the new code path that causes the crash that you've discussed? I've looked at the code and I can't see it: BinaryForPath is populated within getOrCreateObjectPair, which is called earlier in this function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a plain object file (t.o) the ModuleName is "t.o" and BinaryForPath also stores an entry keyed by "t.o". BinaryForPath.find(BinaryName) succeeds and so is ->second.pushEvictor(...).
For an archive member "(t.a(t.o))", loadOrGetBinary (and BinaryForPath.emplace) is called with the archive path "t.a" (not the "(t.o)" suffix).
But getOrCreateModuleInfo still uses ModuleName (which is the full "t.a(t.o)") as BinaryName when it later does BinaryForPath.find(BinaryName).
That find("t.a(t.o)") returns end() (no such key) and when ->second on that end() result — undefined behavior that manifests as a segfault during destructor call.

Updating BinaryForPath as "t.a(t.o)" will result in invalid path.
Updating BinaryName as "t.a" will result in breaking per-member resolution

Hope this helps

BinIter->second.pushEvictor([this, I]() { Modules.erase(I); });
}
return ModuleOrErr;
}
Expand Down
119 changes: 119 additions & 0 deletions llvm/test/DebugInfo/Inputs/big-archive-32.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,119 @@
--- !XCOFF
FileHeader:
MagicNumber: 0x1DF
NumberOfSections: 2
CreationTime: 0
OffsetToSymbolTable: 0xA0
EntriesInSymbolTable: 11
AuxiliaryHeaderSize: 0
Flags: 0x0
Sections:
- Name: .text
Address: 0x0
Size: 0x1C
FileOffsetToData: 0x64
FileOffsetToRelocations: 0x0
FileOffsetToLineNumbers: 0x0
NumberOfRelocations: 0x0
NumberOfLineNumbers: 0x0
Flags: [ STYP_TEXT ]
SectionData: 4E800020000000000009204000000001000000040003666F6F000000
- Name: .data
Address: 0x1C
Size: 0xC
FileOffsetToData: 0x80
FileOffsetToRelocations: 0x8C
FileOffsetToLineNumbers: 0x0
NumberOfRelocations: 0x2
NumberOfLineNumbers: 0x0
Flags: [ STYP_DATA ]
SectionData: '000000000000002800000000'
Relocations:
- Address: 0x1C
Symbol: 0x5
Info: 0x1F
Type: 0x0
- Address: 0x20
Symbol: 0x9
Info: 0x1F
Type: 0x0
Symbols:
- Name: .file
Value: 0x0
Section: N_DEBUG
Type: 0x18
StorageClass: C_FILE
NumberOfAuxEntries: 2
AuxEntries:
- Type: AUX_FILE
FileNameOrString: foo.c
FileStringType: XFT_FN
- Type: AUX_FILE
FileNameOrString: 'IBM Open XL C/C++ for AIX 17.1.3 (5725-C72, 5765-J18), version 17.1.3.0, LLVM version 21.0.0git (145c02cece3630765e6412e6820bc446ddb4e138)'
FileStringType: XFT_CV
- Name: ''
Value: 0x0
Section: .text
Type: 0x0
StorageClass: C_HIDEXT
NumberOfAuxEntries: 1
AuxEntries:
- Type: AUX_CSECT
ParameterHashIndex: 0
TypeChkSectNum: 0
SymbolType: XTY_SD
SymbolAlignment: 5
StorageMappingClass: XMC_PR
SectionOrLength: 25
StabInfoIndex: 0
StabSectNum: 0
- Name: .foo
Value: 0x0
Section: .text
Type: 0x0
StorageClass: C_EXT
NumberOfAuxEntries: 1
AuxEntries:
- Type: AUX_CSECT
ParameterHashIndex: 0
TypeChkSectNum: 0
SymbolType: XTY_LD
SymbolAlignment: 0
StorageMappingClass: XMC_PR
SectionOrLength: 3
StabInfoIndex: 0
StabSectNum: 0
- Name: foo
Value: 0x1C
Section: .data
Type: 0x0
StorageClass: C_EXT
NumberOfAuxEntries: 1
AuxEntries:
- Type: AUX_CSECT
ParameterHashIndex: 0
TypeChkSectNum: 0
SymbolType: XTY_SD
SymbolAlignment: 2
StorageMappingClass: XMC_DS
SectionOrLength: 12
StabInfoIndex: 0
StabSectNum: 0
- Name: TOC
Value: 0x28
Section: .data
Type: 0x0
StorageClass: C_HIDEXT
NumberOfAuxEntries: 1
AuxEntries:
- Type: AUX_CSECT
ParameterHashIndex: 0
TypeChkSectNum: 0
SymbolType: XTY_SD
SymbolAlignment: 2
StorageMappingClass: XMC_TC0
SectionOrLength: 0
StabInfoIndex: 0
StabSectNum: 0
StringTable: {}
...
Loading