-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[llvm-symbolizer] Recognize AIX big archive #150401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from 5 commits
d98ed02
64639e1
f5a357e
305ef99
66d5d11
27b4f10
647f98e
629e7a5
8c71818
f0be9a8
bfa845e
9c14b38
f8aeb3e
0952fcf
6be8d1a
12dd6e5
68b886f
d0f0a1d
fe5a9c2
e2083eb
12bf16a
0595460
4d46767
315b98c
b37409b
30e4387
08f38ae
ca85990
844906c
74d4e23
183197a
c68b4d9
7407769
4da5ebd
b0f0705
86f2023
f034d90
c6d8b90
e4d6fed
f57d66a
c3acb24
136bde7
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -535,16 +535,20 @@ MACH-O SPECIFIC OPTIONS | |
.. option:: --default-arch <arch> | ||
|
||
If a binary contains object files for multiple architectures (e.g. it is a | ||
Mach-O universal binary), symbolize the object file for a given architecture. | ||
You can also specify the architecture by writing ``binary_name:arch_name`` in | ||
the input (see example below). If the architecture is not specified in either | ||
way, the address will not be symbolized. Defaults to empty string. | ||
Mach-O universal binary or an AIX archive with architecture variants), | ||
symbolize the object file for a given architecture. You can also specify | ||
the architecture by writing ``binary_name:arch_name`` in the input (see | ||
example below). For AIX archives, the format ``archive.a(member.o):arch`` | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The docs here still refer to AIX archives specifically, yet the behaviour change is generic. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Shared objects are normally stored in (and loaded from) archives on AIX. By convention, the same (big format) archive contains both 32-bit and 64-bit objects This behaviour is AIX specific, hence the documentation is more specific to AIX There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Behavior such as having same object name which vary in architecture getting added to archive is specific to AIX |
||
is also supported. If the architecture is not specified in either way, | ||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
the address will not be symbolized. Defaults to empty string. | ||
|
||
.. code-block:: console | ||
|
||
$ cat addr.txt | ||
/tmp/mach_universal_binary:i386 0x1f84 | ||
/tmp/mach_universal_binary:x86_64 0x100000f24 | ||
/tmp/archive.a(member.o):ppc 0x1000 | ||
/tmp/archive.a(member.o):ppc64 0x2000 | ||
midhuncodes7 marked this conversation as resolved.
Show resolved
Hide resolved
|
||
|
||
$ llvm-symbolizer < addr.txt | ||
_main | ||
|
@@ -553,6 +557,12 @@ MACH-O SPECIFIC OPTIONS | |
_main | ||
/tmp/source_x86_64.cc:8 | ||
|
||
_foo | ||
/tmp/source_ppc.cc:12 | ||
|
||
_foo | ||
/tmp/source_ppc64.cc:12 | ||
|
||
.. option:: --dsym-hint <path/to/file.dSYM> | ||
|
||
If the debug info for a binary isn't present in the default location, look for | ||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -33,6 +33,7 @@ | |||||
#include "llvm/Support/FileSystem.h" | ||||||
#include "llvm/Support/MemoryBuffer.h" | ||||||
#include "llvm/Support/Path.h" | ||||||
#include "llvm/Object/Archive.h" | ||||||
#include <cassert> | ||||||
#include <cstring> | ||||||
|
||||||
|
@@ -286,6 +287,7 @@ LLVMSymbolizer::findSymbol(ArrayRef<uint8_t> BuildID, StringRef Symbol, | |||||
|
||||||
void LLVMSymbolizer::flush() { | ||||||
ObjectForUBPathAndArch.clear(); | ||||||
ObjectForArchivePathAndArch.clear(); | ||||||
LRUBinaries.clear(); | ||||||
CacheSize = 0; | ||||||
BinaryForPath.clear(); | ||||||
|
@@ -321,7 +323,7 @@ bool checkFileCRC(StringRef Path, uint32_t CRCHash) { | |||||
|
||||||
bool getGNUDebuglinkContents(const ObjectFile *Obj, std::string &DebugName, | ||||||
uint32_t &CRCHash) { | ||||||
if (!Obj) | ||||||
if (!Obj || !isa<ObjectFile>(Obj)) | ||||||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||
return false; | ||||||
for (const SectionRef &Section : Obj->sections()) { | ||||||
StringRef Name; | ||||||
|
@@ -557,19 +559,101 @@ LLVMSymbolizer::getOrCreateObjectPair(const std::string &Path, | |||||
if (!DbgObj) | ||||||
DbgObj = Obj; | ||||||
ObjectPair Res = std::make_pair(Obj, DbgObj); | ||||||
std::string DbgObjPath = DbgObj->getFileName().str(); | ||||||
auto Pair = | ||||||
ObjectPairForPathArch.emplace(std::make_pair(Path, ArchName), Res); | ||||||
BinaryForPath.find(DbgObjPath)->second.pushEvictor([this, I = Pair.first]() { | ||||||
std::string DbgObjPath = DbgObj->getFileName().str(); | ||||||
auto BinIter = BinaryForPath.find(DbgObjPath); | ||||||
if (BinIter != BinaryForPath.end()) { | ||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Could I get an explanation for why this behaviour has changed, please. Also, I assume you have test coverage for it? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not answered. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In some iterations, the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What do you mean by "in some iterations"? Is this actually related to your addition of big archive support? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As in the other comments, please explain in detail from the entry point what the code path is that could hit here without the binary being cached, please, because I don't see it. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. For a plain object file (t.o) the Updating Hope this helps There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks. It's going to take me some time to go over this and determine if there's a better approach. I suspect there is, but I'm not sure yet what that is. |
||||||
BinIter->second.pushEvictor([this, I = Pair.first]() { | ||||||
ObjectPairForPathArch.erase(I); | ||||||
}); | ||||||
}); | ||||||
} | ||||||
return Res; | ||||||
} | ||||||
|
||||||
Expected<ObjectFile *> LLVMSymbolizer::getOrCreateObjectFromArchive(StringRef ArchivePath, | ||||||
StringRef MemberName, | ||||||
const std::string &ArchName) { | ||||||
midhuncodes7 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||
Binary *Bin = nullptr; | ||||||
auto Pair = BinaryForPath.emplace(ArchivePath.str(), OwningBinary<Binary>()); | ||||||
if (!Pair.second) { | ||||||
Bin = Pair.first->second->getBinary(); | ||||||
recordAccess(Pair.first->second); | ||||||
} else { | ||||||
Expected<OwningBinary<Binary>> ArchiveOrErr = createBinary(ArchivePath); | ||||||
if (!ArchiveOrErr) { | ||||||
return ArchiveOrErr.takeError(); | ||||||
} | ||||||
midhuncodes7 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||
|
||||||
CachedBinary &CachedBin = Pair.first->second; | ||||||
CachedBin = std::move(ArchiveOrErr.get()); | ||||||
CachedBin.pushEvictor([this, I = Pair.first]() { BinaryForPath.erase(I); }); | ||||||
LRUBinaries.push_back(CachedBin); | ||||||
CacheSize += CachedBin.size(); | ||||||
Bin = CachedBin->getBinary(); | ||||||
} | ||||||
|
||||||
if (!Bin || !isa<object::Archive>(Bin)) | ||||||
return errorCodeToError(object_error::invalid_file_type); | ||||||
jh7370 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||
|
||||||
object::Archive *Archive = cast<object::Archive>(Bin); | ||||||
midhuncodes7 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||
Error Err = Error::success(); | ||||||
|
||||||
// On AIX, archives can contain multiple members with same name but different types | ||||||
// We need to check all matches and find one that matches both name and architecture | ||||||
midhuncodes7 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||
for (auto &Child : Archive->children(Err, /*SkipInternal=*/true)) { | ||||||
Expected<StringRef> NameOrErr = Child.getName(); | ||||||
if (!NameOrErr) | ||||||
continue; | ||||||
if (*NameOrErr == llvm::sys::path::filename(MemberName)) { | ||||||
midhuncodes7 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||
Expected<std::unique_ptr<object::Binary>> MemberOrErr = Child.getAsBinary(); | ||||||
if (!MemberOrErr) | ||||||
continue; | ||||||
|
||||||
std::unique_ptr<object::Binary> Binary = std::move(*MemberOrErr); | ||||||
if (auto *Obj = dyn_cast<object::ObjectFile>(Binary.get())) { | ||||||
#if defined(_AIX) | ||||||
midhuncodes7 marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||||||
Triple::ArchType ObjArch = Obj->makeTriple().getArch(); | ||||||
Triple RequestedTriple; | ||||||
RequestedTriple.setArch(Triple::getArchTypeForLLVMName(ArchName)); | ||||||
if (ObjArch != RequestedTriple.getArch()) | ||||||
continue; | ||||||
#endif | ||||||
ArchiveCacheKey CacheKey{ArchivePath.str(), MemberName.str(), ArchName}; | ||||||
auto I = ObjectForArchivePathAndArch.find(CacheKey); | ||||||
if (I != ObjectForArchivePathAndArch.end()) | ||||||
return I->second.get(); | ||||||
|
||||||
auto CachedObj = std::unique_ptr<ObjectFile>(Obj); | ||||||
auto NewEntry = ObjectForArchivePathAndArch.emplace( | ||||||
CacheKey, std::move(CachedObj)); | ||||||
Binary.release(); | ||||||
BinaryForPath.find(ArchivePath.str())->second.pushEvictor( | ||||||
[this, Iter = NewEntry.first]() { ObjectForArchivePathAndArch.erase(Iter); }); | ||||||
return NewEntry.first->second.get(); | ||||||
} | ||||||
} | ||||||
} | ||||||
if (Err) | ||||||
return std::move(Err); | ||||||
return errorCodeToError(object_error::arch_not_found); | ||||||
} | ||||||
|
||||||
Expected<ObjectFile *> | ||||||
LLVMSymbolizer::getOrCreateObject(const std::string &Path, | ||||||
const std::string &ArchName) { | ||||||
Binary *Bin; | ||||||
// First check for archive(member) format - more efficient to check closing paren first | ||||||
|
// First check for archive(member) format - more efficient to check closing paren first | |
// First check for archive(member) format - more efficient to check closing paren first. |
Nit.
This behaviour change isn't guarded to Big Archive format as stated in the documentation. It looks to me like a UNIX archive would be impacted by this too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping. The second part of this comment hasn't been addressed as far as I can tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we intend to allow naming archive members in non-AIX contexts (as that may be helpful). The documentation will be made more generic like how it used to be before.
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't efficient at all, especially for the case where there is no closing paren (likely the common case), but even for the case where you have one. The only case you're interested in is if CloseParen
is the last character in Path
, so why not just do if (Path.back() == ')')
(but be careful you don't have an empty string)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not addressed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though the suggestion would efficiently check for ')', upon having an empty string it would need an explicit check for UB whereas current code returns npos safely.
For valid case, we still end up using rfind to get the matching '(' . Effectively we'd just reduce 2 * O(N) to 1 * O(N) and 1 * O(1).
The comment here mentioning about efficiency is not about code efficiency instead it is about checking for the closing paran first instead of opening paren or some other logic.
If you still insist on making this change, I'll go ahead and do it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is a comparison like if (!Path.empty())
any more complicated than if (CloseParen != std::string::npos)
?
Big O notation doesn't really tell the full story - you need to consider the typical case. For all inputs without ')', the entire string has to be scanned with the current behaviour, which will require a minimum of two checks per character (one to check if it matches and one to see if the end has been reached). This will be the most common case, so should be the case we give greatest weight to. For the cases where ')' is present but not the last character, you still have to query at least 2 characters (the last character and the preceding character, if it is ')'), plus associated end checks, then do an additional check (i.e. is it the last character). Even if ')' is the last character, rfind will still have to do 2 checks (the "are we at the end check", then the character comparison), plus 1 additional "is it the last character check". So, to summarise, we have the following number of comparisons in the 3 cases:
- No ')': 2 * + 1
- ')' is present, but not last: 2 * <relative position of ')'> + 1
- ')' is last: 3 (+ cost of looking up the '(')
My proposal has a maximum of 2 comparisons for ALL cases (not counting the additional rfind call needed if it the case we are interested in).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. Updated the code as per suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment: what's the purpose of the logic change and what test coverage is there for it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not answered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In some iteration the BinaryForPath.find(BinaryName)
fails and returns BinaryForPath.end()
but the previous code doesn't check for it, so it ends up calling pushEvictor
on invalid memory. This results in a crash, hence a check is added. The behaviour is not changed its just split into multiple steps having the same logic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the new code path that triggers the crash that you're talking about? In other words, why wasn't this a problem before and now is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a plain object file (t.o) the ModuleName
is "t.o" and BinaryForPath
also stores an entry keyed by "t.o". BinaryForPath.find(BinaryName)
succeeds and so is ->second.pushEvictor(...)
.
For an archive member "(t.a(t.o))", loadOrGetBinary
(and BinaryForPath.emplace
) is called with the archive path "t.a" (not the "(t.o)" suffix).
But getOrCreateModuleInfo
still uses ModuleName
(which is the full "t.a(t.o)") as BinaryName
when it later does BinaryForPath.find(BinaryName)
.
That find("t.a(t.o)") returns end()
(no such key) and when ->second
on that end()
result — undefined behavior that manifests as a segfault during destructor call.
Updating BinaryForPath
as "t.a(t.o)" will result in invalid path.
Updating BinaryName
as "t.a" will result in breaking per-member resolution
Hope this helps
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,119 @@ | ||
--- !XCOFF | ||
FileHeader: | ||
MagicNumber: 0x1DF | ||
NumberOfSections: 2 | ||
CreationTime: 0 | ||
OffsetToSymbolTable: 0xA0 | ||
EntriesInSymbolTable: 11 | ||
AuxiliaryHeaderSize: 0 | ||
Flags: 0x0 | ||
Sections: | ||
- Name: .text | ||
Address: 0x0 | ||
Size: 0x1C | ||
FileOffsetToData: 0x64 | ||
FileOffsetToRelocations: 0x0 | ||
FileOffsetToLineNumbers: 0x0 | ||
NumberOfRelocations: 0x0 | ||
NumberOfLineNumbers: 0x0 | ||
Flags: [ STYP_TEXT ] | ||
SectionData: 4E800020000000000009204000000001000000040003666F6F000000 | ||
- Name: .data | ||
Address: 0x1C | ||
Size: 0xC | ||
FileOffsetToData: 0x80 | ||
FileOffsetToRelocations: 0x8C | ||
FileOffsetToLineNumbers: 0x0 | ||
NumberOfRelocations: 0x2 | ||
NumberOfLineNumbers: 0x0 | ||
Flags: [ STYP_DATA ] | ||
SectionData: '000000000000002800000000' | ||
Relocations: | ||
- Address: 0x1C | ||
Symbol: 0x5 | ||
Info: 0x1F | ||
Type: 0x0 | ||
- Address: 0x20 | ||
Symbol: 0x9 | ||
Info: 0x1F | ||
Type: 0x0 | ||
Symbols: | ||
- Name: .file | ||
Value: 0x0 | ||
Section: N_DEBUG | ||
Type: 0x18 | ||
StorageClass: C_FILE | ||
NumberOfAuxEntries: 2 | ||
AuxEntries: | ||
- Type: AUX_FILE | ||
FileNameOrString: foo.c | ||
FileStringType: XFT_FN | ||
- Type: AUX_FILE | ||
FileNameOrString: 'IBM Open XL C/C++ for AIX 17.1.3 (5725-C72, 5765-J18), version 17.1.3.0, LLVM version 21.0.0git (145c02cece3630765e6412e6820bc446ddb4e138)' | ||
FileStringType: XFT_CV | ||
- Name: '' | ||
Value: 0x0 | ||
Section: .text | ||
Type: 0x0 | ||
StorageClass: C_HIDEXT | ||
NumberOfAuxEntries: 1 | ||
AuxEntries: | ||
- Type: AUX_CSECT | ||
ParameterHashIndex: 0 | ||
TypeChkSectNum: 0 | ||
SymbolType: XTY_SD | ||
SymbolAlignment: 5 | ||
StorageMappingClass: XMC_PR | ||
SectionOrLength: 25 | ||
StabInfoIndex: 0 | ||
StabSectNum: 0 | ||
- Name: .foo | ||
Value: 0x0 | ||
Section: .text | ||
Type: 0x0 | ||
StorageClass: C_EXT | ||
NumberOfAuxEntries: 1 | ||
AuxEntries: | ||
- Type: AUX_CSECT | ||
ParameterHashIndex: 0 | ||
TypeChkSectNum: 0 | ||
SymbolType: XTY_LD | ||
SymbolAlignment: 0 | ||
StorageMappingClass: XMC_PR | ||
SectionOrLength: 3 | ||
StabInfoIndex: 0 | ||
StabSectNum: 0 | ||
- Name: foo | ||
Value: 0x1C | ||
Section: .data | ||
Type: 0x0 | ||
StorageClass: C_EXT | ||
NumberOfAuxEntries: 1 | ||
AuxEntries: | ||
- Type: AUX_CSECT | ||
ParameterHashIndex: 0 | ||
TypeChkSectNum: 0 | ||
SymbolType: XTY_SD | ||
SymbolAlignment: 2 | ||
StorageMappingClass: XMC_DS | ||
SectionOrLength: 12 | ||
StabInfoIndex: 0 | ||
StabSectNum: 0 | ||
- Name: TOC | ||
Value: 0x28 | ||
Section: .data | ||
Type: 0x0 | ||
StorageClass: C_HIDEXT | ||
NumberOfAuxEntries: 1 | ||
AuxEntries: | ||
- Type: AUX_CSECT | ||
ParameterHashIndex: 0 | ||
TypeChkSectNum: 0 | ||
SymbolType: XTY_SD | ||
SymbolAlignment: 2 | ||
StorageMappingClass: XMC_TC0 | ||
SectionOrLength: 0 | ||
StabInfoIndex: 0 | ||
StabSectNum: 0 | ||
StringTable: {} | ||
... |
Uh oh!
There was an error while loading. Please reload this page.