Skip to content

Conversation

@midhuncodes7
Copy link
Contributor

This PR implements big archive recognition by the symbolizer.
The archive input format should be in archive.a(member.o) format

@llvmbot
Copy link
Member

llvmbot commented Jul 24, 2025

@llvm/pr-subscribers-llvm-binary-utilities

Author: Midhunesh (midhuncodes7)

Changes

This PR implements big archive recognition by the symbolizer.
The archive input format should be in archive.a(member.o) format


Patch is 26.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/150401.diff

10 Files Affected:

  • (modified) llvm/docs/CommandGuide/llvm-symbolizer.rst (+14-4)
  • (modified) llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h (+26)
  • (modified) llvm/lib/DebugInfo/Symbolize/Symbolize.cpp (+95-7)
  • (added) llvm/test/DebugInfo/Inputs/big-archive-32.yaml (+119)
  • (added) llvm/test/DebugInfo/Inputs/big-archive-64.yaml (+115)
  • (added) llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml (+68)
  • (added) llvm/test/DebugInfo/Inputs/big-archive-elf-2.yaml (+68)
  • (added) llvm/test/DebugInfo/symbolize-big-archive-elf.test (+25)
  • (added) llvm/test/DebugInfo/symbolize-big-archive-xcoff.test (+27)
  • (modified) llvm/tools/llvm-symbolizer/Opts.td (+3-3)
diff --git a/llvm/docs/CommandGuide/llvm-symbolizer.rst b/llvm/docs/CommandGuide/llvm-symbolizer.rst
index 2da1b2470a83e..8f3a132139fe9 100644
--- a/llvm/docs/CommandGuide/llvm-symbolizer.rst
+++ b/llvm/docs/CommandGuide/llvm-symbolizer.rst
@@ -535,16 +535,20 @@ MACH-O SPECIFIC OPTIONS
 .. option:: --default-arch <arch>
 
   If a binary contains object files for multiple architectures (e.g. it is a
-  Mach-O universal binary), symbolize the object file for a given architecture.
-  You can also specify the architecture by writing ``binary_name:arch_name`` in
-  the input (see example below). If the architecture is not specified in either
-  way, the address will not be symbolized. Defaults to empty string.
+  Mach-O universal binary or an AIX archive with architecture variants),
+  symbolize the object file for a given architecture. You can also specify
+  the architecture by writing ``binary_name:arch_name`` in the input (see
+  example below). For AIX archives, the format ``archive.a(member.o):arch``
+  is also supported. If the architecture is not specified in either way,
+  the address will not be symbolized. Defaults to empty string.
 
   .. code-block:: console
 
     $ cat addr.txt
     /tmp/mach_universal_binary:i386 0x1f84
     /tmp/mach_universal_binary:x86_64 0x100000f24
+    /tmp/archive.a(member.o):ppc 0x1000
+    /tmp/archive.a(member.o):ppc64 0x2000
 
     $ llvm-symbolizer < addr.txt
     _main
@@ -553,6 +557,12 @@ MACH-O SPECIFIC OPTIONS
     _main
     /tmp/source_x86_64.cc:8
 
+    _foo
+    /tmp/source_ppc.cc:12
+    
+    _foo
+    /tmp/source_ppc64.cc:12
+
 .. option:: --dsym-hint <path/to/file.dSYM>
 
   If the debug info for a binary isn't present in the default location, look for
diff --git a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
index fb8f3d8af6b1b..5144085f3e23c 100644
--- a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
+++ b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
@@ -29,6 +29,12 @@
 #include <utility>
 #include <vector>
 
+#if defined(_AIX)
+#  define SYMBOLIZE_AIX 1
+#else
+#  define SYMBOLIZE_AIX 0
+#endif
+
 namespace llvm {
 namespace object {
 class ELFObjectFileBase;
@@ -202,6 +208,12 @@ class LLVMSymbolizer {
   Expected<ObjectFile *> getOrCreateObject(const std::string &Path,
                                            const std::string &ArchName);
 
+  /// Return a pointer to object file at specified path, for a specified
+  /// architecture that is present inside an archive file
+  Expected<ObjectFile *> getOrCreateObjectFromArchive(StringRef ArchivePath,
+                                                      StringRef MemberName,
+                                                      const std::string &ArchName);   
+
   /// Update the LRU cache order when a binary is accessed.
   void recordAccess(CachedBinary &Bin);
 
@@ -226,6 +238,20 @@ class LLVMSymbolizer {
   std::map<std::pair<std::string, std::string>, std::unique_ptr<ObjectFile>>
       ObjectForUBPathAndArch;
 
+  struct ArchiveCacheKey {
+    std::string ArchivePath;  // Storage for StringRef
+    std::string MemberName;   // Storage for StringRef
+    std::string ArchName;     // Storage for StringRef
+
+    // Required for map comparison
+    bool operator<(const ArchiveCacheKey &Other) const {
+      return std::tie(ArchivePath, MemberName, ArchName) < 
+             std::tie(Other.ArchivePath, Other.MemberName, Other.ArchName);
+    }
+  };
+
+  std::map<ArchiveCacheKey, std::unique_ptr<ObjectFile>> ObjectForArchivePathAndArch;
+  
   Options Opts;
 
   std::unique_ptr<BuildIDFetcher> BIDFetcher;
diff --git a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
index 56527719da51f..6dddc3a709239 100644
--- a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
+++ b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
@@ -33,6 +33,7 @@
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/Path.h"
+#include "llvm/Object/Archive.h"
 #include <cassert>
 #include <cstring>
 
@@ -286,6 +287,7 @@ LLVMSymbolizer::findSymbol(ArrayRef<uint8_t> BuildID, StringRef Symbol,
 
 void LLVMSymbolizer::flush() {
   ObjectForUBPathAndArch.clear();
+  ObjectForArchivePathAndArch.clear();
   LRUBinaries.clear();
   CacheSize = 0;
   BinaryForPath.clear();
@@ -321,7 +323,7 @@ bool checkFileCRC(StringRef Path, uint32_t CRCHash) {
 
 bool getGNUDebuglinkContents(const ObjectFile *Obj, std::string &DebugName,
                              uint32_t &CRCHash) {
-  if (!Obj)
+  if (!Obj || !isa<ObjectFile>(Obj))
     return false;
   for (const SectionRef &Section : Obj->sections()) {
     StringRef Name;
@@ -557,19 +559,101 @@ LLVMSymbolizer::getOrCreateObjectPair(const std::string &Path,
   if (!DbgObj)
     DbgObj = Obj;
   ObjectPair Res = std::make_pair(Obj, DbgObj);
-  std::string DbgObjPath = DbgObj->getFileName().str();
   auto Pair =
       ObjectPairForPathArch.emplace(std::make_pair(Path, ArchName), Res);
-  BinaryForPath.find(DbgObjPath)->second.pushEvictor([this, I = Pair.first]() {
+  std::string DbgObjPath = DbgObj->getFileName().str();
+  auto BinIter = BinaryForPath.find(DbgObjPath);
+  if (BinIter != BinaryForPath.end()) {
+    BinIter->second.pushEvictor([this, I = Pair.first]() {
     ObjectPairForPathArch.erase(I);
-  });
+    });
+  }
   return Res;
 }
 
+Expected<ObjectFile *> LLVMSymbolizer::getOrCreateObjectFromArchive(StringRef ArchivePath, 
+                                                                    StringRef MemberName, 
+                                                                    const std::string &ArchName) {
+  Binary *Bin = nullptr;
+  auto Pair = BinaryForPath.emplace(ArchivePath.str(), OwningBinary<Binary>());
+  if (!Pair.second) {
+    Bin = Pair.first->second->getBinary();
+    recordAccess(Pair.first->second);
+  } else {
+    Expected<OwningBinary<Binary>> ArchiveOrErr = createBinary(ArchivePath);
+    if (!ArchiveOrErr) {
+      return ArchiveOrErr.takeError();
+    }
+
+    CachedBinary &CachedBin = Pair.first->second;
+    CachedBin = std::move(ArchiveOrErr.get());
+    CachedBin.pushEvictor([this, I = Pair.first]() { BinaryForPath.erase(I); });
+    LRUBinaries.push_back(CachedBin);
+    CacheSize += CachedBin.size();
+    Bin = CachedBin->getBinary();
+  }
+
+  if (!Bin || !isa<object::Archive>(Bin))
+    return errorCodeToError(object_error::invalid_file_type);
+
+  object::Archive *Archive = cast<object::Archive>(Bin);
+  Error Err = Error::success();
+  
+  // On AIX, archives can contain multiple members with same name but different types
+  // We need to check all matches and find one that matches both name and architecture
+  for (auto &Child : Archive->children(Err, /*SkipInternal=*/true)) {
+    Expected<StringRef> NameOrErr = Child.getName();
+    if (!NameOrErr)
+      continue; 
+    if (*NameOrErr == llvm::sys::path::filename(MemberName)) {
+      Expected<std::unique_ptr<object::Binary>> MemberOrErr = Child.getAsBinary();
+      if (!MemberOrErr)
+        continue; 
+      
+      std::unique_ptr<object::Binary> Binary = std::move(*MemberOrErr);
+      if (auto *Obj = dyn_cast<object::ObjectFile>(Binary.get())) {
+#if defined(_AIX)
+        Triple::ArchType ObjArch = Obj->makeTriple().getArch();
+        Triple RequestedTriple;
+        RequestedTriple.setArch(Triple::getArchTypeForLLVMName(ArchName));
+        if (ObjArch != RequestedTriple.getArch())
+          continue;
+#endif
+        ArchiveCacheKey CacheKey{ArchivePath.str(), MemberName.str(), ArchName};
+        auto I = ObjectForArchivePathAndArch.find(CacheKey);
+        if (I != ObjectForArchivePathAndArch.end())
+          return I->second.get();
+
+        auto CachedObj = std::unique_ptr<ObjectFile>(Obj);
+        auto NewEntry = ObjectForArchivePathAndArch.emplace(
+            CacheKey, std::move(CachedObj));
+        Binary.release();
+        BinaryForPath.find(ArchivePath.str())->second.pushEvictor(
+            [this, Iter = NewEntry.first]() { ObjectForArchivePathAndArch.erase(Iter); });
+        return NewEntry.first->second.get();
+      }
+    }
+  }
+  if (Err)
+    return std::move(Err);
+  return errorCodeToError(object_error::arch_not_found);
+}
+
 Expected<ObjectFile *>
 LLVMSymbolizer::getOrCreateObject(const std::string &Path,
                                   const std::string &ArchName) {
-  Binary *Bin;
+  // First check for archive(member) format - more efficient to check closing paren first
+  size_t CloseParen = Path.rfind(')');
+  if (CloseParen != std::string::npos && CloseParen == Path.length() - 1) {
+    size_t OpenParen = Path.rfind('(', CloseParen);
+    if (OpenParen != std::string::npos) {
+      StringRef ArchivePath = StringRef(Path).substr(0, OpenParen);
+      StringRef MemberName = StringRef(Path).substr(OpenParen + 1, CloseParen - OpenParen - 1);
+      return getOrCreateObjectFromArchive(ArchivePath, MemberName, ArchName);
+    }
+  }
+
+  Binary *Bin = nullptr;
   auto Pair = BinaryForPath.emplace(Path, OwningBinary<Binary>());
   if (!Pair.second) {
     Bin = Pair.first->second->getBinary();
@@ -648,7 +732,9 @@ LLVMSymbolizer::getOrCreateModuleInfo(StringRef ModuleName) {
 
   auto I = Modules.find(ModuleName);
   if (I != Modules.end()) {
-    recordAccess(BinaryForPath.find(BinaryName)->second);
+    auto BinIter = BinaryForPath.find(BinaryName);
+    if (BinIter != BinaryForPath.end())
+      recordAccess(BinIter->second);
     return I->second.get();
   }
 
@@ -716,7 +802,9 @@ LLVMSymbolizer::getOrCreateModuleInfo(StringRef ModuleName) {
       createModuleInfo(Objects.first, std::move(Context), ModuleName);
   if (ModuleOrErr) {
     auto I = Modules.find(ModuleName);
-    BinaryForPath.find(BinaryName)->second.pushEvictor([this, I]() {
+    auto BinIter = BinaryForPath.find(BinaryName);
+    if (BinIter != BinaryForPath.end()) 
+      BinIter->second.pushEvictor([this, I]() {
       Modules.erase(I);
     });
   }
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-32.yaml b/llvm/test/DebugInfo/Inputs/big-archive-32.yaml
new file mode 100644
index 0000000000000..2080607a1a88c
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-32.yaml
@@ -0,0 +1,119 @@
+--- !XCOFF
+FileHeader:
+  MagicNumber:     0x1DF
+  NumberOfSections: 2
+  CreationTime:    0
+  OffsetToSymbolTable: 0xA0
+  EntriesInSymbolTable: 11
+  AuxiliaryHeaderSize: 0
+  Flags:           0x0
+Sections:
+  - Name:            .text
+    Address:         0x0
+    Size:            0x1C
+    FileOffsetToData: 0x64
+    FileOffsetToRelocations: 0x0
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x0
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_TEXT ]
+    SectionData:     4E800020000000000009204000000001000000040003666F6F000000
+  - Name:            .data
+    Address:         0x1C
+    Size:            0xC
+    FileOffsetToData: 0x80
+    FileOffsetToRelocations: 0x8C
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x2
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_DATA ]
+    SectionData:     '000000000000002800000000'
+    Relocations:
+      - Address:         0x1C
+        Symbol:          0x5
+        Info:            0x1F
+        Type:            0x0
+      - Address:         0x20
+        Symbol:          0x9
+        Info:            0x1F
+        Type:            0x0
+Symbols:
+  - Name:            .file
+    Value:           0x0
+    Section:         N_DEBUG
+    Type:            0x18
+    StorageClass:    C_FILE
+    NumberOfAuxEntries: 2
+    AuxEntries:
+      - Type:            AUX_FILE
+        FileNameOrString: foo.c
+        FileStringType:  XFT_FN
+      - Type:            AUX_FILE
+        FileNameOrString: 'IBM Open XL C/C++ for AIX 17.1.3 (5725-C72, 5765-J18), version 17.1.3.0, LLVM version 21.0.0git (145c02cece3630765e6412e6820bc446ddb4e138)'
+        FileStringType:  XFT_CV
+  - Name:            ''
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 5
+        StorageMappingClass: XMC_PR
+        SectionOrLength: 25
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            .foo
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_LD
+        SymbolAlignment: 0
+        StorageMappingClass: XMC_PR
+        SectionOrLength: 3
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            foo
+    Value:           0x1C
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_DS
+        SectionOrLength: 12
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            TOC
+    Value:           0x28
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_TC0
+        SectionOrLength: 0
+        StabInfoIndex:   0
+        StabSectNum:     0
+StringTable:     {}
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-64.yaml b/llvm/test/DebugInfo/Inputs/big-archive-64.yaml
new file mode 100644
index 0000000000000..9bbb1107555e0
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-64.yaml
@@ -0,0 +1,115 @@
+--- !XCOFF
+FileHeader:
+  MagicNumber:     0x1F7
+  NumberOfSections: 2
+  CreationTime:    0
+  OffsetToSymbolTable: 0xF8
+  EntriesInSymbolTable: 11
+  AuxiliaryHeaderSize: 0
+  Flags:           0x0
+Sections:
+  - Name:            .text
+    Address:         0x0
+    Size:            0x1C
+    FileOffsetToData: 0xA8
+    FileOffsetToRelocations: 0x0
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x0
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_TEXT ]
+    SectionData:     4E800020000000000009204000000001000000040003666F6F000000
+  - Name:            .data
+    Address:         0x20
+    Size:            0x18
+    FileOffsetToData: 0xC4
+    FileOffsetToRelocations: 0xDC
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x2
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_DATA ]
+    SectionData:     '000000000000000000000000000000380000000000000000'
+    Relocations:
+      - Address:         0x20
+        Symbol:          0x5
+        Info:            0x3F
+        Type:            0x0
+      - Address:         0x28
+        Symbol:          0x9
+        Info:            0x3F
+        Type:            0x0
+Symbols:
+  - Name:            .file
+    Value:           0x0
+    Section:         N_DEBUG
+    Type:            0x18
+    StorageClass:    C_FILE
+    NumberOfAuxEntries: 2
+    AuxEntries:
+      - Type:            AUX_FILE
+        FileNameOrString: foo.c
+        FileStringType:  XFT_FN
+      - Type:            AUX_FILE
+        FileNameOrString: 'IBM Open XL C/C++ for AIX 17.1.3 (5725-C72, 5765-J18), version 17.1.3.0, LLVM version 21.0.0git (5ca72bc8d2e87445649eab1825dffd2a047440ba)'
+        FileStringType:  XFT_CV
+  - Name:            ''
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 5
+        StorageMappingClass: XMC_PR
+        SectionOrLengthLo: 25
+        SectionOrLengthHi: 0
+  - Name:            .foo
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_LD
+        SymbolAlignment: 0
+        StorageMappingClass: XMC_PR
+        SectionOrLengthLo: 3
+        SectionOrLengthHi: 0
+  - Name:            foo
+    Value:           0x20
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 3
+        StorageMappingClass: XMC_DS
+        SectionOrLengthLo: 24
+        SectionOrLengthHi: 0
+  - Name:            TOC
+    Value:           0x38
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_TC0
+        SectionOrLengthLo: 0
+        SectionOrLengthHi: 0
+StringTable:     {}
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml b/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml
new file mode 100644
index 0000000000000..8e5c929e82878
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml
@@ -0,0 +1,68 @@
+--- !ELF
+FileHeader:
+  Class:           ELFCLASS64
+  Data:            ELFDATA2LSB
+  Type:            ET_REL
+  Machine:         EM_PPC64
+  Flags:           [  ]
+  SectionHeaderStringTable: .strtab
+Sections:
+  - Name:            .text
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    AddressAlign:    0x10
+    Content:         '2000804E000000000000000000000000'
+  - Name:            .comment
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_MERGE, SHF_STRINGS ]
+    AddressAlign:    0x1
+    EntSize:         0x1
+    Content:         0049424D204F70656E20584C20432F432B2B20666F72204C696E7578206F6E20506F7765722031372E312E322028353732352D4337322C20353736352D4A3230292C2076657273696F6E2031372E312E322E302C20636C616E672076657273696F6E2032312E302E306769742028676974406769746875622E69626D2E636F6D3A636F6D70696C65722F6C6C766D2D70726F6A6563742E67697420653165653233663838333532623937333563363735386661396335653035313366626234393361322900
+  - Name:            .note.GNU-stack
+    Type:            SHT_PROGBITS
+    AddressAlign:    0x1
+  - Name:            .eh_frame
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC ]
+    AddressAlign:    0x8
+    Content:         1000000000000000017A5200047841011B0C01001000000018000000000000001000000000000000
+  - Name:            .rela.eh_frame
+    Type:            SHT_RELA
+    Flags:           [ SHF_INFO_LINK ]
+    Link:            .symtab
+    AddressAlign:    0x8
+    Info:            .eh_frame
+    Relocations:
+      - Offset:          0x1C
+        Symbol:          .text
+        Type:            R_PPC64_REL32
+  - Name:            .llvm_addrsig
+    Type:            SHT_LLVM_ADDRSIG
+    Flags:           [ SHF_EXCLUDE ]
+    Link:            .symtab
+    AddressAlign:    0x1
+    Offset:          0x1B8
+    Symbols:         [  ]
+  - Type:            SectionHeaderTable
+    Sections:
+      - Name:            .strtab
+      - Name:            .text
+      - Name:            .comment
+      - Name:            .note.GNU-stack
+      - Name:            .eh_frame
+      - Name:            .rela.eh_frame
+      - Name:            .llvm_addrsig
+      - Name:            .symtab
+Symbols:
+  - Name:            foo1.c
+    Type:            STT_FILE
+    Index:           SHN_ABS
+  - Name:            .text
+    Type:            STT_SECTION
+    Section:         .text
+  - Name:            foo1
+    Type:            STT_FUNC
+    Section:         .text
+    Binding:         STB_GLOBAL
+    Size:            0x10
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-elf-2.yaml b/llvm/test/DebugInfo/Inputs/big-archive-elf-2.yaml
new file mode 100644
index 00000000...
[truncated]

@llvmbot
Copy link
Member

llvmbot commented Jul 24, 2025

@llvm/pr-subscribers-debuginfo

Author: Midhunesh (midhuncodes7)

Changes

This PR implements big archive recognition by the symbolizer.
The archive input format should be in archive.a(member.o) format


Patch is 26.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/150401.diff

10 Files Affected:

  • (modified) llvm/docs/CommandGuide/llvm-symbolizer.rst (+14-4)
  • (modified) llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h (+26)
  • (modified) llvm/lib/DebugInfo/Symbolize/Symbolize.cpp (+95-7)
  • (added) llvm/test/DebugInfo/Inputs/big-archive-32.yaml (+119)
  • (added) llvm/test/DebugInfo/Inputs/big-archive-64.yaml (+115)
  • (added) llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml (+68)
  • (added) llvm/test/DebugInfo/Inputs/big-archive-elf-2.yaml (+68)
  • (added) llvm/test/DebugInfo/symbolize-big-archive-elf.test (+25)
  • (added) llvm/test/DebugInfo/symbolize-big-archive-xcoff.test (+27)
  • (modified) llvm/tools/llvm-symbolizer/Opts.td (+3-3)
diff --git a/llvm/docs/CommandGuide/llvm-symbolizer.rst b/llvm/docs/CommandGuide/llvm-symbolizer.rst
index 2da1b2470a83e..8f3a132139fe9 100644
--- a/llvm/docs/CommandGuide/llvm-symbolizer.rst
+++ b/llvm/docs/CommandGuide/llvm-symbolizer.rst
@@ -535,16 +535,20 @@ MACH-O SPECIFIC OPTIONS
 .. option:: --default-arch <arch>
 
   If a binary contains object files for multiple architectures (e.g. it is a
-  Mach-O universal binary), symbolize the object file for a given architecture.
-  You can also specify the architecture by writing ``binary_name:arch_name`` in
-  the input (see example below). If the architecture is not specified in either
-  way, the address will not be symbolized. Defaults to empty string.
+  Mach-O universal binary or an AIX archive with architecture variants),
+  symbolize the object file for a given architecture. You can also specify
+  the architecture by writing ``binary_name:arch_name`` in the input (see
+  example below). For AIX archives, the format ``archive.a(member.o):arch``
+  is also supported. If the architecture is not specified in either way,
+  the address will not be symbolized. Defaults to empty string.
 
   .. code-block:: console
 
     $ cat addr.txt
     /tmp/mach_universal_binary:i386 0x1f84
     /tmp/mach_universal_binary:x86_64 0x100000f24
+    /tmp/archive.a(member.o):ppc 0x1000
+    /tmp/archive.a(member.o):ppc64 0x2000
 
     $ llvm-symbolizer < addr.txt
     _main
@@ -553,6 +557,12 @@ MACH-O SPECIFIC OPTIONS
     _main
     /tmp/source_x86_64.cc:8
 
+    _foo
+    /tmp/source_ppc.cc:12
+    
+    _foo
+    /tmp/source_ppc64.cc:12
+
 .. option:: --dsym-hint <path/to/file.dSYM>
 
   If the debug info for a binary isn't present in the default location, look for
diff --git a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
index fb8f3d8af6b1b..5144085f3e23c 100644
--- a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
+++ b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
@@ -29,6 +29,12 @@
 #include <utility>
 #include <vector>
 
+#if defined(_AIX)
+#  define SYMBOLIZE_AIX 1
+#else
+#  define SYMBOLIZE_AIX 0
+#endif
+
 namespace llvm {
 namespace object {
 class ELFObjectFileBase;
@@ -202,6 +208,12 @@ class LLVMSymbolizer {
   Expected<ObjectFile *> getOrCreateObject(const std::string &Path,
                                            const std::string &ArchName);
 
+  /// Return a pointer to object file at specified path, for a specified
+  /// architecture that is present inside an archive file
+  Expected<ObjectFile *> getOrCreateObjectFromArchive(StringRef ArchivePath,
+                                                      StringRef MemberName,
+                                                      const std::string &ArchName);   
+
   /// Update the LRU cache order when a binary is accessed.
   void recordAccess(CachedBinary &Bin);
 
@@ -226,6 +238,20 @@ class LLVMSymbolizer {
   std::map<std::pair<std::string, std::string>, std::unique_ptr<ObjectFile>>
       ObjectForUBPathAndArch;
 
+  struct ArchiveCacheKey {
+    std::string ArchivePath;  // Storage for StringRef
+    std::string MemberName;   // Storage for StringRef
+    std::string ArchName;     // Storage for StringRef
+
+    // Required for map comparison
+    bool operator<(const ArchiveCacheKey &Other) const {
+      return std::tie(ArchivePath, MemberName, ArchName) < 
+             std::tie(Other.ArchivePath, Other.MemberName, Other.ArchName);
+    }
+  };
+
+  std::map<ArchiveCacheKey, std::unique_ptr<ObjectFile>> ObjectForArchivePathAndArch;
+  
   Options Opts;
 
   std::unique_ptr<BuildIDFetcher> BIDFetcher;
diff --git a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
index 56527719da51f..6dddc3a709239 100644
--- a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
+++ b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
@@ -33,6 +33,7 @@
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/Path.h"
+#include "llvm/Object/Archive.h"
 #include <cassert>
 #include <cstring>
 
@@ -286,6 +287,7 @@ LLVMSymbolizer::findSymbol(ArrayRef<uint8_t> BuildID, StringRef Symbol,
 
 void LLVMSymbolizer::flush() {
   ObjectForUBPathAndArch.clear();
+  ObjectForArchivePathAndArch.clear();
   LRUBinaries.clear();
   CacheSize = 0;
   BinaryForPath.clear();
@@ -321,7 +323,7 @@ bool checkFileCRC(StringRef Path, uint32_t CRCHash) {
 
 bool getGNUDebuglinkContents(const ObjectFile *Obj, std::string &DebugName,
                              uint32_t &CRCHash) {
-  if (!Obj)
+  if (!Obj || !isa<ObjectFile>(Obj))
     return false;
   for (const SectionRef &Section : Obj->sections()) {
     StringRef Name;
@@ -557,19 +559,101 @@ LLVMSymbolizer::getOrCreateObjectPair(const std::string &Path,
   if (!DbgObj)
     DbgObj = Obj;
   ObjectPair Res = std::make_pair(Obj, DbgObj);
-  std::string DbgObjPath = DbgObj->getFileName().str();
   auto Pair =
       ObjectPairForPathArch.emplace(std::make_pair(Path, ArchName), Res);
-  BinaryForPath.find(DbgObjPath)->second.pushEvictor([this, I = Pair.first]() {
+  std::string DbgObjPath = DbgObj->getFileName().str();
+  auto BinIter = BinaryForPath.find(DbgObjPath);
+  if (BinIter != BinaryForPath.end()) {
+    BinIter->second.pushEvictor([this, I = Pair.first]() {
     ObjectPairForPathArch.erase(I);
-  });
+    });
+  }
   return Res;
 }
 
+Expected<ObjectFile *> LLVMSymbolizer::getOrCreateObjectFromArchive(StringRef ArchivePath, 
+                                                                    StringRef MemberName, 
+                                                                    const std::string &ArchName) {
+  Binary *Bin = nullptr;
+  auto Pair = BinaryForPath.emplace(ArchivePath.str(), OwningBinary<Binary>());
+  if (!Pair.second) {
+    Bin = Pair.first->second->getBinary();
+    recordAccess(Pair.first->second);
+  } else {
+    Expected<OwningBinary<Binary>> ArchiveOrErr = createBinary(ArchivePath);
+    if (!ArchiveOrErr) {
+      return ArchiveOrErr.takeError();
+    }
+
+    CachedBinary &CachedBin = Pair.first->second;
+    CachedBin = std::move(ArchiveOrErr.get());
+    CachedBin.pushEvictor([this, I = Pair.first]() { BinaryForPath.erase(I); });
+    LRUBinaries.push_back(CachedBin);
+    CacheSize += CachedBin.size();
+    Bin = CachedBin->getBinary();
+  }
+
+  if (!Bin || !isa<object::Archive>(Bin))
+    return errorCodeToError(object_error::invalid_file_type);
+
+  object::Archive *Archive = cast<object::Archive>(Bin);
+  Error Err = Error::success();
+  
+  // On AIX, archives can contain multiple members with same name but different types
+  // We need to check all matches and find one that matches both name and architecture
+  for (auto &Child : Archive->children(Err, /*SkipInternal=*/true)) {
+    Expected<StringRef> NameOrErr = Child.getName();
+    if (!NameOrErr)
+      continue; 
+    if (*NameOrErr == llvm::sys::path::filename(MemberName)) {
+      Expected<std::unique_ptr<object::Binary>> MemberOrErr = Child.getAsBinary();
+      if (!MemberOrErr)
+        continue; 
+      
+      std::unique_ptr<object::Binary> Binary = std::move(*MemberOrErr);
+      if (auto *Obj = dyn_cast<object::ObjectFile>(Binary.get())) {
+#if defined(_AIX)
+        Triple::ArchType ObjArch = Obj->makeTriple().getArch();
+        Triple RequestedTriple;
+        RequestedTriple.setArch(Triple::getArchTypeForLLVMName(ArchName));
+        if (ObjArch != RequestedTriple.getArch())
+          continue;
+#endif
+        ArchiveCacheKey CacheKey{ArchivePath.str(), MemberName.str(), ArchName};
+        auto I = ObjectForArchivePathAndArch.find(CacheKey);
+        if (I != ObjectForArchivePathAndArch.end())
+          return I->second.get();
+
+        auto CachedObj = std::unique_ptr<ObjectFile>(Obj);
+        auto NewEntry = ObjectForArchivePathAndArch.emplace(
+            CacheKey, std::move(CachedObj));
+        Binary.release();
+        BinaryForPath.find(ArchivePath.str())->second.pushEvictor(
+            [this, Iter = NewEntry.first]() { ObjectForArchivePathAndArch.erase(Iter); });
+        return NewEntry.first->second.get();
+      }
+    }
+  }
+  if (Err)
+    return std::move(Err);
+  return errorCodeToError(object_error::arch_not_found);
+}
+
 Expected<ObjectFile *>
 LLVMSymbolizer::getOrCreateObject(const std::string &Path,
                                   const std::string &ArchName) {
-  Binary *Bin;
+  // First check for archive(member) format - more efficient to check closing paren first
+  size_t CloseParen = Path.rfind(')');
+  if (CloseParen != std::string::npos && CloseParen == Path.length() - 1) {
+    size_t OpenParen = Path.rfind('(', CloseParen);
+    if (OpenParen != std::string::npos) {
+      StringRef ArchivePath = StringRef(Path).substr(0, OpenParen);
+      StringRef MemberName = StringRef(Path).substr(OpenParen + 1, CloseParen - OpenParen - 1);
+      return getOrCreateObjectFromArchive(ArchivePath, MemberName, ArchName);
+    }
+  }
+
+  Binary *Bin = nullptr;
   auto Pair = BinaryForPath.emplace(Path, OwningBinary<Binary>());
   if (!Pair.second) {
     Bin = Pair.first->second->getBinary();
@@ -648,7 +732,9 @@ LLVMSymbolizer::getOrCreateModuleInfo(StringRef ModuleName) {
 
   auto I = Modules.find(ModuleName);
   if (I != Modules.end()) {
-    recordAccess(BinaryForPath.find(BinaryName)->second);
+    auto BinIter = BinaryForPath.find(BinaryName);
+    if (BinIter != BinaryForPath.end())
+      recordAccess(BinIter->second);
     return I->second.get();
   }
 
@@ -716,7 +802,9 @@ LLVMSymbolizer::getOrCreateModuleInfo(StringRef ModuleName) {
       createModuleInfo(Objects.first, std::move(Context), ModuleName);
   if (ModuleOrErr) {
     auto I = Modules.find(ModuleName);
-    BinaryForPath.find(BinaryName)->second.pushEvictor([this, I]() {
+    auto BinIter = BinaryForPath.find(BinaryName);
+    if (BinIter != BinaryForPath.end()) 
+      BinIter->second.pushEvictor([this, I]() {
       Modules.erase(I);
     });
   }
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-32.yaml b/llvm/test/DebugInfo/Inputs/big-archive-32.yaml
new file mode 100644
index 0000000000000..2080607a1a88c
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-32.yaml
@@ -0,0 +1,119 @@
+--- !XCOFF
+FileHeader:
+  MagicNumber:     0x1DF
+  NumberOfSections: 2
+  CreationTime:    0
+  OffsetToSymbolTable: 0xA0
+  EntriesInSymbolTable: 11
+  AuxiliaryHeaderSize: 0
+  Flags:           0x0
+Sections:
+  - Name:            .text
+    Address:         0x0
+    Size:            0x1C
+    FileOffsetToData: 0x64
+    FileOffsetToRelocations: 0x0
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x0
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_TEXT ]
+    SectionData:     4E800020000000000009204000000001000000040003666F6F000000
+  - Name:            .data
+    Address:         0x1C
+    Size:            0xC
+    FileOffsetToData: 0x80
+    FileOffsetToRelocations: 0x8C
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x2
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_DATA ]
+    SectionData:     '000000000000002800000000'
+    Relocations:
+      - Address:         0x1C
+        Symbol:          0x5
+        Info:            0x1F
+        Type:            0x0
+      - Address:         0x20
+        Symbol:          0x9
+        Info:            0x1F
+        Type:            0x0
+Symbols:
+  - Name:            .file
+    Value:           0x0
+    Section:         N_DEBUG
+    Type:            0x18
+    StorageClass:    C_FILE
+    NumberOfAuxEntries: 2
+    AuxEntries:
+      - Type:            AUX_FILE
+        FileNameOrString: foo.c
+        FileStringType:  XFT_FN
+      - Type:            AUX_FILE
+        FileNameOrString: 'IBM Open XL C/C++ for AIX 17.1.3 (5725-C72, 5765-J18), version 17.1.3.0, LLVM version 21.0.0git (145c02cece3630765e6412e6820bc446ddb4e138)'
+        FileStringType:  XFT_CV
+  - Name:            ''
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 5
+        StorageMappingClass: XMC_PR
+        SectionOrLength: 25
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            .foo
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_LD
+        SymbolAlignment: 0
+        StorageMappingClass: XMC_PR
+        SectionOrLength: 3
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            foo
+    Value:           0x1C
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_DS
+        SectionOrLength: 12
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            TOC
+    Value:           0x28
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_TC0
+        SectionOrLength: 0
+        StabInfoIndex:   0
+        StabSectNum:     0
+StringTable:     {}
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-64.yaml b/llvm/test/DebugInfo/Inputs/big-archive-64.yaml
new file mode 100644
index 0000000000000..9bbb1107555e0
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-64.yaml
@@ -0,0 +1,115 @@
+--- !XCOFF
+FileHeader:
+  MagicNumber:     0x1F7
+  NumberOfSections: 2
+  CreationTime:    0
+  OffsetToSymbolTable: 0xF8
+  EntriesInSymbolTable: 11
+  AuxiliaryHeaderSize: 0
+  Flags:           0x0
+Sections:
+  - Name:            .text
+    Address:         0x0
+    Size:            0x1C
+    FileOffsetToData: 0xA8
+    FileOffsetToRelocations: 0x0
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x0
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_TEXT ]
+    SectionData:     4E800020000000000009204000000001000000040003666F6F000000
+  - Name:            .data
+    Address:         0x20
+    Size:            0x18
+    FileOffsetToData: 0xC4
+    FileOffsetToRelocations: 0xDC
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x2
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_DATA ]
+    SectionData:     '000000000000000000000000000000380000000000000000'
+    Relocations:
+      - Address:         0x20
+        Symbol:          0x5
+        Info:            0x3F
+        Type:            0x0
+      - Address:         0x28
+        Symbol:          0x9
+        Info:            0x3F
+        Type:            0x0
+Symbols:
+  - Name:            .file
+    Value:           0x0
+    Section:         N_DEBUG
+    Type:            0x18
+    StorageClass:    C_FILE
+    NumberOfAuxEntries: 2
+    AuxEntries:
+      - Type:            AUX_FILE
+        FileNameOrString: foo.c
+        FileStringType:  XFT_FN
+      - Type:            AUX_FILE
+        FileNameOrString: 'IBM Open XL C/C++ for AIX 17.1.3 (5725-C72, 5765-J18), version 17.1.3.0, LLVM version 21.0.0git (5ca72bc8d2e87445649eab1825dffd2a047440ba)'
+        FileStringType:  XFT_CV
+  - Name:            ''
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 5
+        StorageMappingClass: XMC_PR
+        SectionOrLengthLo: 25
+        SectionOrLengthHi: 0
+  - Name:            .foo
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_LD
+        SymbolAlignment: 0
+        StorageMappingClass: XMC_PR
+        SectionOrLengthLo: 3
+        SectionOrLengthHi: 0
+  - Name:            foo
+    Value:           0x20
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 3
+        StorageMappingClass: XMC_DS
+        SectionOrLengthLo: 24
+        SectionOrLengthHi: 0
+  - Name:            TOC
+    Value:           0x38
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_TC0
+        SectionOrLengthLo: 0
+        SectionOrLengthHi: 0
+StringTable:     {}
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml b/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml
new file mode 100644
index 0000000000000..8e5c929e82878
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml
@@ -0,0 +1,68 @@
+--- !ELF
+FileHeader:
+  Class:           ELFCLASS64
+  Data:            ELFDATA2LSB
+  Type:            ET_REL
+  Machine:         EM_PPC64
+  Flags:           [  ]
+  SectionHeaderStringTable: .strtab
+Sections:
+  - Name:            .text
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    AddressAlign:    0x10
+    Content:         '2000804E000000000000000000000000'
+  - Name:            .comment
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_MERGE, SHF_STRINGS ]
+    AddressAlign:    0x1
+    EntSize:         0x1
+    Content:         0049424D204F70656E20584C20432F432B2B20666F72204C696E7578206F6E20506F7765722031372E312E322028353732352D4337322C20353736352D4A3230292C2076657273696F6E2031372E312E322E302C20636C616E672076657273696F6E2032312E302E306769742028676974406769746875622E69626D2E636F6D3A636F6D70696C65722F6C6C766D2D70726F6A6563742E67697420653165653233663838333532623937333563363735386661396335653035313366626234393361322900
+  - Name:            .note.GNU-stack
+    Type:            SHT_PROGBITS
+    AddressAlign:    0x1
+  - Name:            .eh_frame
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC ]
+    AddressAlign:    0x8
+    Content:         1000000000000000017A5200047841011B0C01001000000018000000000000001000000000000000
+  - Name:            .rela.eh_frame
+    Type:            SHT_RELA
+    Flags:           [ SHF_INFO_LINK ]
+    Link:            .symtab
+    AddressAlign:    0x8
+    Info:            .eh_frame
+    Relocations:
+      - Offset:          0x1C
+        Symbol:          .text
+        Type:            R_PPC64_REL32
+  - Name:            .llvm_addrsig
+    Type:            SHT_LLVM_ADDRSIG
+    Flags:           [ SHF_EXCLUDE ]
+    Link:            .symtab
+    AddressAlign:    0x1
+    Offset:          0x1B8
+    Symbols:         [  ]
+  - Type:            SectionHeaderTable
+    Sections:
+      - Name:            .strtab
+      - Name:            .text
+      - Name:            .comment
+      - Name:            .note.GNU-stack
+      - Name:            .eh_frame
+      - Name:            .rela.eh_frame
+      - Name:            .llvm_addrsig
+      - Name:            .symtab
+Symbols:
+  - Name:            foo1.c
+    Type:            STT_FILE
+    Index:           SHN_ABS
+  - Name:            .text
+    Type:            STT_SECTION
+    Section:         .text
+  - Name:            foo1
+    Type:            STT_FUNC
+    Section:         .text
+    Binding:         STB_GLOBAL
+    Size:            0x10
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-elf-2.yaml b/llvm/test/DebugInfo/Inputs/big-archive-elf-2.yaml
new file mode 100644
index 00000000...
[truncated]

@github-actions
Copy link

github-actions bot commented Jul 24, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

LLVMSymbolizer::getOrCreateObject(const std::string &Path,
const std::string &ArchName) {
Binary *Bin;
// First check for archive(member) format - more efficient to check closing paren first
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// First check for archive(member) format - more efficient to check closing paren first
// First check for archive(member) format - more efficient to check closing paren first.

Nit.

This behaviour change isn't guarded to Big Archive format as stated in the documentation. It looks to me like a UNIX archive would be impacted by this too.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ping. The second part of this comment hasn't been addressed as far as I can tell.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we intend to allow naming archive members in non-AIX contexts (as that may be helpful). The documentation will be made more generic like how it used to be before.

@jh7370
Copy link
Collaborator

jh7370 commented Aug 7, 2025

I'm still catching up on reviews left over from my time off last week. I'll be taking a look again at this hopefully tomorrow.

Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't mark conversation threads that I have initiated as resolved. I need the "resolved" status to help me spot things I have and have not checked, because it's not unheard of for people to mark things as resolved without actually addressing them. You can find more context on https://discourse.llvm.org/t/rfc-github-pr-resolve-conversation-button/73178.

@midhuncodes7
Copy link
Contributor Author

@jh7370 any updates from your end?

@jh7370
Copy link
Collaborator

jh7370 commented Oct 16, 2025

@jh7370 any updates from your end?

Sorry, I've had some time off and have been unwell too. I'm starting to catch up on things now, but it'll likely be next week before I can take a look.

ArchivePath.str().c_str());

Error Err = Error::success();
// On AIX, archives can contain multiple members with the same name but
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor point: it's quite straightforward to create an archive with multiple members of the same name, using llvm-ar q. I don't think this impacts things here specifically though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the comment as it is not specific

std::map<std::pair<std::string, std::string>, std::unique_ptr<ObjectFile>>
ObjectForUBPathAndArch;
struct ArchiveCacheKey {
std::string ArchivePath;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: references to Archive in the names here are misleading, because they might be referring to a Mach-O universal binary, if I'm not mistaken.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renamed as container to be more generic

});
std::string DbgObjPath = DbgObj->getFileName().str();
auto BinIter = BinaryForPath.find(DbgObjPath);
if (BinIter != BinaryForPath.end()) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've given it some more thought and I'm fairly convinced the problem is that you're storing the wrong thing in BinaryForPath when dealing with archive members. Specifically, this should store the final binary (i.e. the archive child binary) and should be keyed on t.a(t.o) (or equivalent).

This will require some reorganisation of the code from what you've done, so that loadOrGetBinary knows that it's expecting an archive and fetches the child out of that.

Does this make sense?

@midhuncodes7 midhuncodes7 requested a review from jh7370 October 28, 2025 20:36
@midhuncodes7
Copy link
Contributor Author

@jh7370 requesting your inputs on the recent changes

@midhuncodes7
Copy link
Contributor Author

@jh7370 any updates on this review will be much appreciated

std::map<std::string, CachedBinary, std::less<>> BinaryForPath;

/// Store the archive path for the object file
std::unordered_map<const object::ObjectFile *, std::string>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review the section on map types in the LLVM programmer's manual.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used DenseMap instead

};

/// Parsed object file for path/object/architecture pair, where
/// "path" refers to Mach-O universal binary.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this comment up-to-date?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated the comment

/// Helper function to load binary.
Expected<object::Binary *>
loadOrGetBinary(const std::string &OpenPath,
const std::string *FullPathKeyOpt = nullptr);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use std::optional for optional types, not pointers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Used std::optional

/// "path" refers to Mach-O universal binary.
std::map<ContainerCacheKey, std::unique_ptr<ObjectFile>> ObjectFileCache;

/// Helper function to load binary.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly how is this comment aiding readability? What does it add that the function name itself doesn't add?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed comment

loadOrGetBinary(const std::string &OpenPath,
const std::string *FullPathKeyOpt = nullptr);

/// Helper function to find and get object.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above re. comment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed comment

/// Contains parsed binary for each path, or parsing error.
std::map<std::string, CachedBinary, std::less<>> BinaryForPath;

/// Store the archive path for the object file
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Store the archive path for the object file
/// Store the archive path for the object file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

corrected

if (It != ObjectToArchivePath.end()) {
StringRef ArchivePath = It->second;
StringRef MemberName = sys::path::filename(DbgObj->getFileName());
FullDbgObjKey = (Twine(ArchivePath) + "(" + MemberName + ")").str();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to wrap ArchivePath in a Twine constructor explicitly. The result of adding a StringRef and a string literal is a Twine anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed Twine

Expected<object::Binary *>
LLVMSymbolizer::loadOrGetBinary(const std::string &OpenPath,
const std::string *FullPathKeyOpt) {
// If no separate cache key is provided, use the open path itself.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is an "open path"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed

CacheSize += CachedBin.size();
Bin = CachedBin->getBinary();
Expected<OwningBinary<Binary>> BinOrErr = createBinary(OpenPath);
if (!BinOrErr) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Corrected

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're still using the term "open path" in your variable names. Please don't make up terminology.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed and replaced as ArchivePathKey

@github-actions
Copy link

github-actions bot commented Nov 18, 2025

🐧 Linux x64 Test Results

  • 186438 tests passed
  • 4864 tests skipped

@midhuncodes7 midhuncodes7 requested a review from jh7370 November 19, 2025 07:30
Copy link
Collaborator

@jh7370 jh7370 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code is mostly looking okay now. I'll look at the tests in my next pass.

@midhuncodes7 midhuncodes7 force-pushed the midhun7/big-archive-recognition branch from c74382e to d16b8d4 Compare November 21, 2025 10:17
@midhuncodes7 midhuncodes7 requested a review from jh7370 November 24, 2025 07:50
@@ -247,8 +246,8 @@ class LLVMSymbolizer {
};

/// Parsed object file for each path/member/architecture triple.
/// Used to cache objects extracted from containers (Mach-O universal
/// binaries, archives).
/// Used to cache objects extracted from containers (e.g: Mach-O
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: "e.g." or "e.g.," (depending on whether you follow American or British English style respectively, I believe American is more common in LLVM). ":" is not appropriate here.

@@ -0,0 +1,119 @@
--- !XCOFF
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming the file "big-archive" doesn't make sense, because the output file is an XCOFF file, not an archive file. Just call it xcoff-32.yaml or similar.

Do we really need so much bloat in the file? The idea of YAML inputs is generally to have small, focused inputs where we can easily see what's relevant to the test. From what I can tell, all you need to have is the minimal file header info, one symbol, and a section for that symbol to be in (and even the section is potentially unnecessary).

If we can minimise the size, it would then make more sense to put the files inline in the test, rather than separate, to keep everything related to a test together.

@@ -0,0 +1,68 @@
--- !ELF
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments apply to the ELF file as the XCOFF file. Further, we should be able to drop the second file entirely and replace it with parameterisation using yaml2obj's -D option to choose the class/machine values as needed. You can see plenty of examples of minimal ELF YAML inputs in other tests.

@@ -0,0 +1,25 @@
// Test archive member recognition by name (ELF format).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tend to put newer llvm-symbolizer tests that have some relevance to the llvm-symbolizer command-line handling in the test/tools/llvm-symbolizer directory, rather than the DebugInfo directory.

We also tend to use # for comments and no markers for RUN commands/FileCheck prefixes when we don't need to worry about code. If you do move the YAML inline, you should instead use ## for comments and # for lit and FIleCheck stuff.

// RUN: rm -f %t.a
// RUN: llvm-ar crv %t.a %t-1.o %t-2.o

// Test symbolization by member name (using just base names).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"using just base names" is an unnecessary addition to the comment and suggests there's another method that can be used (which we've ruled out previously).

// CHECK-ERROR: error: {{.*}}no matching member{{.*}}
// RUN: not llvm-symbolizer --obj="%t-1.o(%{t:stem}.tmp-1.o)" 0x1000 2>&1 | FileCheck %s --check-prefix=CHECK-NOTARCHIVE
// CHECK-NOTARCHIVE: error: '{{.*}}' is not a valid archive
// RUN: not llvm-symbolizer --obj="%t.a()" 0x1000 2>&1 | FileCheck %s --check-prefix=CHECK-EMPTY
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't use CHECK-EMPTY as a prefix, because -EMPTY is already a special FileCheck suffix and this will just cause confusion.

@@ -0,0 +1,33 @@
// Test big archive recognition and error handling in llvm-symbolizer.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comments as the other test apply to this one.

Do we actually need distinct ELF and XCOFF test cases? How is the behaviour different?

// RUN: cp %t-64.o %t.o
// RUN: llvm-ar %if system-aix %{-X64 rv%} %else %{qv%} %t.a %t.o

// Verify archive contains two members with same name.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you checking this?

Comment on lines +24 to +25
// CHECK-32: foo
// CHECK-64: foo
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the check patterns being identical, how do you know that the behaviour is actually correct? In other words, how do you know that the right object is being found for symbolization?

// Test error cases.
// RUN: not llvm-symbolizer --obj="%t.a(nonexistent.o)" 0x1000 2>&1 | FileCheck %s --check-prefix=CHECK-ERROR
// CHECK-ERROR: error: {{.*}}no matching member{{.*}}
// RUN: not llvm-symbolizer --obj="%t.o(%{t:stem}.tmp.o)" 0x1000 2>&1 | FileCheck %s --check-prefix=CHECK-NOTARCHIVE
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could simplify some of these patterns by creating a directory and cd-ing into it at the start of the test before creating the files. That way you can just use names like "1.o" and "2.o" and "archive.a" or whatever throughout the test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants