midhuncodes7
Contributor

This PR teaches the symbolizer to recognize AIX big archives.
Archive members must be given in the input as archive.a(member.o).
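
For illustration, a minimal standalone sketch of how an archive.a(member.o) spec can be split into its archive path and member name. It mirrors the rfind-based check that this patch adds to getOrCreateObject, but the helper name splitArchiveSpec and the use of std::optional are assumptions made for the example, not part of the patch.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical helper (not in the patch): splits "archive.a(member.o)" into
// {"archive.a", "member.o"}. Returns std::nullopt when the input does not
// end in ')' or has no '(' before the closing parenthesis.
std::optional<std::pair<std::string, std::string>>
splitArchiveSpec(const std::string &Path) {
  if (Path.empty() || Path.back() != ')')
    return std::nullopt;
  std::size_t CloseParen = Path.size() - 1;
  // Search backwards so that parentheses in directory names are left alone.
  std::size_t OpenParen = Path.rfind('(', CloseParen);
  if (OpenParen == std::string::npos)
    return std::nullopt;
  return std::make_pair(
      Path.substr(0, OpenParen),
      Path.substr(OpenParen + 1, CloseParen - OpenParen - 1));
}
```

Any input that does not end in a closing parenthesis yields no value here, which corresponds to the patch falling through to the existing non-archive lookup path.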

@llvmbot
Member

llvmbot commented Jul 24, 2025

@llvm/pr-subscribers-llvm-binary-utilities

Author: Midhunesh (midhuncodes7)

Changes

This PR teaches the symbolizer to recognize AIX big archives.
Archive members must be given in the input as archive.a(member.o).


Patch is 26.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/150401.diff

10 Files Affected:

  • (modified) llvm/docs/CommandGuide/llvm-symbolizer.rst (+14-4)
  • (modified) llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h (+26)
  • (modified) llvm/lib/DebugInfo/Symbolize/Symbolize.cpp (+95-7)
  • (added) llvm/test/DebugInfo/Inputs/big-archive-32.yaml (+119)
  • (added) llvm/test/DebugInfo/Inputs/big-archive-64.yaml (+115)
  • (added) llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml (+68)
  • (added) llvm/test/DebugInfo/Inputs/big-archive-elf-2.yaml (+68)
  • (added) llvm/test/DebugInfo/symbolize-big-archive-elf.test (+25)
  • (added) llvm/test/DebugInfo/symbolize-big-archive-xcoff.test (+27)
  • (modified) llvm/tools/llvm-symbolizer/Opts.td (+3-3)
diff --git a/llvm/docs/CommandGuide/llvm-symbolizer.rst b/llvm/docs/CommandGuide/llvm-symbolizer.rst
index 2da1b2470a83e..8f3a132139fe9 100644
--- a/llvm/docs/CommandGuide/llvm-symbolizer.rst
+++ b/llvm/docs/CommandGuide/llvm-symbolizer.rst
@@ -535,16 +535,20 @@ MACH-O SPECIFIC OPTIONS
 .. option:: --default-arch <arch>
 
   If a binary contains object files for multiple architectures (e.g. it is a
-  Mach-O universal binary), symbolize the object file for a given architecture.
-  You can also specify the architecture by writing ``binary_name:arch_name`` in
-  the input (see example below). If the architecture is not specified in either
-  way, the address will not be symbolized. Defaults to empty string.
+  Mach-O universal binary or an AIX archive with architecture variants),
+  symbolize the object file for a given architecture. You can also specify
+  the architecture by writing ``binary_name:arch_name`` in the input (see
+  example below). For AIX archives, the format ``archive.a(member.o):arch``
+  is also supported. If the architecture is not specified in either way,
+  the address will not be symbolized. Defaults to empty string.
 
   .. code-block:: console
 
     $ cat addr.txt
     /tmp/mach_universal_binary:i386 0x1f84
     /tmp/mach_universal_binary:x86_64 0x100000f24
+    /tmp/archive.a(member.o):ppc 0x1000
+    /tmp/archive.a(member.o):ppc64 0x2000
 
     $ llvm-symbolizer < addr.txt
     _main
@@ -553,6 +557,12 @@ MACH-O SPECIFIC OPTIONS
     _main
     /tmp/source_x86_64.cc:8
 
+    _foo
+    /tmp/source_ppc.cc:12
+    
+    _foo
+    /tmp/source_ppc64.cc:12
+
 .. option:: --dsym-hint <path/to/file.dSYM>
 
   If the debug info for a binary isn't present in the default location, look for
diff --git a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
index fb8f3d8af6b1b..5144085f3e23c 100644
--- a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
+++ b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
@@ -29,6 +29,12 @@
 #include <utility>
 #include <vector>
 
+#if defined(_AIX)
+#  define SYMBOLIZE_AIX 1
+#else
+#  define SYMBOLIZE_AIX 0
+#endif
+
 namespace llvm {
 namespace object {
 class ELFObjectFileBase;
@@ -202,6 +208,12 @@ class LLVMSymbolizer {
   Expected<ObjectFile *> getOrCreateObject(const std::string &Path,
                                            const std::string &ArchName);
 
+  /// Return a pointer to object file at specified path, for a specified
+  /// architecture that is present inside an archive file
+  Expected<ObjectFile *> getOrCreateObjectFromArchive(StringRef ArchivePath,
+                                                      StringRef MemberName,
+                                                      const std::string &ArchName);   
+
   /// Update the LRU cache order when a binary is accessed.
   void recordAccess(CachedBinary &Bin);
 
@@ -226,6 +238,20 @@ class LLVMSymbolizer {
   std::map<std::pair<std::string, std::string>, std::unique_ptr<ObjectFile>>
       ObjectForUBPathAndArch;
 
+  struct ArchiveCacheKey {
+    std::string ArchivePath;  // Storage for StringRef
+    std::string MemberName;   // Storage for StringRef
+    std::string ArchName;     // Storage for StringRef
+
+    // Required for map comparison
+    bool operator<(const ArchiveCacheKey &Other) const {
+      return std::tie(ArchivePath, MemberName, ArchName) < 
+             std::tie(Other.ArchivePath, Other.MemberName, Other.ArchName);
+    }
+  };
+
+  std::map<ArchiveCacheKey, std::unique_ptr<ObjectFile>> ObjectForArchivePathAndArch;
+  
   Options Opts;
 
   std::unique_ptr<BuildIDFetcher> BIDFetcher;
diff --git a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
index 56527719da51f..6dddc3a709239 100644
--- a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
+++ b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
@@ -33,6 +33,7 @@
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/Path.h"
+#include "llvm/Object/Archive.h"
 #include <cassert>
 #include <cstring>
 
@@ -286,6 +287,7 @@ LLVMSymbolizer::findSymbol(ArrayRef<uint8_t> BuildID, StringRef Symbol,
 
 void LLVMSymbolizer::flush() {
   ObjectForUBPathAndArch.clear();
+  ObjectForArchivePathAndArch.clear();
   LRUBinaries.clear();
   CacheSize = 0;
   BinaryForPath.clear();
@@ -321,7 +323,7 @@ bool checkFileCRC(StringRef Path, uint32_t CRCHash) {
 
 bool getGNUDebuglinkContents(const ObjectFile *Obj, std::string &DebugName,
                              uint32_t &CRCHash) {
-  if (!Obj)
+  if (!Obj || !isa<ObjectFile>(Obj))
     return false;
   for (const SectionRef &Section : Obj->sections()) {
     StringRef Name;
@@ -557,19 +559,101 @@ LLVMSymbolizer::getOrCreateObjectPair(const std::string &Path,
   if (!DbgObj)
     DbgObj = Obj;
   ObjectPair Res = std::make_pair(Obj, DbgObj);
-  std::string DbgObjPath = DbgObj->getFileName().str();
   auto Pair =
       ObjectPairForPathArch.emplace(std::make_pair(Path, ArchName), Res);
-  BinaryForPath.find(DbgObjPath)->second.pushEvictor([this, I = Pair.first]() {
+  std::string DbgObjPath = DbgObj->getFileName().str();
+  auto BinIter = BinaryForPath.find(DbgObjPath);
+  if (BinIter != BinaryForPath.end()) {
+    BinIter->second.pushEvictor([this, I = Pair.first]() {
     ObjectPairForPathArch.erase(I);
-  });
+    });
+  }
   return Res;
 }
 
+Expected<ObjectFile *> LLVMSymbolizer::getOrCreateObjectFromArchive(StringRef ArchivePath, 
+                                                                    StringRef MemberName, 
+                                                                    const std::string &ArchName) {
+  Binary *Bin = nullptr;
+  auto Pair = BinaryForPath.emplace(ArchivePath.str(), OwningBinary<Binary>());
+  if (!Pair.second) {
+    Bin = Pair.first->second->getBinary();
+    recordAccess(Pair.first->second);
+  } else {
+    Expected<OwningBinary<Binary>> ArchiveOrErr = createBinary(ArchivePath);
+    if (!ArchiveOrErr) {
+      return ArchiveOrErr.takeError();
+    }
+
+    CachedBinary &CachedBin = Pair.first->second;
+    CachedBin = std::move(ArchiveOrErr.get());
+    CachedBin.pushEvictor([this, I = Pair.first]() { BinaryForPath.erase(I); });
+    LRUBinaries.push_back(CachedBin);
+    CacheSize += CachedBin.size();
+    Bin = CachedBin->getBinary();
+  }
+
+  if (!Bin || !isa<object::Archive>(Bin))
+    return errorCodeToError(object_error::invalid_file_type);
+
+  object::Archive *Archive = cast<object::Archive>(Bin);
+  Error Err = Error::success();
+  
+  // On AIX, archives can contain multiple members with same name but different types
+  // We need to check all matches and find one that matches both name and architecture
+  for (auto &Child : Archive->children(Err, /*SkipInternal=*/true)) {
+    Expected<StringRef> NameOrErr = Child.getName();
+    if (!NameOrErr)
+      continue; 
+    if (*NameOrErr == llvm::sys::path::filename(MemberName)) {
+      Expected<std::unique_ptr<object::Binary>> MemberOrErr = Child.getAsBinary();
+      if (!MemberOrErr)
+        continue; 
+      
+      std::unique_ptr<object::Binary> Binary = std::move(*MemberOrErr);
+      if (auto *Obj = dyn_cast<object::ObjectFile>(Binary.get())) {
+#if defined(_AIX)
+        Triple::ArchType ObjArch = Obj->makeTriple().getArch();
+        Triple RequestedTriple;
+        RequestedTriple.setArch(Triple::getArchTypeForLLVMName(ArchName));
+        if (ObjArch != RequestedTriple.getArch())
+          continue;
+#endif
+        ArchiveCacheKey CacheKey{ArchivePath.str(), MemberName.str(), ArchName};
+        auto I = ObjectForArchivePathAndArch.find(CacheKey);
+        if (I != ObjectForArchivePathAndArch.end())
+          return I->second.get();
+
+        auto CachedObj = std::unique_ptr<ObjectFile>(Obj);
+        auto NewEntry = ObjectForArchivePathAndArch.emplace(
+            CacheKey, std::move(CachedObj));
+        Binary.release();
+        BinaryForPath.find(ArchivePath.str())->second.pushEvictor(
+            [this, Iter = NewEntry.first]() { ObjectForArchivePathAndArch.erase(Iter); });
+        return NewEntry.first->second.get();
+      }
+    }
+  }
+  if (Err)
+    return std::move(Err);
+  return errorCodeToError(object_error::arch_not_found);
+}
+
 Expected<ObjectFile *>
 LLVMSymbolizer::getOrCreateObject(const std::string &Path,
                                   const std::string &ArchName) {
-  Binary *Bin;
+  // First check for archive(member) format - more efficient to check closing paren first
+  size_t CloseParen = Path.rfind(')');
+  if (CloseParen != std::string::npos && CloseParen == Path.length() - 1) {
+    size_t OpenParen = Path.rfind('(', CloseParen);
+    if (OpenParen != std::string::npos) {
+      StringRef ArchivePath = StringRef(Path).substr(0, OpenParen);
+      StringRef MemberName = StringRef(Path).substr(OpenParen + 1, CloseParen - OpenParen - 1);
+      return getOrCreateObjectFromArchive(ArchivePath, MemberName, ArchName);
+    }
+  }
+
+  Binary *Bin = nullptr;
   auto Pair = BinaryForPath.emplace(Path, OwningBinary<Binary>());
   if (!Pair.second) {
     Bin = Pair.first->second->getBinary();
@@ -648,7 +732,9 @@ LLVMSymbolizer::getOrCreateModuleInfo(StringRef ModuleName) {
 
   auto I = Modules.find(ModuleName);
   if (I != Modules.end()) {
-    recordAccess(BinaryForPath.find(BinaryName)->second);
+    auto BinIter = BinaryForPath.find(BinaryName);
+    if (BinIter != BinaryForPath.end())
+      recordAccess(BinIter->second);
     return I->second.get();
   }
 
@@ -716,7 +802,9 @@ LLVMSymbolizer::getOrCreateModuleInfo(StringRef ModuleName) {
       createModuleInfo(Objects.first, std::move(Context), ModuleName);
   if (ModuleOrErr) {
     auto I = Modules.find(ModuleName);
-    BinaryForPath.find(BinaryName)->second.pushEvictor([this, I]() {
+    auto BinIter = BinaryForPath.find(BinaryName);
+    if (BinIter != BinaryForPath.end()) 
+      BinIter->second.pushEvictor([this, I]() {
       Modules.erase(I);
     });
   }
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-32.yaml b/llvm/test/DebugInfo/Inputs/big-archive-32.yaml
new file mode 100644
index 0000000000000..2080607a1a88c
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-32.yaml
@@ -0,0 +1,119 @@
+--- !XCOFF
+FileHeader:
+  MagicNumber:     0x1DF
+  NumberOfSections: 2
+  CreationTime:    0
+  OffsetToSymbolTable: 0xA0
+  EntriesInSymbolTable: 11
+  AuxiliaryHeaderSize: 0
+  Flags:           0x0
+Sections:
+  - Name:            .text
+    Address:         0x0
+    Size:            0x1C
+    FileOffsetToData: 0x64
+    FileOffsetToRelocations: 0x0
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x0
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_TEXT ]
+    SectionData:     4E800020000000000009204000000001000000040003666F6F000000
+  - Name:            .data
+    Address:         0x1C
+    Size:            0xC
+    FileOffsetToData: 0x80
+    FileOffsetToRelocations: 0x8C
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x2
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_DATA ]
+    SectionData:     '000000000000002800000000'
+    Relocations:
+      - Address:         0x1C
+        Symbol:          0x5
+        Info:            0x1F
+        Type:            0x0
+      - Address:         0x20
+        Symbol:          0x9
+        Info:            0x1F
+        Type:            0x0
+Symbols:
+  - Name:            .file
+    Value:           0x0
+    Section:         N_DEBUG
+    Type:            0x18
+    StorageClass:    C_FILE
+    NumberOfAuxEntries: 2
+    AuxEntries:
+      - Type:            AUX_FILE
+        FileNameOrString: foo.c
+        FileStringType:  XFT_FN
+      - Type:            AUX_FILE
+        FileNameOrString: 'IBM Open XL C/C++ for AIX 17.1.3 (5725-C72, 5765-J18), version 17.1.3.0, LLVM version 21.0.0git (145c02cece3630765e6412e6820bc446ddb4e138)'
+        FileStringType:  XFT_CV
+  - Name:            ''
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 5
+        StorageMappingClass: XMC_PR
+        SectionOrLength: 25
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            .foo
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_LD
+        SymbolAlignment: 0
+        StorageMappingClass: XMC_PR
+        SectionOrLength: 3
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            foo
+    Value:           0x1C
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_DS
+        SectionOrLength: 12
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            TOC
+    Value:           0x28
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_TC0
+        SectionOrLength: 0
+        StabInfoIndex:   0
+        StabSectNum:     0
+StringTable:     {}
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-64.yaml b/llvm/test/DebugInfo/Inputs/big-archive-64.yaml
new file mode 100644
index 0000000000000..9bbb1107555e0
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-64.yaml
@@ -0,0 +1,115 @@
+--- !XCOFF
+FileHeader:
+  MagicNumber:     0x1F7
+  NumberOfSections: 2
+  CreationTime:    0
+  OffsetToSymbolTable: 0xF8
+  EntriesInSymbolTable: 11
+  AuxiliaryHeaderSize: 0
+  Flags:           0x0
+Sections:
+  - Name:            .text
+    Address:         0x0
+    Size:            0x1C
+    FileOffsetToData: 0xA8
+    FileOffsetToRelocations: 0x0
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x0
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_TEXT ]
+    SectionData:     4E800020000000000009204000000001000000040003666F6F000000
+  - Name:            .data
+    Address:         0x20
+    Size:            0x18
+    FileOffsetToData: 0xC4
+    FileOffsetToRelocations: 0xDC
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x2
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_DATA ]
+    SectionData:     '000000000000000000000000000000380000000000000000'
+    Relocations:
+      - Address:         0x20
+        Symbol:          0x5
+        Info:            0x3F
+        Type:            0x0
+      - Address:         0x28
+        Symbol:          0x9
+        Info:            0x3F
+        Type:            0x0
+Symbols:
+  - Name:            .file
+    Value:           0x0
+    Section:         N_DEBUG
+    Type:            0x18
+    StorageClass:    C_FILE
+    NumberOfAuxEntries: 2
+    AuxEntries:
+      - Type:            AUX_FILE
+        FileNameOrString: foo.c
+        FileStringType:  XFT_FN
+      - Type:            AUX_FILE
+        FileNameOrString: 'IBM Open XL C/C++ for AIX 17.1.3 (5725-C72, 5765-J18), version 17.1.3.0, LLVM version 21.0.0git (5ca72bc8d2e87445649eab1825dffd2a047440ba)'
+        FileStringType:  XFT_CV
+  - Name:            ''
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 5
+        StorageMappingClass: XMC_PR
+        SectionOrLengthLo: 25
+        SectionOrLengthHi: 0
+  - Name:            .foo
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_LD
+        SymbolAlignment: 0
+        StorageMappingClass: XMC_PR
+        SectionOrLengthLo: 3
+        SectionOrLengthHi: 0
+  - Name:            foo
+    Value:           0x20
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 3
+        StorageMappingClass: XMC_DS
+        SectionOrLengthLo: 24
+        SectionOrLengthHi: 0
+  - Name:            TOC
+    Value:           0x38
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_TC0
+        SectionOrLengthLo: 0
+        SectionOrLengthHi: 0
+StringTable:     {}
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml b/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml
new file mode 100644
index 0000000000000..8e5c929e82878
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml
@@ -0,0 +1,68 @@
+--- !ELF
+FileHeader:
+  Class:           ELFCLASS64
+  Data:            ELFDATA2LSB
+  Type:            ET_REL
+  Machine:         EM_PPC64
+  Flags:           [  ]
+  SectionHeaderStringTable: .strtab
+Sections:
+  - Name:            .text
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    AddressAlign:    0x10
+    Content:         '2000804E000000000000000000000000'
+  - Name:            .comment
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_MERGE, SHF_STRINGS ]
+    AddressAlign:    0x1
+    EntSize:         0x1
+    Content:         0049424D204F70656E20584C20432F432B2B20666F72204C696E7578206F6E20506F7765722031372E312E322028353732352D4337322C20353736352D4A3230292C2076657273696F6E2031372E312E322E302C20636C616E672076657273696F6E2032312E302E306769742028676974406769746875622E69626D2E636F6D3A636F6D70696C65722F6C6C766D2D70726F6A6563742E67697420653165653233663838333532623937333563363735386661396335653035313366626234393361322900
+  - Name:            .note.GNU-stack
+    Type:            SHT_PROGBITS
+    AddressAlign:    0x1
+  - Name:            .eh_frame
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC ]
+    AddressAlign:    0x8
+    Content:         1000000000000000017A5200047841011B0C01001000000018000000000000001000000000000000
+  - Name:            .rela.eh_frame
+    Type:            SHT_RELA
+    Flags:           [ SHF_INFO_LINK ]
+    Link:            .symtab
+    AddressAlign:    0x8
+    Info:            .eh_frame
+    Relocations:
+      - Offset:          0x1C
+        Symbol:          .text
+        Type:            R_PPC64_REL32
+  - Name:            .llvm_addrsig
+    Type:            SHT_LLVM_ADDRSIG
+    Flags:           [ SHF_EXCLUDE ]
+    Link:            .symtab
+    AddressAlign:    0x1
+    Offset:          0x1B8
+    Symbols:         [  ]
+  - Type:            SectionHeaderTable
+    Sections:
+      - Name:            .strtab
+      - Name:            .text
+      - Name:            .comment
+      - Name:            .note.GNU-stack
+      - Name:            .eh_frame
+      - Name:            .rela.eh_frame
+      - Name:            .llvm_addrsig
+      - Name:            .symtab
+Symbols:
+  - Name:            foo1.c
+    Type:            STT_FILE
+    Index:           SHN_ABS
+  - Name:            .text
+    Type:            STT_SECTION
+    Section:         .text
+  - Name:            foo1
+    Type:            STT_FUNC
+    Section:         .text
+    Binding:         STB_GLOBAL
+    Size:            0x10
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-elf-2.yaml b/llvm/test/DebugInfo/Inputs/big-archive-elf-2.yaml
new file mode 100644
index 00000000...
[truncated]
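
The ArchiveCacheKey introduced in Symbolize.h orders entries lexicographically by (ArchivePath, MemberName, ArchName) via std::tie, so the same member requested for two architectures gets two separate cache entries. The sketch below reproduces that ordering in isolation. The mapped type is simplified to int (the patch stores std::unique_ptr<ObjectFile>), and lookupOrInsert is a hypothetical helper that exists only for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Stand-in for the patch's ArchiveCacheKey, with the same comparison.
struct ArchiveCacheKey {
  std::string ArchivePath;
  std::string MemberName;
  std::string ArchName;

  // Lexicographic comparison over all three fields, as required by std::map.
  bool operator<(const ArchiveCacheKey &Other) const {
    return std::tie(ArchivePath, MemberName, ArchName) <
           std::tie(Other.ArchivePath, Other.MemberName, Other.ArchName);
  }
};

// Hypothetical helper: returns the cached value for Key, inserting Value
// only when no entry exists yet (an existing entry is never overwritten).
int lookupOrInsert(std::map<ArchiveCacheKey, int> &Cache,
                   const ArchiveCacheKey &Key, int Value) {
  auto Result = Cache.emplace(Key, Value);
  return Result.first->second;
}
```

Keying on all three fields keeps the 32-bit and 64-bit variants of a member from evicting each other, at the cost of storing the archive path once per cached member.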

@llvmbot
Member

llvmbot commented Jul 24, 2025

@llvm/pr-subscribers-debuginfo

Author: Midhunesh (midhuncodes7)

+      - Address:         0x1C
+        Symbol:          0x5
+        Info:            0x1F
+        Type:            0x0
+      - Address:         0x20
+        Symbol:          0x9
+        Info:            0x1F
+        Type:            0x0
+Symbols:
+  - Name:            .file
+    Value:           0x0
+    Section:         N_DEBUG
+    Type:            0x18
+    StorageClass:    C_FILE
+    NumberOfAuxEntries: 2
+    AuxEntries:
+      - Type:            AUX_FILE
+        FileNameOrString: foo.c
+        FileStringType:  XFT_FN
+      - Type:            AUX_FILE
+        FileNameOrString: 'IBM Open XL C/C++ for AIX 17.1.3 (5725-C72, 5765-J18), version 17.1.3.0, LLVM version 21.0.0git (145c02cece3630765e6412e6820bc446ddb4e138)'
+        FileStringType:  XFT_CV
+  - Name:            ''
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 5
+        StorageMappingClass: XMC_PR
+        SectionOrLength: 25
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            .foo
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_LD
+        SymbolAlignment: 0
+        StorageMappingClass: XMC_PR
+        SectionOrLength: 3
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            foo
+    Value:           0x1C
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_DS
+        SectionOrLength: 12
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            TOC
+    Value:           0x28
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_TC0
+        SectionOrLength: 0
+        StabInfoIndex:   0
+        StabSectNum:     0
+StringTable:     {}
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-64.yaml b/llvm/test/DebugInfo/Inputs/big-archive-64.yaml
new file mode 100644
index 0000000000000..9bbb1107555e0
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-64.yaml
@@ -0,0 +1,115 @@
+--- !XCOFF
+FileHeader:
+  MagicNumber:     0x1F7
+  NumberOfSections: 2
+  CreationTime:    0
+  OffsetToSymbolTable: 0xF8
+  EntriesInSymbolTable: 11
+  AuxiliaryHeaderSize: 0
+  Flags:           0x0
+Sections:
+  - Name:            .text
+    Address:         0x0
+    Size:            0x1C
+    FileOffsetToData: 0xA8
+    FileOffsetToRelocations: 0x0
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x0
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_TEXT ]
+    SectionData:     4E800020000000000009204000000001000000040003666F6F000000
+  - Name:            .data
+    Address:         0x20
+    Size:            0x18
+    FileOffsetToData: 0xC4
+    FileOffsetToRelocations: 0xDC
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x2
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_DATA ]
+    SectionData:     '000000000000000000000000000000380000000000000000'
+    Relocations:
+      - Address:         0x20
+        Symbol:          0x5
+        Info:            0x3F
+        Type:            0x0
+      - Address:         0x28
+        Symbol:          0x9
+        Info:            0x3F
+        Type:            0x0
+Symbols:
+  - Name:            .file
+    Value:           0x0
+    Section:         N_DEBUG
+    Type:            0x18
+    StorageClass:    C_FILE
+    NumberOfAuxEntries: 2
+    AuxEntries:
+      - Type:            AUX_FILE
+        FileNameOrString: foo.c
+        FileStringType:  XFT_FN
+      - Type:            AUX_FILE
+        FileNameOrString: 'IBM Open XL C/C++ for AIX 17.1.3 (5725-C72, 5765-J18), version 17.1.3.0, LLVM version 21.0.0git (5ca72bc8d2e87445649eab1825dffd2a047440ba)'
+        FileStringType:  XFT_CV
+  - Name:            ''
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 5
+        StorageMappingClass: XMC_PR
+        SectionOrLengthLo: 25
+        SectionOrLengthHi: 0
+  - Name:            .foo
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_LD
+        SymbolAlignment: 0
+        StorageMappingClass: XMC_PR
+        SectionOrLengthLo: 3
+        SectionOrLengthHi: 0
+  - Name:            foo
+    Value:           0x20
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 3
+        StorageMappingClass: XMC_DS
+        SectionOrLengthLo: 24
+        SectionOrLengthHi: 0
+  - Name:            TOC
+    Value:           0x38
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_TC0
+        SectionOrLengthLo: 0
+        SectionOrLengthHi: 0
+StringTable:     {}
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml b/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml
new file mode 100644
index 0000000000000..8e5c929e82878
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml
@@ -0,0 +1,68 @@
+--- !ELF
+FileHeader:
+  Class:           ELFCLASS64
+  Data:            ELFDATA2LSB
+  Type:            ET_REL
+  Machine:         EM_PPC64
+  Flags:           [  ]
+  SectionHeaderStringTable: .strtab
+Sections:
+  - Name:            .text
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    AddressAlign:    0x10
+    Content:         '2000804E000000000000000000000000'
+  - Name:            .comment
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_MERGE, SHF_STRINGS ]
+    AddressAlign:    0x1
+    EntSize:         0x1
+    Content:         0049424D204F70656E20584C20432F432B2B20666F72204C696E7578206F6E20506F7765722031372E312E322028353732352D4337322C20353736352D4A3230292C2076657273696F6E2031372E312E322E302C20636C616E672076657273696F6E2032312E302E306769742028676974406769746875622E69626D2E636F6D3A636F6D70696C65722F6C6C766D2D70726F6A6563742E67697420653165653233663838333532623937333563363735386661396335653035313366626234393361322900
+  - Name:            .note.GNU-stack
+    Type:            SHT_PROGBITS
+    AddressAlign:    0x1
+  - Name:            .eh_frame
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC ]
+    AddressAlign:    0x8
+    Content:         1000000000000000017A5200047841011B0C01001000000018000000000000001000000000000000
+  - Name:            .rela.eh_frame
+    Type:            SHT_RELA
+    Flags:           [ SHF_INFO_LINK ]
+    Link:            .symtab
+    AddressAlign:    0x8
+    Info:            .eh_frame
+    Relocations:
+      - Offset:          0x1C
+        Symbol:          .text
+        Type:            R_PPC64_REL32
+  - Name:            .llvm_addrsig
+    Type:            SHT_LLVM_ADDRSIG
+    Flags:           [ SHF_EXCLUDE ]
+    Link:            .symtab
+    AddressAlign:    0x1
+    Offset:          0x1B8
+    Symbols:         [  ]
+  - Type:            SectionHeaderTable
+    Sections:
+      - Name:            .strtab
+      - Name:            .text
+      - Name:            .comment
+      - Name:            .note.GNU-stack
+      - Name:            .eh_frame
+      - Name:            .rela.eh_frame
+      - Name:            .llvm_addrsig
+      - Name:            .symtab
+Symbols:
+  - Name:            foo1.c
+    Type:            STT_FILE
+    Index:           SHN_ABS
+  - Name:            .text
+    Type:            STT_SECTION
+    Section:         .text
+  - Name:            foo1
+    Type:            STT_FUNC
+    Section:         .text
+    Binding:         STB_GLOBAL
+    Size:            0x10
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-elf-2.yaml b/llvm/test/DebugInfo/Inputs/big-archive-elf-2.yaml
new file mode 100644
index 00000000...
[truncated]

github-actions bot commented Jul 24, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

LLVMSymbolizer::getOrCreateObject(const std::string &Path,
const std::string &ArchName) {
Binary *Bin;
// First check for archive(member) format - more efficient to check closing paren first
Collaborator

Suggested change
// First check for archive(member) format - more efficient to check closing paren first
// First check for archive(member) format - more efficient to check closing paren first.

Nit.

This behaviour change isn't restricted to the Big Archive format, contrary to what the documentation states. It looks to me like a UNIX archive would be affected by this too.

Collaborator

Ping. The second part of this comment hasn't been addressed as far as I can tell.

Contributor Author

We intend to allow naming archive members in non-AIX contexts, as that may be helpful. The documentation will be made more generic, as it was before.
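
For readers following the thread, the format-agnostic nature of the check can be seen in a small standalone sketch. It mirrors the `rfind`-based parsing in the diff but is not the patch itself: `splitArchiveMember` is a hypothetical helper name, and it uses only the standard library rather than LLVM's `StringRef`.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <utility>

// Hypothetical helper (not part of the patch): splits "archive(member)" into
// {archive, member}. Returns std::nullopt when the input does not end in ')'
// or has no '(' before the closing parenthesis.
std::optional<std::pair<std::string, std::string>>
splitArchiveMember(std::string_view Path) {
  if (Path.empty() || Path.back() != ')')
    return std::nullopt;
  std::size_t Close = Path.size() - 1;
  std::size_t Open = Path.rfind('(', Close);
  if (Open == std::string_view::npos)
    return std::nullopt;
  // Nothing here inspects the archive format, so any name of this shape
  // is treated as an archive member reference.
  return std::make_pair(std::string(Path.substr(0, Open)),
                        std::string(Path.substr(Open + 1, Close - Open - 1)));
}
```

Because the split is purely textual, an input such as `libfoo.a(bar.o)` is handled the same way whether `libfoo.a` is an AIX big archive or a UNIX archive.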

Collaborator

jh7370 commented Aug 7, 2025

I'm still catching up on reviews left over from my time off last week. I'll be taking a look again at this hopefully tomorrow.

Collaborator

jh7370 left a comment

Please don't mark conversation threads that I have initiated as resolved. I need the "resolved" status to help me spot things I have and have not checked, because it's not unheard of for people to mark things as resolved without actually addressing them. You can find more context on https://discourse.llvm.org/t/rfc-github-pr-resolve-conversation-button/73178.

@midhuncodes7 midhuncodes7 requested a review from jh7370 August 29, 2025 08:22
Collaborator

jh7370 commented Sep 3, 2025

Have you forgotten to push? I made some comments that you've responded to and others that you haven't, but you've not pushed any new commits since.

Member

MaskRay commented Sep 8, 2025

Consider a title like [llvm-symbolizer] Recognize AIX big archive. There are several archive formats and, to the best of my knowledge, "big archive" as a term is AIX-specific.

@midhuncodes7 midhuncodes7 changed the title big archive recognition by the llvm-symbolizer [llvm-symbolizer] Recognize AIX big archive Sep 8, 2025
Collaborator

jh7370 left a comment

Hi @midhuncodes7,
It's generally expected that ALL reviewer comments are responded to, either via code changes or comments in return, when updating a patch (see https://llvm.org/docs/CodeReview.html#acknowledge-all-reviewer-feedback). In general, there's no need to push updates until you've addressed everything and answered any comments. However, if you do want to push, you should make it clear that you have more changes incoming, either by putting the PR in draft state, or by simply saying as much in the comments. This is important, because otherwise it wastes reviewer time as they review a piece of work that hasn't been finished.

@@ -0,0 +1,23 @@
// Test archive member recognition by name (ELF format)
Collaborator

Test comments throughout still need trailing ".".

Contributor Author

Fixed

});
std::string DbgObjPath = DbgObj->getFileName().str();
auto BinIter = BinaryForPath.find(DbgObjPath);
if (BinIter != BinaryForPath.end()) {
Collaborator

Not answered.

Bin = CachedBin->getBinary();
Expected<OwningBinary<Binary>> BinOrErr = createBinary(Path);
if (!BinOrErr) {
BinaryForPath.erase(Pair.first);
Collaborator

Not answered.

Expected<StringRef> NameOrErr = Child.getName();
if (!NameOrErr)
continue;
if (*NameOrErr == sys::path::filename(MemberName)) {
Collaborator

Not answered...

Comment on lines 680 to 681
size_t CloseParen = Path.rfind(')');
if (CloseParen != std::string::npos && CloseParen == Path.length() - 1) {
Collaborator

Not addressed.

if (I != Modules.end()) {
recordAccess(BinaryForPath.find(BinaryName)->second);
auto BinIter = BinaryForPath.find(BinaryName);
if (BinIter != BinaryForPath.end())
Collaborator

Not answered.

Contributor Author

midhuncodes7 commented Sep 8, 2025

I have answered the comments. However, I'm not sure why they aren't showing up for you, @jh7370.

I'm not sure why it is showing as pending, any idea?

Collaborator

jh7370 commented Sep 8, 2025

> I'm not sure why it is showing as pending, any idea?

The pending status means you clicked the "Start a Review" button (which is a good way of grouping all your comments into a single submission), but didn't click the "Submit Review" button under "Review Changes" in the top-right of the "files changed" view of the PR.

@midhuncodes7 midhuncodes7 requested a review from jh7370 September 16, 2025 06:13
Collaborator

jh7370 left a comment

I've taken a step back and given more thought to this feature in general. Here are some points/questions.

Is there an AIX tool that uses this syntax already?

I've also thought of an edge case for existing behaviour that this breaks, since there's technically nothing stopping an input file having a name in the form abc(def). I think we can probably say that this is unlikely enough that we can ignore it, but if there's a way we could still support this case, that would be ideal. My feeling is that you could try one path (e.g. treating it as an explicit filename) and if that doesn't work, fall back to the other path. Alternatively, adding a command-line option might solve this issue. However, I could also be persuaded that it's not worth worrying about this case.

Finally, what are you actually trying to achieve with this feature? llvm-symbolizer isn't used for symbolizing relocatable objects, which are generally what are stored in archives; it's for symbolizing executable and shared objects, which are not stored in archives normally.

Expected<StringRef> NameOrErr = Child.getName();
if (!NameOrErr)
continue;
if (*NameOrErr == sys::path::filename(MemberName)) {
Collaborator

That, to me, isn't valid usage, since %t.o is a full path. It doesn't make any more sense than trying to extract an archive member by name, but using a full path.

This logic implies that the following two cases would result in the same member being used:

llvm-symbolizer ... foo.a(/foo/bar/wobble.o)
llvm-symbolizer ... foo.a(/flob/flab/wobble.o)

Yet the object file might even have come from a completely unrelated path, e.g.

llvm-symbolizer ... foo.a(/baz/wibble/wobble.o)

This doesn't make sense to me. So, why do you want to do it?
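
The collision described above can be reproduced in isolation. In this sketch `std::filesystem::path::filename` is used as a stand-in for `sys::path::filename`, and both function names are hypothetical; it only illustrates the comparison, not the patch's surrounding logic.

```cpp
#include <filesystem>
#include <string>

// Mirrors the comparison under review: only the last path component of the
// requested name is compared with the member name stored in the archive.
bool matchesByBasename(const std::string &StoredName,
                       const std::string &Requested) {
  return StoredName == std::filesystem::path(Requested).filename().string();
}

// The stricter alternative: the requested name must equal the stored name.
bool matchesExactly(const std::string &StoredName,
                    const std::string &Requested) {
  return StoredName == Requested;
}
```

With a stored member named `wobble.o`, `matchesByBasename` accepts both `/foo/bar/wobble.o` and `/flob/flab/wobble.o`, whereas `matchesExactly` accepts only `wobble.o`.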

Collaborator

hubert-reinterpretcast commented Sep 18, 2025

> I've taken a step back and given more thought to this feature in general. Here are some points/questions.
>
> Is there an AIX tool that uses this syntax already?

Yes. This is accepted by the linker for specifying the shared object to expect symbols from at load time (https://www.ibm.com/docs/en/aix/7.3.0?topic=l-ld-command; search for #! path/file (member)). Indeed, this is a standard syntax for specifying archive members to utilities (https://pubs.opengroup.org/onlinepubs/9799919799/utilities/make.html; search lib(member.o)).

> I've also thought of an edge case for existing behaviour that this breaks, since there's technically nothing stopping an input file having a name in the form abc(def). I think we can probably say that this is unlikely enough that we can ignore it, but if there's a way we could still support this case, that would be ideal. My feeling is that you could try one path (e.g. treating it as an explicit filename) and if that doesn't work, fall back to the other path. Alternatively, adding a command-line option might solve this issue. However, I could also be persuaded that it's not worth worrying about this case.

As the syntax is standard, the same ambiguity exists for POSIX make.

> Finally, what are you actually trying to achieve with this feature? llvm-symbolizer isn't used for symbolizing relocatable objects, which are generally what are stored in archives; it's for symbolizing executable and shared objects, which are not stored in archives normally.

Shared objects are normally stored in (and loaded from) archives on AIX. By convention, the same (big format) archive contains both 32-bit and 64-bit objects.

Collaborator

jh7370 commented Sep 19, 2025

>> I've taken a step back and given more thought to this feature in general. Here are some points/questions.
>> Is there an AIX tool that uses this syntax already?
>
> Yes. This is accepted by the linker for specifying the shared object to expect symbols from at load time (https://www.ibm.com/docs/en/aix/7.3.0?topic=l-ld-command; search for #! path/file (member)). Indeed, this is a standard syntax for specifying archive members to utilities (https://pubs.opengroup.org/onlinepubs/9799919799/utilities/make.html; search lib(member.o)).

Sounds good, thanks!

>> I've also thought of an edge case for existing behaviour that this breaks, since there's technically nothing stopping an input file having a name in the form abc(def). I think we can probably say that this is unlikely enough that we can ignore it, but if there's a way we could still support this case, that would be ideal. My feeling is that you could try one path (e.g. treating it as an explicit filename) and if that doesn't work, fall back to the other path. Alternatively, adding a command-line option might solve this issue. However, I could also be persuaded that it's not worth worrying about this case.
>
> As the syntax is standard, the same ambiguity exists for POSIX make.

Fair enough.

>> Finally, what are you actually trying to achieve with this feature? llvm-symbolizer isn't used for symbolizing relocatable objects, which are generally what are stored in archives; it's for symbolizing executable and shared objects, which are not stored in archives normally.
>
> Shared objects are normally stored in (and loaded from) archives on AIX. By convention, the same (big format) archive contains both 32-bit and 64-bit objects.

Thanks for the info!

@midhuncodes7
Contributor Author

> That, to me, isn't valid usage, since %t.o is a full path. It doesn't make any more sense than trying to extract an archive member by name, but using a full path.

Removed the sys::path::filename call that extracted the filename.

@midhuncodes7 midhuncodes7 requested a review from jh7370 September 20, 2025 18:04
Mach-O universal binary or an AIX archive with architecture variants),
symbolize the object file for a given architecture. You can also specify
the architecture by writing ``binary_name:arch_name`` in the input (see
example below). For AIX archives, the format ``archive.a(member.o):arch``
Collaborator

The docs here still refer to AIX archives specifically, yet the behaviour change is generic.

Contributor Author

> Shared objects are normally stored in (and loaded from) archives on AIX. By convention, the same (big format) archive contains both 32-bit and 64-bit objects.

This behaviour is AIX-specific, hence the documentation is specific to AIX.

Contributor Author

Behaviour such as adding members with the same object name but different architectures to a single archive is specific to AIX.
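
To make the AIX scenario concrete, here is a minimal model of selecting a member when two members share a name but differ in architecture. The `Member` struct, the `findMember` function, and the architecture strings are illustrative assumptions; the patch itself compares `Triple::ArchType` values obtained from each object.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Simplified stand-in for an archive member: a name plus an architecture tag.
struct Member {
  std::string Name;
  std::string Arch;
};

// Returns the index of the first member matching both the requested name and
// architecture, or std::nullopt if there is none (loosely analogous to the
// arch_not_found error returned at the end of the patch's lookup).
std::optional<std::size_t> findMember(const std::vector<Member> &Members,
                                      const std::string &Name,
                                      const std::string &Arch) {
  for (std::size_t I = 0; I < Members.size(); ++I)
    if (Members[I].Name == Name && Members[I].Arch == Arch)
      return I;
  return std::nullopt;
}
```

With members `{"shr.o", "ppc"}` and `{"shr.o", "ppc64"}`, the name alone is ambiguous, which is why the input syntax discussed in the documentation pairs `archive.a(member.o)` with an architecture.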

def grp_mach_o : OptionGroup<"kind">,
HelpText<"llvm-symbolizer Mach-O Specific Options">;

def grp_symbolizer : OptionGroup<"Symbolizer Options">;
Collaborator

Why has this been added?

Contributor Author

To have a more generic group of symbolizer options, rather than one specific to Mach-O.

if (I != Modules.end()) {
recordAccess(BinaryForPath.find(BinaryName)->second);
auto BinIter = BinaryForPath.find(BinaryName);
if (BinIter != BinaryForPath.end())
Collaborator

What is the new code path that triggers the crash that you're talking about? In other words, why wasn't this a problem before and now is?

Session)) {
Modules.emplace(ModuleName, std::unique_ptr<SymbolizableModule>());
// Return along the PDB filename to provide more context
// Return along the PDB filename to provide more context.
Collaborator

Unrelated change, please revert.

Contributor Author

reverted

Modules.erase(I);
});
auto BinIter = BinaryForPath.find(BinaryName);
if (BinIter != BinaryForPath.end())
Collaborator

As above, what is the new code path that causes the crash that you've discussed? I've looked at the code and I can't see it: BinaryForPath is populated within getOrCreateObjectPair, which is called earlier in this function.

});
std::string DbgObjPath = DbgObj->getFileName().str();
auto BinIter = BinaryForPath.find(DbgObjPath);
if (BinIter != BinaryForPath.end()) {
Collaborator

As in the other comments, please explain in detail, starting from the entry point, which code path could reach this point without the binary being cached, because I don't see it.
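
The guard being questioned in these three comments follows a common pattern, sketched here outside LLVM. `EvictorMap` and `pushEvictorIfCached` are hypothetical names, and a `std::map` of callback vectors stands in for `BinaryForPath` and its evictors. One possible reading of the diff, not confirmed anywhere in this thread, is that the archive path becomes the cache key while the module name keeps the `archive(member)` form, so a lookup by module name could miss.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a cache that maps a path to the callbacks to run on eviction.
using EvictorMap = std::map<std::string, std::vector<std::function<void()>>>;

// Registers Evictor only if Path is already cached. Returns false on a miss
// instead of dereferencing end(), which would be undefined behaviour.
bool pushEvictorIfCached(EvictorMap &Cache, const std::string &Path,
                         std::function<void()> Evictor) {
  auto It = Cache.find(Path);
  if (It == Cache.end())
    return false;
  It->second.push_back(std::move(Evictor));
  return true;
}
```

Note that silently skipping on a miss also means the evictor is never registered, which is part of why the reviewer asks for the triggering code path rather than accepting the guard as-is.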
