- 
                Notifications
    You must be signed in to change notification settings 
- Fork 15k
[llvm-symbolizer] Recognize AIX big archive #150401
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
[llvm-symbolizer] Recognize AIX big archive #150401
Conversation
…des7/llvm-project into midhun7/big-archive-recognition
| @llvm/pr-subscribers-llvm-binary-utilities Author: Midhunesh (midhuncodes7) ChangesThis PR implements big archive recognition by the symbolizer. Patch is 26.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/150401.diff 10 Files Affected: 
 diff --git a/llvm/docs/CommandGuide/llvm-symbolizer.rst b/llvm/docs/CommandGuide/llvm-symbolizer.rst
index 2da1b2470a83e..8f3a132139fe9 100644
--- a/llvm/docs/CommandGuide/llvm-symbolizer.rst
+++ b/llvm/docs/CommandGuide/llvm-symbolizer.rst
@@ -535,16 +535,20 @@ MACH-O SPECIFIC OPTIONS
 .. option:: --default-arch <arch>
 
   If a binary contains object files for multiple architectures (e.g. it is a
-  Mach-O universal binary), symbolize the object file for a given architecture.
-  You can also specify the architecture by writing ``binary_name:arch_name`` in
-  the input (see example below). If the architecture is not specified in either
-  way, the address will not be symbolized. Defaults to empty string.
+  Mach-O universal binary or an AIX archive with architecture variants),
+  symbolize the object file for a given architecture. You can also specify
+  the architecture by writing ``binary_name:arch_name`` in the input (see
+  example below). For AIX archives, the format ``archive.a(member.o):arch``
+  is also supported. If the architecture is not specified in either way,
+  the address will not be symbolized. Defaults to empty string.
 
   .. code-block:: console
 
     $ cat addr.txt
     /tmp/mach_universal_binary:i386 0x1f84
     /tmp/mach_universal_binary:x86_64 0x100000f24
+    /tmp/archive.a(member.o):ppc 0x1000
+    /tmp/archive.a(member.o):ppc64 0x2000
 
     $ llvm-symbolizer < addr.txt
     _main
@@ -553,6 +557,12 @@ MACH-O SPECIFIC OPTIONS
     _main
     /tmp/source_x86_64.cc:8
 
+    _foo
+    /tmp/source_ppc.cc:12
+    
+    _foo
+    /tmp/source_ppc64.cc:12
+
 .. option:: --dsym-hint <path/to/file.dSYM>
 
   If the debug info for a binary isn't present in the default location, look for
diff --git a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
index fb8f3d8af6b1b..5144085f3e23c 100644
--- a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
+++ b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
@@ -29,6 +29,12 @@
 #include <utility>
 #include <vector>
 
+#if defined(_AIX)
+#  define SYMBOLIZE_AIX 1
+#else
+#  define SYMBOLIZE_AIX 0
+#endif
+
 namespace llvm {
 namespace object {
 class ELFObjectFileBase;
@@ -202,6 +208,12 @@ class LLVMSymbolizer {
   Expected<ObjectFile *> getOrCreateObject(const std::string &Path,
                                            const std::string &ArchName);
 
+  /// Return a pointer to object file at specified path, for a specified
+  /// architecture that is present inside an archive file
+  Expected<ObjectFile *> getOrCreateObjectFromArchive(StringRef ArchivePath,
+                                                      StringRef MemberName,
+                                                      const std::string &ArchName);   
+
   /// Update the LRU cache order when a binary is accessed.
   void recordAccess(CachedBinary &Bin);
 
@@ -226,6 +238,20 @@ class LLVMSymbolizer {
   std::map<std::pair<std::string, std::string>, std::unique_ptr<ObjectFile>>
       ObjectForUBPathAndArch;
 
+  struct ArchiveCacheKey {
+    std::string ArchivePath;  // Storage for StringRef
+    std::string MemberName;   // Storage for StringRef
+    std::string ArchName;     // Storage for StringRef
+
+    // Required for map comparison
+    bool operator<(const ArchiveCacheKey &Other) const {
+      return std::tie(ArchivePath, MemberName, ArchName) < 
+             std::tie(Other.ArchivePath, Other.MemberName, Other.ArchName);
+    }
+  };
+
+  std::map<ArchiveCacheKey, std::unique_ptr<ObjectFile>> ObjectForArchivePathAndArch;
+  
   Options Opts;
 
   std::unique_ptr<BuildIDFetcher> BIDFetcher;
diff --git a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
index 56527719da51f..6dddc3a709239 100644
--- a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
+++ b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
@@ -33,6 +33,7 @@
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/Path.h"
+#include "llvm/Object/Archive.h"
 #include <cassert>
 #include <cstring>
 
@@ -286,6 +287,7 @@ LLVMSymbolizer::findSymbol(ArrayRef<uint8_t> BuildID, StringRef Symbol,
 
 void LLVMSymbolizer::flush() {
   ObjectForUBPathAndArch.clear();
+  ObjectForArchivePathAndArch.clear();
   LRUBinaries.clear();
   CacheSize = 0;
   BinaryForPath.clear();
@@ -321,7 +323,7 @@ bool checkFileCRC(StringRef Path, uint32_t CRCHash) {
 
 bool getGNUDebuglinkContents(const ObjectFile *Obj, std::string &DebugName,
                              uint32_t &CRCHash) {
-  if (!Obj)
+  if (!Obj || !isa<ObjectFile>(Obj))
     return false;
   for (const SectionRef &Section : Obj->sections()) {
     StringRef Name;
@@ -557,19 +559,101 @@ LLVMSymbolizer::getOrCreateObjectPair(const std::string &Path,
   if (!DbgObj)
     DbgObj = Obj;
   ObjectPair Res = std::make_pair(Obj, DbgObj);
-  std::string DbgObjPath = DbgObj->getFileName().str();
   auto Pair =
       ObjectPairForPathArch.emplace(std::make_pair(Path, ArchName), Res);
-  BinaryForPath.find(DbgObjPath)->second.pushEvictor([this, I = Pair.first]() {
+  std::string DbgObjPath = DbgObj->getFileName().str();
+  auto BinIter = BinaryForPath.find(DbgObjPath);
+  if (BinIter != BinaryForPath.end()) {
+    BinIter->second.pushEvictor([this, I = Pair.first]() {
     ObjectPairForPathArch.erase(I);
-  });
+    });
+  }
   return Res;
 }
 
+Expected<ObjectFile *> LLVMSymbolizer::getOrCreateObjectFromArchive(StringRef ArchivePath, 
+                                                                    StringRef MemberName, 
+                                                                    const std::string &ArchName) {
+  Binary *Bin = nullptr;
+  auto Pair = BinaryForPath.emplace(ArchivePath.str(), OwningBinary<Binary>());
+  if (!Pair.second) {
+    Bin = Pair.first->second->getBinary();
+    recordAccess(Pair.first->second);
+  } else {
+    Expected<OwningBinary<Binary>> ArchiveOrErr = createBinary(ArchivePath);
+    if (!ArchiveOrErr) {
+      return ArchiveOrErr.takeError();
+    }
+
+    CachedBinary &CachedBin = Pair.first->second;
+    CachedBin = std::move(ArchiveOrErr.get());
+    CachedBin.pushEvictor([this, I = Pair.first]() { BinaryForPath.erase(I); });
+    LRUBinaries.push_back(CachedBin);
+    CacheSize += CachedBin.size();
+    Bin = CachedBin->getBinary();
+  }
+
+  if (!Bin || !isa<object::Archive>(Bin))
+    return errorCodeToError(object_error::invalid_file_type);
+
+  object::Archive *Archive = cast<object::Archive>(Bin);
+  Error Err = Error::success();
+  
+  // On AIX, archives can contain multiple members with same name but different types
+  // We need to check all matches and find one that matches both name and architecture
+  for (auto &Child : Archive->children(Err, /*SkipInternal=*/true)) {
+    Expected<StringRef> NameOrErr = Child.getName();
+    if (!NameOrErr)
+      continue; 
+    if (*NameOrErr == llvm::sys::path::filename(MemberName)) {
+      Expected<std::unique_ptr<object::Binary>> MemberOrErr = Child.getAsBinary();
+      if (!MemberOrErr)
+        continue; 
+      
+      std::unique_ptr<object::Binary> Binary = std::move(*MemberOrErr);
+      if (auto *Obj = dyn_cast<object::ObjectFile>(Binary.get())) {
+#if defined(_AIX)
+        Triple::ArchType ObjArch = Obj->makeTriple().getArch();
+        Triple RequestedTriple;
+        RequestedTriple.setArch(Triple::getArchTypeForLLVMName(ArchName));
+        if (ObjArch != RequestedTriple.getArch())
+          continue;
+#endif
+        ArchiveCacheKey CacheKey{ArchivePath.str(), MemberName.str(), ArchName};
+        auto I = ObjectForArchivePathAndArch.find(CacheKey);
+        if (I != ObjectForArchivePathAndArch.end())
+          return I->second.get();
+
+        auto CachedObj = std::unique_ptr<ObjectFile>(Obj);
+        auto NewEntry = ObjectForArchivePathAndArch.emplace(
+            CacheKey, std::move(CachedObj));
+        Binary.release();
+        BinaryForPath.find(ArchivePath.str())->second.pushEvictor(
+            [this, Iter = NewEntry.first]() { ObjectForArchivePathAndArch.erase(Iter); });
+        return NewEntry.first->second.get();
+      }
+    }
+  }
+  if (Err)
+    return std::move(Err);
+  return errorCodeToError(object_error::arch_not_found);
+}
+
 Expected<ObjectFile *>
 LLVMSymbolizer::getOrCreateObject(const std::string &Path,
                                   const std::string &ArchName) {
-  Binary *Bin;
+  // First check for archive(member) format - more efficient to check closing paren first
+  size_t CloseParen = Path.rfind(')');
+  if (CloseParen != std::string::npos && CloseParen == Path.length() - 1) {
+    size_t OpenParen = Path.rfind('(', CloseParen);
+    if (OpenParen != std::string::npos) {
+      StringRef ArchivePath = StringRef(Path).substr(0, OpenParen);
+      StringRef MemberName = StringRef(Path).substr(OpenParen + 1, CloseParen - OpenParen - 1);
+      return getOrCreateObjectFromArchive(ArchivePath, MemberName, ArchName);
+    }
+  }
+
+  Binary *Bin = nullptr;
   auto Pair = BinaryForPath.emplace(Path, OwningBinary<Binary>());
   if (!Pair.second) {
     Bin = Pair.first->second->getBinary();
@@ -648,7 +732,9 @@ LLVMSymbolizer::getOrCreateModuleInfo(StringRef ModuleName) {
 
   auto I = Modules.find(ModuleName);
   if (I != Modules.end()) {
-    recordAccess(BinaryForPath.find(BinaryName)->second);
+    auto BinIter = BinaryForPath.find(BinaryName);
+    if (BinIter != BinaryForPath.end())
+      recordAccess(BinIter->second);
     return I->second.get();
   }
 
@@ -716,7 +802,9 @@ LLVMSymbolizer::getOrCreateModuleInfo(StringRef ModuleName) {
       createModuleInfo(Objects.first, std::move(Context), ModuleName);
   if (ModuleOrErr) {
     auto I = Modules.find(ModuleName);
-    BinaryForPath.find(BinaryName)->second.pushEvictor([this, I]() {
+    auto BinIter = BinaryForPath.find(BinaryName);
+    if (BinIter != BinaryForPath.end()) 
+      BinIter->second.pushEvictor([this, I]() {
       Modules.erase(I);
     });
   }
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-32.yaml b/llvm/test/DebugInfo/Inputs/big-archive-32.yaml
new file mode 100644
index 0000000000000..2080607a1a88c
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-32.yaml
@@ -0,0 +1,119 @@
+--- !XCOFF
+FileHeader:
+  MagicNumber:     0x1DF
+  NumberOfSections: 2
+  CreationTime:    0
+  OffsetToSymbolTable: 0xA0
+  EntriesInSymbolTable: 11
+  AuxiliaryHeaderSize: 0
+  Flags:           0x0
+Sections:
+  - Name:            .text
+    Address:         0x0
+    Size:            0x1C
+    FileOffsetToData: 0x64
+    FileOffsetToRelocations: 0x0
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x0
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_TEXT ]
+    SectionData:     4E800020000000000009204000000001000000040003666F6F000000
+  - Name:            .data
+    Address:         0x1C
+    Size:            0xC
+    FileOffsetToData: 0x80
+    FileOffsetToRelocations: 0x8C
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x2
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_DATA ]
+    SectionData:     '000000000000002800000000'
+    Relocations:
+      - Address:         0x1C
+        Symbol:          0x5
+        Info:            0x1F
+        Type:            0x0
+      - Address:         0x20
+        Symbol:          0x9
+        Info:            0x1F
+        Type:            0x0
+Symbols:
+  - Name:            .file
+    Value:           0x0
+    Section:         N_DEBUG
+    Type:            0x18
+    StorageClass:    C_FILE
+    NumberOfAuxEntries: 2
+    AuxEntries:
+      - Type:            AUX_FILE
+        FileNameOrString: foo.c
+        FileStringType:  XFT_FN
+      - Type:            AUX_FILE
+        FileNameOrString: 'IBM Open XL C/C++ for AIX 17.1.3 (5725-C72, 5765-J18), version 17.1.3.0, LLVM version 21.0.0git (145c02cece3630765e6412e6820bc446ddb4e138)'
+        FileStringType:  XFT_CV
+  - Name:            ''
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 5
+        StorageMappingClass: XMC_PR
+        SectionOrLength: 25
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            .foo
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_LD
+        SymbolAlignment: 0
+        StorageMappingClass: XMC_PR
+        SectionOrLength: 3
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            foo
+    Value:           0x1C
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_DS
+        SectionOrLength: 12
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            TOC
+    Value:           0x28
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_TC0
+        SectionOrLength: 0
+        StabInfoIndex:   0
+        StabSectNum:     0
+StringTable:     {}
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-64.yaml b/llvm/test/DebugInfo/Inputs/big-archive-64.yaml
new file mode 100644
index 0000000000000..9bbb1107555e0
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-64.yaml
@@ -0,0 +1,115 @@
+--- !XCOFF
+FileHeader:
+  MagicNumber:     0x1F7
+  NumberOfSections: 2
+  CreationTime:    0
+  OffsetToSymbolTable: 0xF8
+  EntriesInSymbolTable: 11
+  AuxiliaryHeaderSize: 0
+  Flags:           0x0
+Sections:
+  - Name:            .text
+    Address:         0x0
+    Size:            0x1C
+    FileOffsetToData: 0xA8
+    FileOffsetToRelocations: 0x0
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x0
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_TEXT ]
+    SectionData:     4E800020000000000009204000000001000000040003666F6F000000
+  - Name:            .data
+    Address:         0x20
+    Size:            0x18
+    FileOffsetToData: 0xC4
+    FileOffsetToRelocations: 0xDC
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x2
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_DATA ]
+    SectionData:     '000000000000000000000000000000380000000000000000'
+    Relocations:
+      - Address:         0x20
+        Symbol:          0x5
+        Info:            0x3F
+        Type:            0x0
+      - Address:         0x28
+        Symbol:          0x9
+        Info:            0x3F
+        Type:            0x0
+Symbols:
+  - Name:            .file
+    Value:           0x0
+    Section:         N_DEBUG
+    Type:            0x18
+    StorageClass:    C_FILE
+    NumberOfAuxEntries: 2
+    AuxEntries:
+      - Type:            AUX_FILE
+        FileNameOrString: foo.c
+        FileStringType:  XFT_FN
+      - Type:            AUX_FILE
+        FileNameOrString: 'IBM Open XL C/C++ for AIX 17.1.3 (5725-C72, 5765-J18), version 17.1.3.0, LLVM version 21.0.0git (5ca72bc8d2e87445649eab1825dffd2a047440ba)'
+        FileStringType:  XFT_CV
+  - Name:            ''
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 5
+        StorageMappingClass: XMC_PR
+        SectionOrLengthLo: 25
+        SectionOrLengthHi: 0
+  - Name:            .foo
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_LD
+        SymbolAlignment: 0
+        StorageMappingClass: XMC_PR
+        SectionOrLengthLo: 3
+        SectionOrLengthHi: 0
+  - Name:            foo
+    Value:           0x20
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 3
+        StorageMappingClass: XMC_DS
+        SectionOrLengthLo: 24
+        SectionOrLengthHi: 0
+  - Name:            TOC
+    Value:           0x38
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_TC0
+        SectionOrLengthLo: 0
+        SectionOrLengthHi: 0
+StringTable:     {}
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml b/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml
new file mode 100644
index 0000000000000..8e5c929e82878
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml
@@ -0,0 +1,68 @@
+--- !ELF
+FileHeader:
+  Class:           ELFCLASS64
+  Data:            ELFDATA2LSB
+  Type:            ET_REL
+  Machine:         EM_PPC64
+  Flags:           [  ]
+  SectionHeaderStringTable: .strtab
+Sections:
+  - Name:            .text
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    AddressAlign:    0x10
+    Content:         '2000804E000000000000000000000000'
+  - Name:            .comment
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_MERGE, SHF_STRINGS ]
+    AddressAlign:    0x1
+    EntSize:         0x1
+    Content:         0049424D204F70656E20584C20432F432B2B20666F72204C696E7578206F6E20506F7765722031372E312E322028353732352D4337322C20353736352D4A3230292C2076657273696F6E2031372E312E322E302C20636C616E672076657273696F6E2032312E302E306769742028676974406769746875622E69626D2E636F6D3A636F6D70696C65722F6C6C766D2D70726F6A6563742E67697420653165653233663838333532623937333563363735386661396335653035313366626234393361322900
+  - Name:            .note.GNU-stack
+    Type:            SHT_PROGBITS
+    AddressAlign:    0x1
+  - Name:            .eh_frame
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC ]
+    AddressAlign:    0x8
+    Content:         1000000000000000017A5200047841011B0C01001000000018000000000000001000000000000000
+  - Name:            .rela.eh_frame
+    Type:            SHT_RELA
+    Flags:           [ SHF_INFO_LINK ]
+    Link:            .symtab
+    AddressAlign:    0x8
+    Info:            .eh_frame
+    Relocations:
+      - Offset:          0x1C
+        Symbol:          .text
+        Type:            R_PPC64_REL32
+  - Name:            .llvm_addrsig
+    Type:            SHT_LLVM_ADDRSIG
+    Flags:           [ SHF_EXCLUDE ]
+    Link:            .symtab
+    AddressAlign:    0x1
+    Offset:          0x1B8
+    Symbols:         [  ]
+  - Type:            SectionHeaderTable
+    Sections:
+      - Name:            .strtab
+      - Name:            .text
+      - Name:            .comment
+      - Name:            .note.GNU-stack
+      - Name:            .eh_frame
+      - Name:            .rela.eh_frame
+      - Name:            .llvm_addrsig
+      - Name:            .symtab
+Symbols:
+  - Name:            foo1.c
+    Type:            STT_FILE
+    Index:           SHN_ABS
+  - Name:            .text
+    Type:            STT_SECTION
+    Section:         .text
+  - Name:            foo1
+    Type:            STT_FUNC
+    Section:         .text
+    Binding:         STB_GLOBAL
+    Size:            0x10
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-elf-2.yaml b/llvm/test/DebugInfo/Inputs/big-archive-elf-2.yaml
new file mode 100644
index 00000000...
[truncated]
 | 
| @llvm/pr-subscribers-debuginfo Author: Midhunesh (midhuncodes7) ChangesThis PR implements big archive recognition by the symbolizer. Patch is 26.81 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/150401.diff 10 Files Affected: 
 diff --git a/llvm/docs/CommandGuide/llvm-symbolizer.rst b/llvm/docs/CommandGuide/llvm-symbolizer.rst
index 2da1b2470a83e..8f3a132139fe9 100644
--- a/llvm/docs/CommandGuide/llvm-symbolizer.rst
+++ b/llvm/docs/CommandGuide/llvm-symbolizer.rst
@@ -535,16 +535,20 @@ MACH-O SPECIFIC OPTIONS
 .. option:: --default-arch <arch>
 
   If a binary contains object files for multiple architectures (e.g. it is a
-  Mach-O universal binary), symbolize the object file for a given architecture.
-  You can also specify the architecture by writing ``binary_name:arch_name`` in
-  the input (see example below). If the architecture is not specified in either
-  way, the address will not be symbolized. Defaults to empty string.
+  Mach-O universal binary or an AIX archive with architecture variants),
+  symbolize the object file for a given architecture. You can also specify
+  the architecture by writing ``binary_name:arch_name`` in the input (see
+  example below). For AIX archives, the format ``archive.a(member.o):arch``
+  is also supported. If the architecture is not specified in either way,
+  the address will not be symbolized. Defaults to empty string.
 
   .. code-block:: console
 
     $ cat addr.txt
     /tmp/mach_universal_binary:i386 0x1f84
     /tmp/mach_universal_binary:x86_64 0x100000f24
+    /tmp/archive.a(member.o):ppc 0x1000
+    /tmp/archive.a(member.o):ppc64 0x2000
 
     $ llvm-symbolizer < addr.txt
     _main
@@ -553,6 +557,12 @@ MACH-O SPECIFIC OPTIONS
     _main
     /tmp/source_x86_64.cc:8
 
+    _foo
+    /tmp/source_ppc.cc:12
+    
+    _foo
+    /tmp/source_ppc64.cc:12
+
 .. option:: --dsym-hint <path/to/file.dSYM>
 
   If the debug info for a binary isn't present in the default location, look for
diff --git a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
index fb8f3d8af6b1b..5144085f3e23c 100644
--- a/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
+++ b/llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h
@@ -29,6 +29,12 @@
 #include <utility>
 #include <vector>
 
+#if defined(_AIX)
+#  define SYMBOLIZE_AIX 1
+#else
+#  define SYMBOLIZE_AIX 0
+#endif
+
 namespace llvm {
 namespace object {
 class ELFObjectFileBase;
@@ -202,6 +208,12 @@ class LLVMSymbolizer {
   Expected<ObjectFile *> getOrCreateObject(const std::string &Path,
                                            const std::string &ArchName);
 
+  /// Return a pointer to object file at specified path, for a specified
+  /// architecture that is present inside an archive file
+  Expected<ObjectFile *> getOrCreateObjectFromArchive(StringRef ArchivePath,
+                                                      StringRef MemberName,
+                                                      const std::string &ArchName);   
+
   /// Update the LRU cache order when a binary is accessed.
   void recordAccess(CachedBinary &Bin);
 
@@ -226,6 +238,20 @@ class LLVMSymbolizer {
   std::map<std::pair<std::string, std::string>, std::unique_ptr<ObjectFile>>
       ObjectForUBPathAndArch;
 
+  struct ArchiveCacheKey {
+    std::string ArchivePath;  // Storage for StringRef
+    std::string MemberName;   // Storage for StringRef
+    std::string ArchName;     // Storage for StringRef
+
+    // Required for map comparison
+    bool operator<(const ArchiveCacheKey &Other) const {
+      return std::tie(ArchivePath, MemberName, ArchName) < 
+             std::tie(Other.ArchivePath, Other.MemberName, Other.ArchName);
+    }
+  };
+
+  std::map<ArchiveCacheKey, std::unique_ptr<ObjectFile>> ObjectForArchivePathAndArch;
+  
   Options Opts;
 
   std::unique_ptr<BuildIDFetcher> BIDFetcher;
diff --git a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
index 56527719da51f..6dddc3a709239 100644
--- a/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
+++ b/llvm/lib/DebugInfo/Symbolize/Symbolize.cpp
@@ -33,6 +33,7 @@
 #include "llvm/Support/FileSystem.h"
 #include "llvm/Support/MemoryBuffer.h"
 #include "llvm/Support/Path.h"
+#include "llvm/Object/Archive.h"
 #include <cassert>
 #include <cstring>
 
@@ -286,6 +287,7 @@ LLVMSymbolizer::findSymbol(ArrayRef<uint8_t> BuildID, StringRef Symbol,
 
 void LLVMSymbolizer::flush() {
   ObjectForUBPathAndArch.clear();
+  ObjectForArchivePathAndArch.clear();
   LRUBinaries.clear();
   CacheSize = 0;
   BinaryForPath.clear();
@@ -321,7 +323,7 @@ bool checkFileCRC(StringRef Path, uint32_t CRCHash) {
 
 bool getGNUDebuglinkContents(const ObjectFile *Obj, std::string &DebugName,
                              uint32_t &CRCHash) {
-  if (!Obj)
+  if (!Obj || !isa<ObjectFile>(Obj))
     return false;
   for (const SectionRef &Section : Obj->sections()) {
     StringRef Name;
@@ -557,19 +559,101 @@ LLVMSymbolizer::getOrCreateObjectPair(const std::string &Path,
   if (!DbgObj)
     DbgObj = Obj;
   ObjectPair Res = std::make_pair(Obj, DbgObj);
-  std::string DbgObjPath = DbgObj->getFileName().str();
   auto Pair =
       ObjectPairForPathArch.emplace(std::make_pair(Path, ArchName), Res);
-  BinaryForPath.find(DbgObjPath)->second.pushEvictor([this, I = Pair.first]() {
+  std::string DbgObjPath = DbgObj->getFileName().str();
+  auto BinIter = BinaryForPath.find(DbgObjPath);
+  if (BinIter != BinaryForPath.end()) {
+    BinIter->second.pushEvictor([this, I = Pair.first]() {
     ObjectPairForPathArch.erase(I);
-  });
+    });
+  }
   return Res;
 }
 
+Expected<ObjectFile *> LLVMSymbolizer::getOrCreateObjectFromArchive(StringRef ArchivePath, 
+                                                                    StringRef MemberName, 
+                                                                    const std::string &ArchName) {
+  Binary *Bin = nullptr;
+  auto Pair = BinaryForPath.emplace(ArchivePath.str(), OwningBinary<Binary>());
+  if (!Pair.second) {
+    Bin = Pair.first->second->getBinary();
+    recordAccess(Pair.first->second);
+  } else {
+    Expected<OwningBinary<Binary>> ArchiveOrErr = createBinary(ArchivePath);
+    if (!ArchiveOrErr) {
+      return ArchiveOrErr.takeError();
+    }
+
+    CachedBinary &CachedBin = Pair.first->second;
+    CachedBin = std::move(ArchiveOrErr.get());
+    CachedBin.pushEvictor([this, I = Pair.first]() { BinaryForPath.erase(I); });
+    LRUBinaries.push_back(CachedBin);
+    CacheSize += CachedBin.size();
+    Bin = CachedBin->getBinary();
+  }
+
+  if (!Bin || !isa<object::Archive>(Bin))
+    return errorCodeToError(object_error::invalid_file_type);
+
+  object::Archive *Archive = cast<object::Archive>(Bin);
+  Error Err = Error::success();
+  
+  // On AIX, archives can contain multiple members with same name but different types
+  // We need to check all matches and find one that matches both name and architecture
+  for (auto &Child : Archive->children(Err, /*SkipInternal=*/true)) {
+    Expected<StringRef> NameOrErr = Child.getName();
+    if (!NameOrErr)
+      continue; 
+    if (*NameOrErr == llvm::sys::path::filename(MemberName)) {
+      Expected<std::unique_ptr<object::Binary>> MemberOrErr = Child.getAsBinary();
+      if (!MemberOrErr)
+        continue; 
+      
+      std::unique_ptr<object::Binary> Binary = std::move(*MemberOrErr);
+      if (auto *Obj = dyn_cast<object::ObjectFile>(Binary.get())) {
+#if defined(_AIX)
+        Triple::ArchType ObjArch = Obj->makeTriple().getArch();
+        Triple RequestedTriple;
+        RequestedTriple.setArch(Triple::getArchTypeForLLVMName(ArchName));
+        if (ObjArch != RequestedTriple.getArch())
+          continue;
+#endif
+        ArchiveCacheKey CacheKey{ArchivePath.str(), MemberName.str(), ArchName};
+        auto I = ObjectForArchivePathAndArch.find(CacheKey);
+        if (I != ObjectForArchivePathAndArch.end())
+          return I->second.get();
+
+        auto CachedObj = std::unique_ptr<ObjectFile>(Obj);
+        auto NewEntry = ObjectForArchivePathAndArch.emplace(
+            CacheKey, std::move(CachedObj));
+        Binary.release();
+        BinaryForPath.find(ArchivePath.str())->second.pushEvictor(
+            [this, Iter = NewEntry.first]() { ObjectForArchivePathAndArch.erase(Iter); });
+        return NewEntry.first->second.get();
+      }
+    }
+  }
+  if (Err)
+    return std::move(Err);
+  return errorCodeToError(object_error::arch_not_found);
+}
+
 Expected<ObjectFile *>
 LLVMSymbolizer::getOrCreateObject(const std::string &Path,
                                   const std::string &ArchName) {
-  Binary *Bin;
+  // First check for archive(member) format - more efficient to check closing paren first
+  size_t CloseParen = Path.rfind(')');
+  if (CloseParen != std::string::npos && CloseParen == Path.length() - 1) {
+    size_t OpenParen = Path.rfind('(', CloseParen);
+    if (OpenParen != std::string::npos) {
+      StringRef ArchivePath = StringRef(Path).substr(0, OpenParen);
+      StringRef MemberName = StringRef(Path).substr(OpenParen + 1, CloseParen - OpenParen - 1);
+      return getOrCreateObjectFromArchive(ArchivePath, MemberName, ArchName);
+    }
+  }
+
+  Binary *Bin = nullptr;
   auto Pair = BinaryForPath.emplace(Path, OwningBinary<Binary>());
   if (!Pair.second) {
     Bin = Pair.first->second->getBinary();
@@ -648,7 +732,9 @@ LLVMSymbolizer::getOrCreateModuleInfo(StringRef ModuleName) {
 
   auto I = Modules.find(ModuleName);
   if (I != Modules.end()) {
-    recordAccess(BinaryForPath.find(BinaryName)->second);
+    auto BinIter = BinaryForPath.find(BinaryName);
+    if (BinIter != BinaryForPath.end())
+      recordAccess(BinIter->second);
     return I->second.get();
   }
 
@@ -716,7 +802,9 @@ LLVMSymbolizer::getOrCreateModuleInfo(StringRef ModuleName) {
       createModuleInfo(Objects.first, std::move(Context), ModuleName);
   if (ModuleOrErr) {
     auto I = Modules.find(ModuleName);
-    BinaryForPath.find(BinaryName)->second.pushEvictor([this, I]() {
+    auto BinIter = BinaryForPath.find(BinaryName);
+    if (BinIter != BinaryForPath.end()) 
+      BinIter->second.pushEvictor([this, I]() {
       Modules.erase(I);
     });
   }
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-32.yaml b/llvm/test/DebugInfo/Inputs/big-archive-32.yaml
new file mode 100644
index 0000000000000..2080607a1a88c
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-32.yaml
@@ -0,0 +1,119 @@
+--- !XCOFF
+FileHeader:
+  MagicNumber:     0x1DF
+  NumberOfSections: 2
+  CreationTime:    0
+  OffsetToSymbolTable: 0xA0
+  EntriesInSymbolTable: 11
+  AuxiliaryHeaderSize: 0
+  Flags:           0x0
+Sections:
+  - Name:            .text
+    Address:         0x0
+    Size:            0x1C
+    FileOffsetToData: 0x64
+    FileOffsetToRelocations: 0x0
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x0
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_TEXT ]
+    SectionData:     4E800020000000000009204000000001000000040003666F6F000000
+  - Name:            .data
+    Address:         0x1C
+    Size:            0xC
+    FileOffsetToData: 0x80
+    FileOffsetToRelocations: 0x8C
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x2
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_DATA ]
+    SectionData:     '000000000000002800000000'
+    Relocations:
+      - Address:         0x1C
+        Symbol:          0x5
+        Info:            0x1F
+        Type:            0x0
+      - Address:         0x20
+        Symbol:          0x9
+        Info:            0x1F
+        Type:            0x0
+Symbols:
+  - Name:            .file
+    Value:           0x0
+    Section:         N_DEBUG
+    Type:            0x18
+    StorageClass:    C_FILE
+    NumberOfAuxEntries: 2
+    AuxEntries:
+      - Type:            AUX_FILE
+        FileNameOrString: foo.c
+        FileStringType:  XFT_FN
+      - Type:            AUX_FILE
+        FileNameOrString: 'IBM Open XL C/C++ for AIX 17.1.3 (5725-C72, 5765-J18), version 17.1.3.0, LLVM version 21.0.0git (145c02cece3630765e6412e6820bc446ddb4e138)'
+        FileStringType:  XFT_CV
+  - Name:            ''
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 5
+        StorageMappingClass: XMC_PR
+        SectionOrLength: 25
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            .foo
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_LD
+        SymbolAlignment: 0
+        StorageMappingClass: XMC_PR
+        SectionOrLength: 3
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            foo
+    Value:           0x1C
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_DS
+        SectionOrLength: 12
+        StabInfoIndex:   0
+        StabSectNum:     0
+  - Name:            TOC
+    Value:           0x28
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_TC0
+        SectionOrLength: 0
+        StabInfoIndex:   0
+        StabSectNum:     0
+StringTable:     {}
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-64.yaml b/llvm/test/DebugInfo/Inputs/big-archive-64.yaml
new file mode 100644
index 0000000000000..9bbb1107555e0
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-64.yaml
@@ -0,0 +1,115 @@
+--- !XCOFF
+FileHeader:
+  MagicNumber:     0x1F7
+  NumberOfSections: 2
+  CreationTime:    0
+  OffsetToSymbolTable: 0xF8
+  EntriesInSymbolTable: 11
+  AuxiliaryHeaderSize: 0
+  Flags:           0x0
+Sections:
+  - Name:            .text
+    Address:         0x0
+    Size:            0x1C
+    FileOffsetToData: 0xA8
+    FileOffsetToRelocations: 0x0
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x0
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_TEXT ]
+    SectionData:     4E800020000000000009204000000001000000040003666F6F000000
+  - Name:            .data
+    Address:         0x20
+    Size:            0x18
+    FileOffsetToData: 0xC4
+    FileOffsetToRelocations: 0xDC
+    FileOffsetToLineNumbers: 0x0
+    NumberOfRelocations: 0x2
+    NumberOfLineNumbers: 0x0
+    Flags:           [ STYP_DATA ]
+    SectionData:     '000000000000000000000000000000380000000000000000'
+    Relocations:
+      - Address:         0x20
+        Symbol:          0x5
+        Info:            0x3F
+        Type:            0x0
+      - Address:         0x28
+        Symbol:          0x9
+        Info:            0x3F
+        Type:            0x0
+Symbols:
+  - Name:            .file
+    Value:           0x0
+    Section:         N_DEBUG
+    Type:            0x18
+    StorageClass:    C_FILE
+    NumberOfAuxEntries: 2
+    AuxEntries:
+      - Type:            AUX_FILE
+        FileNameOrString: foo.c
+        FileStringType:  XFT_FN
+      - Type:            AUX_FILE
+        FileNameOrString: 'IBM Open XL C/C++ for AIX 17.1.3 (5725-C72, 5765-J18), version 17.1.3.0, LLVM version 21.0.0git (5ca72bc8d2e87445649eab1825dffd2a047440ba)'
+        FileStringType:  XFT_CV
+  - Name:            ''
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 5
+        StorageMappingClass: XMC_PR
+        SectionOrLengthLo: 25
+        SectionOrLengthHi: 0
+  - Name:            .foo
+    Value:           0x0
+    Section:         .text
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_LD
+        SymbolAlignment: 0
+        StorageMappingClass: XMC_PR
+        SectionOrLengthLo: 3
+        SectionOrLengthHi: 0
+  - Name:            foo
+    Value:           0x20
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_EXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 3
+        StorageMappingClass: XMC_DS
+        SectionOrLengthLo: 24
+        SectionOrLengthHi: 0
+  - Name:            TOC
+    Value:           0x38
+    Section:         .data
+    Type:            0x0
+    StorageClass:    C_HIDEXT
+    NumberOfAuxEntries: 1
+    AuxEntries:
+      - Type:            AUX_CSECT
+        ParameterHashIndex: 0
+        TypeChkSectNum:  0
+        SymbolType:      XTY_SD
+        SymbolAlignment: 2
+        StorageMappingClass: XMC_TC0
+        SectionOrLengthLo: 0
+        SectionOrLengthHi: 0
+StringTable:     {}
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml b/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml
new file mode 100644
index 0000000000000..8e5c929e82878
--- /dev/null
+++ b/llvm/test/DebugInfo/Inputs/big-archive-elf-1.yaml
@@ -0,0 +1,68 @@
+--- !ELF
+FileHeader:
+  Class:           ELFCLASS64
+  Data:            ELFDATA2LSB
+  Type:            ET_REL
+  Machine:         EM_PPC64
+  Flags:           [  ]
+  SectionHeaderStringTable: .strtab
+Sections:
+  - Name:            .text
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC, SHF_EXECINSTR ]
+    AddressAlign:    0x10
+    Content:         '2000804E000000000000000000000000'
+  - Name:            .comment
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_MERGE, SHF_STRINGS ]
+    AddressAlign:    0x1
+    EntSize:         0x1
+    Content:         0049424D204F70656E20584C20432F432B2B20666F72204C696E7578206F6E20506F7765722031372E312E322028353732352D4337322C20353736352D4A3230292C2076657273696F6E2031372E312E322E302C20636C616E672076657273696F6E2032312E302E306769742028676974406769746875622E69626D2E636F6D3A636F6D70696C65722F6C6C766D2D70726F6A6563742E67697420653165653233663838333532623937333563363735386661396335653035313366626234393361322900
+  - Name:            .note.GNU-stack
+    Type:            SHT_PROGBITS
+    AddressAlign:    0x1
+  - Name:            .eh_frame
+    Type:            SHT_PROGBITS
+    Flags:           [ SHF_ALLOC ]
+    AddressAlign:    0x8
+    Content:         1000000000000000017A5200047841011B0C01001000000018000000000000001000000000000000
+  - Name:            .rela.eh_frame
+    Type:            SHT_RELA
+    Flags:           [ SHF_INFO_LINK ]
+    Link:            .symtab
+    AddressAlign:    0x8
+    Info:            .eh_frame
+    Relocations:
+      - Offset:          0x1C
+        Symbol:          .text
+        Type:            R_PPC64_REL32
+  - Name:            .llvm_addrsig
+    Type:            SHT_LLVM_ADDRSIG
+    Flags:           [ SHF_EXCLUDE ]
+    Link:            .symtab
+    AddressAlign:    0x1
+    Offset:          0x1B8
+    Symbols:         [  ]
+  - Type:            SectionHeaderTable
+    Sections:
+      - Name:            .strtab
+      - Name:            .text
+      - Name:            .comment
+      - Name:            .note.GNU-stack
+      - Name:            .eh_frame
+      - Name:            .rela.eh_frame
+      - Name:            .llvm_addrsig
+      - Name:            .symtab
+Symbols:
+  - Name:            foo1.c
+    Type:            STT_FILE
+    Index:           SHN_ABS
+  - Name:            .text
+    Type:            STT_SECTION
+    Section:         .text
+  - Name:            foo1
+    Type:            STT_FUNC
+    Section:         .text
+    Binding:         STB_GLOBAL
+    Size:            0x10
+...
diff --git a/llvm/test/DebugInfo/Inputs/big-archive-elf-2.yaml b/llvm/test/DebugInfo/Inputs/big-archive-elf-2.yaml
new file mode 100644
index 00000000...
[truncated]
 | 
| ✅ With the latest revision this PR passed the C/C++ code formatter. | 
| LLVMSymbolizer::getOrCreateObject(const std::string &Path, | ||
| const std::string &ArchName) { | ||
| Binary *Bin; | ||
| // First check for archive(member) format - more efficient to check closing paren first | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // First check for archive(member) format - more efficient to check closing paren first | |
| // First check for archive(member) format - more efficient to check closing paren first. | 
Nit.
This behaviour change isn't guarded to Big Archive format as stated in the documentation. It looks to me like a UNIX archive would be impacted by this too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping. The second part of this comment hasn't been addressed as far as I can tell.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we intend to allow naming archive members in non-AIX contexts (as that may be helpful). The documentation will be made more generic like how it used to be before.
…des7/llvm-project into midhun7/big-archive-recognition
| I'm still catching up on reviews left over from my time off last week. I'll be taking a look again at this hopefully tomorrow. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please don't mark conversation threads that I have initiated as resolved. I need the "resolved" status to help me spot things I have and have not checked, because it's not unheard of for people to mark things as resolved without actually addressing them. You can find more context on https://discourse.llvm.org/t/rfc-github-pr-resolve-conversation-button/73178.
| Have you forgotten to push? I made some comments that you've responded to and others that you haven't, but you've not pushed any new commits since. | 
| Consider title like  | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @midhuncodes7,
It's generally expected that ALL reviewer comments are responded to, either via code changes or comments in return, when updating a patch (see https://llvm.org/docs/CodeReview.html#acknowledge-all-reviewer-feedback). In general, there's no need to push updates until you've addressed everything and answered any comments. However, if you do want to push, you should make it clear that you have more changes incoming, either by putting the PR in draft state, or by simply saying as much in the comments. This is important, because otherwise it wastes reviewer time as they review a piece of work that hasn't been finished.
| @@ -0,0 +1,23 @@ | |||
| // Test archive member recognition by name (ELF format) | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Test comments throughout still need trailing ".".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
| }); | ||
| std::string DbgObjPath = DbgObj->getFileName().str(); | ||
| auto BinIter = BinaryForPath.find(DbgObjPath); | ||
| if (BinIter != BinaryForPath.end()) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not answered.
| Bin = CachedBin->getBinary(); | ||
| Expected<OwningBinary<Binary>> BinOrErr = createBinary(Path); | ||
| if (!BinOrErr) { | ||
| BinaryForPath.erase(Pair.first); | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not answered.
| Expected<StringRef> NameOrErr = Child.getName(); | ||
| if (!NameOrErr) | ||
| continue; | ||
| if (*NameOrErr == sys::path::filename(MemberName)) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not answered...
| size_t CloseParen = Path.rfind(')'); | ||
| if (CloseParen != std::string::npos && CloseParen == Path.length() - 1) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not addressed.
| if (I != Modules.end()) { | ||
| recordAccess(BinaryForPath.find(BinaryName)->second); | ||
| auto BinIter = BinaryForPath.find(BinaryName); | ||
| if (BinIter != BinaryForPath.end()) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not answered.
| I have answered the comments. However I'm not sure why it doesn't reflect for you @jh7370 .   I'm not sure why it is showing as pending, any idea? | 
| 
 The pending comments means you clicked the button "Start a Review" (which is a good way of grouping all your comments for a single submission), but didn't click the "Submit Review" button under "Review Changes" in the top-right of the "files changed" view of the PR. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've taken a step back and given more thought to this feature in general. Here are some points/questions.
Is there an AIX tool that uses this syntax already?
I've also thought of an edge case for existing behaviour that this breaks, since there's technically nothing stopping an input file having a name in the form abc(def). I think we can probably say that this is unlikely enough that we can ignore it, but if there's a way we could still support this case, that would be ideal. My feeling is that you could try one path (e.g. treating it as an explicit filename) and if that doesn't work, fall back to the other path. Alternatively, adding a command-line option might solve this option. However, I could also be persuaded that it's not worth worrying about this case.
Finally, what are you actually trying to achieve with this feature? llvm-symbolizer isn't used for symbolizing relocatable objects, which are generally what are stored in archives, it's for symbolizing executable and shared objects, which are not stored in archives normally.
| Expected<StringRef> NameOrErr = Child.getName(); | ||
| if (!NameOrErr) | ||
| continue; | ||
| if (*NameOrErr == sys::path::filename(MemberName)) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That, to me, isn't valid usage, since %t.o is a full path. It doesn't make any more sense than trying to extract an archive member by name, but using a full path.
This logic implies that the following two cases would result in the same member being used:
llvm-symbolizer ... foo.a(/foo/bar/wobble.o)
llvm-symbolizer ... foo.a(/flob/flab/wobble.o)
Yet the object file might even have come from a completely unrelated path, e.g.
llvm-symbolizer ... foo.a(/baz/wibble/wobble.o)
This doesn't make sense to me. So, why do you want to do it?
| 
 Yes. This is accepted by the linker for specifying the shared object to expect symbols from at load time (https://www.ibm.com/docs/en/aix/7.3.0?topic=l-ld-command; search for  
 As the syntax is standard, the same ambiguity exists for POSIX  
 Shared objects are normally stored in (and loaded from) archives on AIX. By convention, the same (big format) archive contains both 32-bit and 64-bit objects. | 
| 
 Sounds good, thanks! 
 Fair enough. 
 Thanks for the info! | 
| 
 Removed the system call to extract filename. | 
| Mach-O universal binary or an AIX archive with architecture variants), | ||
| symbolize the object file for a given architecture. You can also specify | ||
| the architecture by writing ``binary_name:arch_name`` in the input (see | ||
| example below). For AIX archives, the format ``archive.a(member.o):arch`` | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The docs here still refer to AIX archives specifically, yet the behaviour change is generic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shared objects are normally stored in (and loaded from) archives on AIX. By convention, the same (big format) archive contains both 32-bit and 64-bit objects
This behaviour is AIX specific, hence the documentation is more specific to AIX
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Behavior such as having same object name which vary in architecture getting added to archive is specific to AIX
| def grp_mach_o : OptionGroup<"kind">, | ||
| HelpText<"llvm-symbolizer Mach-O Specific Options">; | ||
|  | ||
| def grp_symbolizer : OptionGroup<"Symbolizer Options">; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why has this been added?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To have more generic symbolizer options rather than specific to mach-o
| if (I != Modules.end()) { | ||
| recordAccess(BinaryForPath.find(BinaryName)->second); | ||
| auto BinIter = BinaryForPath.find(BinaryName); | ||
| if (BinIter != BinaryForPath.end()) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the new code path that triggers the crash that you're talking about? In other words, why wasn't this a problem before and now is?
| Session)) { | ||
| Modules.emplace(ModuleName, std::unique_ptr<SymbolizableModule>()); | ||
| // Return along the PDB filename to provide more context | ||
| // Return along the PDB filename to provide more context. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unrelated change, please revert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reverted
| Modules.erase(I); | ||
| }); | ||
| auto BinIter = BinaryForPath.find(BinaryName); | ||
| if (BinIter != BinaryForPath.end()) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, what is the new code path that causes the crash that you've discussed? I've looked at the code and I can't see it: BinaryForPath is populated within getOrCreateObjectPair, which is called earlier in this function.
| }); | ||
| std::string DbgObjPath = DbgObj->getFileName().str(); | ||
| auto BinIter = BinaryForPath.find(DbgObjPath); | ||
| if (BinIter != BinaryForPath.end()) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As in the other comments, please explain in detail from the entry point what the code path is that could hit here without the binary being cached, please, because I don't see it.
| @jh7370 any updates from your end? | 
| 
 Sorry, I've had some time off and have been unwell too. I'm starting to catch up on things now, but it'll likely be next week before I can take a look. | 
| ArchivePath.str().c_str()); | ||
|  | ||
| Error Err = Error::success(); | ||
| // On AIX, archives can contain multiple members with the same name but | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor point: it's quite straightforward to create an archive with multiple members of the same name, using llvm-ar q. I don't think this impacts things here specifically though.
| std::map<std::pair<std::string, std::string>, std::unique_ptr<ObjectFile>> | ||
| ObjectForUBPathAndArch; | ||
| struct ArchiveCacheKey { | ||
| std::string ArchivePath; | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: references to Archive in the names here are misleading, because they might be referring to a Mach-O universal binary, if I'm not mistaken.
| }); | ||
| std::string DbgObjPath = DbgObj->getFileName().str(); | ||
| auto BinIter = BinaryForPath.find(DbgObjPath); | ||
| if (BinIter != BinaryForPath.end()) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've given it some more thought and I'm fairly convinced the problem is that you're storing the wrong thing in BinaryForPath when dealing with archive members. Specifically, this should store the final binary (i.e. the archive child binary) and should be keyed on t.a(t.o) (or equivalent).
This will require some reorganisation of the code from what you've done, so that loadOrGetBinary knows that it's expecting an archive and fetches the child out of that.
Does this make sense?
This PR implements big archive recognition by the symbolizer.
The archive input format should be in archive.a(member.o) format