Modify llvm-dwp to be able to emit string tables over 4GB without losing data #167457

clayborg · 2025-11-11T05:29:15Z

We can change llvm-dwp to emit a DWARF64 version of the .debug_str_offsets table for a .dwo file in a .dwp file. This allows the string table to exceed 4GB without truncating string offsets into the .debug_str section and losing data. llvm-dwp appends all strings for a .dwo file to the .debug_str section, and if any of the new string offsets exceeds UINT32_MAX, it upgrades that .debug_str_offsets table to a DWARF64 header so that each string offset in the table is 64 bits wide.
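As a rough sketch of the size arithmetic behind that upgrade (not the llvm-dwp code itself; the helper names `promotedUnitLength` and `promotedContributionSize` are hypothetical and exist only for illustration), assuming a DWARF v5 table whose length field counts a 2-byte version, 2 bytes of padding, and the entries:

```cpp
#include <cassert>
#include <cstdint>

// 2-byte DWARF version plus 2 bytes of padding follow the length field.
constexpr uint64_t VersionPadSize = 4;

// Hypothetical helper: given the unit length of a DWARF32
// .debug_str_offsets contribution, return the unit length of the same
// table once every 4-byte entry has been widened to 8 bytes.
uint64_t promotedUnitLength(uint64_t Dwarf32UnitLength) {
  assert(Dwarf32UnitLength >= VersionPadSize && "length must cover version+pad");
  const uint64_t EntryBytes = Dwarf32UnitLength - VersionPadSize;
  assert(EntryBytes % 4 == 0 && "DWARF32 entries are 4 bytes each");
  return EntryBytes * 2 + VersionPadSize;
}

// Hypothetical helper: total bytes the promoted contribution occupies in
// the section, i.e. the 4-byte 0xffffffff marker, the 8-byte length field,
// and everything the length field counts.
uint64_t promotedContributionSize(uint64_t Dwarf32UnitLength) {
  return promotedUnitLength(Dwarf32UnitLength) + 12;
}
```

For a DWARF32 table with three entries the unit length is 4 + 3 * 4 = 16 bytes; after promotion it becomes 4 + 3 * 8 = 28, and the whole contribution grows from 20 to 40 bytes, which is why the section length recorded for the .dwo file also has to be updated.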

Fixed LLDB to be able to successfully load the 64 bit string tables in .dwp files.

Fixed llvm-dwarfdump and LLVM DWARF parsing code to do the right thing with DWARF64 string table headers.

clayborg · 2025-11-11T05:29:37Z

Created the pull request prior to adding testing to get comments on this.

llvmbot · 2025-11-11T05:29:50Z

@llvm/pr-subscribers-debuginfo

Author: Greg Clayton (clayborg)

Full diff: https://github.com/llvm/llvm-project/pull/167457.diff

6 Files Affected:

(modified) lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp (+9-4)
(modified) lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.h (+1-1)
(modified) llvm/include/llvm/DWP/DWP.h (+3-1)
(modified) llvm/include/llvm/DWP/DWPStringPool.h (+3-3)
(modified) llvm/lib/DWP/DWP.cpp (+62-16)
(modified) llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp (+11-2)

diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp b/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
index 94fc2e83e899d..7b7864caf8c09 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.cpp
@@ -360,8 +360,10 @@ void DWARFUnit::SetDwoStrOffsetsBase() {
     const DWARFDataExtractor &strOffsets =
         GetSymbolFileDWARF().GetDWARFContext().getOrLoadStrOffsetsData();
     uint64_t length = strOffsets.GetU32(&baseOffset);
-    if (length == 0xffffffff)
+    if (length == 0xffffffff) {
       length = strOffsets.GetU64(&baseOffset);
+      m_str_offsets_size = 8;
+    }
 
     // Check version.
     if (strOffsets.GetU16(&baseOffset) < 5)
@@ -369,6 +371,10 @@ void DWARFUnit::SetDwoStrOffsetsBase() {
 
     // Skip padding.
     baseOffset += 2;
+  } else {
+    // Size of offset for .debug_str_offsets is same as DWARF offset byte size
+    // of the DWARFUnit for DWARF version 4 and earlier.
+    m_str_offsets_size = m_header.getDwarfOffsetByteSize();
   }
 
   SetStrOffsetsBase(baseOffset);
@@ -1079,10 +1085,9 @@ uint32_t DWARFUnit::GetHeaderByteSize() const { return m_header.getSize(); }
 
 std::optional<uint64_t>
 DWARFUnit::GetStringOffsetSectionItem(uint32_t index) const {
-  lldb::offset_t offset =
-      GetStrOffsetsBase() + index * m_header.getDwarfOffsetByteSize();
+  lldb::offset_t offset = GetStrOffsetsBase() + index * m_str_offsets_size;
   return m_dwarf.GetDWARFContext().getOrLoadStrOffsetsData().GetMaxU64(
-      &offset, m_header.getDwarfOffsetByteSize());
+      &offset, m_str_offsets_size);
 }
 
 llvm::Expected<llvm::DWARFAddressRangesVector>
diff --git a/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.h b/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.h
index 91a693860c55a..856db5e4101cd 100644
--- a/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.h
+++ b/lldb/source/Plugins/SymbolFile/DWARF/DWARFUnit.h
@@ -364,7 +364,7 @@ class DWARFUnit : public DWARFExpression::Delegate, public UserID {
   dw_offset_t m_line_table_offset = DW_INVALID_OFFSET;
 
   dw_offset_t m_str_offsets_base = 0; // Value of DW_AT_str_offsets_base.
-
+  dw_offset_t m_str_offsets_size = 4; // Size in bytes of the string offsets.
   std::optional<llvm::DWARFDebugRnglistTable> m_rnglist_table;
   bool m_rnglist_table_done = false;
   std::optional<llvm::DWARFListTableHeader> m_loclist_table_header;
diff --git a/llvm/include/llvm/DWP/DWP.h b/llvm/include/llvm/DWP/DWP.h
index a759bae10d160..cc38369658eaa 100644
--- a/llvm/include/llvm/DWP/DWP.h
+++ b/llvm/include/llvm/DWP/DWP.h
@@ -70,6 +70,8 @@ struct CompileUnitIdentifiers {
 LLVM_ABI Error write(MCStreamer &Out, ArrayRef<std::string> Inputs,
                      OnCuIndexOverflow OverflowOptValue);
 
+typedef std::vector<std::pair<DWARFSectionKind, uint32_t>> SectionLengths;
+
 LLVM_ABI Error handleSection(
     const StringMap<std::pair<MCSection *, DWARFSectionKind>> &KnownSections,
     const MCSection *StrSection, const MCSection *StrOffsetSection,
@@ -82,7 +84,7 @@ LLVM_ABI Error handleSection(
     std::vector<StringRef> &CurTypesSection,
     std::vector<StringRef> &CurInfoSection, StringRef &AbbrevSection,
     StringRef &CurCUIndexSection, StringRef &CurTUIndexSection,
-    std::vector<std::pair<DWARFSectionKind, uint32_t>> &SectionLength);
+    SectionLengths &SectionLength);
 
 LLVM_ABI Expected<InfoSectionUnitHeader>
 parseInfoSectionUnitHeader(StringRef Info);
diff --git a/llvm/include/llvm/DWP/DWPStringPool.h b/llvm/include/llvm/DWP/DWPStringPool.h
index 1354b46f156b6..d1486ff7872e1 100644
--- a/llvm/include/llvm/DWP/DWPStringPool.h
+++ b/llvm/include/llvm/DWP/DWPStringPool.h
@@ -32,13 +32,13 @@ class DWPStringPool {
 
   MCStreamer &Out;
   MCSection *Sec;
-  DenseMap<const char *, uint32_t, CStrDenseMapInfo> Pool;
-  uint32_t Offset = 0;
+  DenseMap<const char *, uint64_t, CStrDenseMapInfo> Pool;
+  uint64_t Offset = 0;
 
 public:
   DWPStringPool(MCStreamer &Out, MCSection *Sec) : Out(Out), Sec(Sec) {}
 
-  uint32_t getOffset(const char *Str, unsigned Length) {
+  uint64_t getOffset(const char *Str, unsigned Length) {
     assert(strlen(Str) + 1 == Length && "Ensure length hint is correct");
 
     auto Pair = Pool.insert(std::make_pair(Str, Offset));
diff --git a/llvm/lib/DWP/DWP.cpp b/llvm/lib/DWP/DWP.cpp
index b565edbfe96db..54edce81208b5 100644
--- a/llvm/lib/DWP/DWP.cpp
+++ b/llvm/lib/DWP/DWP.cpp
@@ -413,33 +413,43 @@ Expected<InfoSectionUnitHeader> parseInfoSectionUnitHeader(StringRef Info) {
 }
 
 static void writeNewOffsetsTo(MCStreamer &Out, DataExtractor &Data,
-                              DenseMap<uint64_t, uint32_t> &OffsetRemapping,
-                              uint64_t &Offset, uint64_t &Size) {
+                              DenseMap<uint64_t, uint64_t> &OffsetRemapping,
+                              uint64_t &Offset, const uint64_t Size,
+                              uint32_t OldOffsetSize, uint32_t NewOffsetSize) {
 
   while (Offset < Size) {
-    auto OldOffset = Data.getU32(&Offset);
-    auto NewOffset = OffsetRemapping[OldOffset];
-    Out.emitIntValue(NewOffset, 4);
+    const uint64_t OldOffset = Data.getUnsigned(&Offset, OldOffsetSize);
+    const uint64_t NewOffset = OffsetRemapping[OldOffset];
+    assert(NewOffsetSize == 8 || NewOffset <= UINT32_MAX);
+    Out.emitIntValue(NewOffset, NewOffsetSize);
   }
 }
 
 void writeStringsAndOffsets(MCStreamer &Out, DWPStringPool &Strings,
                             MCSection *StrOffsetSection,
                             StringRef CurStrSection,
-                            StringRef CurStrOffsetSection, uint16_t Version) {
+                            StringRef CurStrOffsetSection, uint16_t Version,
+                            SectionLengths &SectionLength) {
   // Could possibly produce an error or warning if one of these was non-null but
   // the other was null.
   if (CurStrSection.empty() || CurStrOffsetSection.empty())
     return;
 
-  DenseMap<uint64_t, uint32_t> OffsetRemapping;
+  DenseMap<uint64_t, uint64_t> OffsetRemapping;
 
   DataExtractor Data(CurStrSection, true, 0);
   uint64_t LocalOffset = 0;
   uint64_t PrevOffset = 0;
+
+  // Keep track if any new string offsets exceed UINT32_MAX. If any do, we can
+  // emit a DWARF64 .debug_str_offsets table for this compile unit.
+  uint32_t OldOffsetSize = 4;
+  uint32_t NewOffsetSize = 4;
   while (const char *S = Data.getCStr(&LocalOffset)) {
-    OffsetRemapping[PrevOffset] =
-        Strings.getOffset(S, LocalOffset - PrevOffset);
+    uint64_t NewOffset = Strings.getOffset(S, LocalOffset - PrevOffset);
+    OffsetRemapping[PrevOffset] = NewOffset;
+    if (NewOffset > UINT32_MAX)
+      NewOffsetSize = 8;
     PrevOffset = LocalOffset;
   }
 
@@ -451,7 +461,7 @@ void writeStringsAndOffsets(MCStreamer &Out, DWPStringPool &Strings,
   uint64_t Size = CurStrOffsetSection.size();
   if (Version > 4) {
     while (Offset < Size) {
-      uint64_t HeaderSize = debugStrOffsetsHeaderSize(Data, Version);
+      const uint64_t HeaderSize = debugStrOffsetsHeaderSize(Data, Version);
       assert(HeaderSize <= Size - Offset &&
              "StrOffsetSection size is less than its header");
 
@@ -461,16 +471,52 @@ void writeStringsAndOffsets(MCStreamer &Out, DWPStringPool &Strings,
       if (HeaderSize == 8) {
         ContributionSize = Data.getU32(&HeaderLengthOffset);
       } else if (HeaderSize == 16) {
+        OldOffsetSize = 8;
         HeaderLengthOffset += 4; // skip the dwarf64 marker
         ContributionSize = Data.getU64(&HeaderLengthOffset);
       }
       ContributionEnd = ContributionSize + HeaderLengthOffset;
-      Out.emitBytes(Data.getBytes(&Offset, HeaderSize));
-      writeNewOffsetsTo(Out, Data, OffsetRemapping, Offset, ContributionEnd);
+
+      StringRef HeaderBytes = Data.getBytes(&Offset, HeaderSize);
+      if (OldOffsetSize == 4 && NewOffsetSize == 8) {
+        // We had a DWARF32 .debug_str_offsets header, but we need to emit
+        // some string offsets that require 64 bit offsets on the .debug_str
+        // section. Emit the .debug_str_offsets header in DWARF64 format so we
+        // can emit string offsets that exceed UINT32_MAX without truncating
+        // the string offset.
+
+        // 2 bytes for DWARF version, 2 bytes pad.
+        const uint64_t VersionPadSize = 4;
+        const uint64_t NewLength =
+            (ContributionSize - VersionPadSize) * 2 + VersionPadSize;
+        // Emit the DWARF64 length that starts with a 4 byte DW_LENGTH_DWARF64
+        // value followed by the 8 byte updated length.
+        Out.emitIntValue(llvm::dwarf::DW_LENGTH_DWARF64, 4);
+        Out.emitIntValue(NewLength, 8);
+        // Emit DWARF version as a 2 byte integer.
+        Out.emitIntValue(Version, 2);
+        // Emit 2 bytes of padding.
+        Out.emitIntValue(0, 2);
+        // Update the .debug_str_offsets section length contribution for this
+        // .dwo file.
+        for (auto &Pair : SectionLength) {
+          if (Pair.first == DW_SECT_STR_OFFSETS) {
+            Pair.second = NewLength + 12;
+            break;
+          }
+        }
+      } else {
+        // Just emit the same .debug_str_offsets header.
+        Out.emitBytes(HeaderBytes);
+      }
+      writeNewOffsetsTo(Out, Data, OffsetRemapping, Offset, ContributionEnd,
+                        OldOffsetSize, NewOffsetSize);
     }
 
   } else {
-    writeNewOffsetsTo(Out, Data, OffsetRemapping, Offset, Size);
+    assert(OldOffsetSize == NewOffsetSize);
+    writeNewOffsetsTo(Out, Data, OffsetRemapping, Offset, Size, OldOffsetSize,
+                      NewOffsetSize);
   }
 }
 
@@ -562,7 +608,7 @@ Error handleSection(
     std::vector<StringRef> &CurTypesSection,
     std::vector<StringRef> &CurInfoSection, StringRef &AbbrevSection,
     StringRef &CurCUIndexSection, StringRef &CurTUIndexSection,
-    std::vector<std::pair<DWARFSectionKind, uint32_t>> &SectionLength) {
+    SectionLengths &SectionLength) {
   if (Section.isBSS())
     return Error::success();
 
@@ -684,7 +730,7 @@ Error write(MCStreamer &Out, ArrayRef<std::string> Inputs,
     // This maps each section contained in this file to its length.
     // This information is later on used to calculate the contributions,
     // i.e. offset and length, of each compile/type unit to a section.
-    std::vector<std::pair<DWARFSectionKind, uint32_t>> SectionLength;
+    SectionLengths SectionLength;
 
     for (const auto &Section : Obj.sections())
       if (auto Err = handleSection(
@@ -713,7 +759,7 @@ Error write(MCStreamer &Out, ArrayRef<std::string> Inputs,
     }
 
     writeStringsAndOffsets(Out, Strings, StrOffsetSection, CurStrSection,
-                           CurStrOffsetSection, Header.Version);
+                           CurStrOffsetSection, Header.Version, SectionLength);
 
     for (auto Pair : SectionLength) {
       auto Index = getContributionIndex(Pair.first, IndexVersion);
diff --git a/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp b/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
index da0bf03e1ac57..b4256ae13914c 100644
--- a/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
+++ b/llvm/lib/DebugInfo/DWARF/DWARFUnit.cpp
@@ -1187,9 +1187,18 @@ DWARFUnit::determineStringOffsetsTableContributionDWO(DWARFDataExtractor &DA) {
   if (getVersion() >= 5) {
     if (DA.getData().data() == nullptr)
       return std::nullopt;
-    Offset += Header.getFormat() == dwarf::DwarfFormat::DWARF32 ? 8 : 16;
+    // For .dwo files, the section contribution for the .debug_str_offsets
+    // points to the string offsets table header. Decode the format from this
+    // data as llvm-dwp has been modified to be able to emit a
+    // .debug_str_offsets table as DWARF64 even if the compile unit is DWARF32.
+    // This allows .dwp files to have string tables that exceed UINT32_MAX in
+    // size.
+    uint64_t Length = 0;
+    DwarfFormat Format = dwarf::DwarfFormat::DWARF32;
+    std::tie(Length, Format) = DA.getInitialLength(&Offset);
+    Offset += 4; // Skip the DWARF version uint16_t and the uint16_t padding.
     // Look for a valid contribution at the given offset.
-    auto DescOrError = parseDWARFStringOffsetsTableHeader(DA, Header.getFormat(), Offset);
+    auto DescOrError = parseDWARFStringOffsetsTableHeader(DA, Format, Offset);
     if (!DescOrError)
       return DescOrError.takeError();
     return *DescOrError;

llvmbot · 2025-11-11T05:29:51Z

@llvm/pr-subscribers-lldb


dwblaikie · 2025-11-11T18:52:01Z

Yep, sounds about right to me as an implementation of the suggestion that I think @probinson first made when we discussed this upstream (& when I went down the Simplified Template Names direction). This only uses DWARF64 .debug_str_offsets for the specific contributions that need it, so it should (worth testing, etc.) have no impact on, and produce bit-identical output for, any DWP that's already correct.

The patches will need to be separated - ideally consumers are fixed before producers (but some producers don't have good ways of testing without the producer - in which case there might be a small window of breakage (well, breakage where an existing user would've got an error on dwp output, doesn't get an error but then some tool might not be able to consume it)).

Probably needs a flag for dwp (I haven't checked whether the patch proposes one) since not all consumers would be ready for this behavior (ie: gdb, presumably).

Implementation-wise, I think there are a few utilities we have for handling the DWARF32/64 length parsing and emission that could be used in places.
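To make the DWARF32/64 length handling concrete, here is a minimal standalone sketch of what a consumer has to do with a DWARF v5 .debug_str_offsets contribution: decode the initial length, pick the entry size from it rather than from the compile unit's format, and index entries with that size. It deliberately does not use the LLVM utilities (the diff above calls `DA.getInitialLength(&Offset)` for this); the names `readLE`, `TableInfo`, `parseStrOffsetsHeader`, and `stringOffsetAt` are hypothetical, and little-endian data is assumed.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Read a little-endian unsigned integer of Size bytes (at most 8) at Offset.
// Returns std::nullopt if the read would run past the end of the buffer.
std::optional<uint64_t> readLE(const std::vector<uint8_t> &Data,
                               uint64_t Offset, unsigned Size) {
  if (Offset > Data.size() || Data.size() - Offset < Size)
    return std::nullopt;
  uint64_t Value = 0;
  for (unsigned I = 0; I < Size; ++I)
    Value |= static_cast<uint64_t>(Data[Offset + I]) << (8 * I);
  return Value;
}

struct TableInfo {
  uint64_t EntriesBase; // Offset of the first string offset entry.
  unsigned EntrySize;   // 4 for DWARF32, 8 for DWARF64.
};

// Parse the table header at Offset and derive the entry size from the
// header itself, independent of the format of the owning compile unit.
std::optional<TableInfo> parseStrOffsetsHeader(const std::vector<uint8_t> &Data,
                                               uint64_t Offset) {
  std::optional<uint64_t> Length = readLE(Data, Offset, 4);
  if (!Length)
    return std::nullopt;
  Offset += 4;
  unsigned EntrySize = 4;
  if (*Length == 0xffffffffu) { // DWARF64 escape value.
    if (!readLE(Data, Offset, 8))
      return std::nullopt;
    Offset += 8;
    EntrySize = 8;
  }
  std::optional<uint64_t> Version = readLE(Data, Offset, 2);
  if (!Version || *Version < 5)
    return std::nullopt;
  Offset += 4; // Skip the 2-byte version and the 2-byte padding.
  return TableInfo{Offset, EntrySize};
}

// Fetch the Index'th string offset using the table's own entry size.
std::optional<uint64_t> stringOffsetAt(const std::vector<uint8_t> &Data,
                                       const TableInfo &Table, uint64_t Index) {
  return readLE(Data, Table.EntriesBase + Index * Table.EntrySize,
                Table.EntrySize);
}
```

The point of the sketch is the design choice the consumer patches make: the entry size travels with the table header, so a DWARF32 compile unit can still reference a DWARF64 string offsets table.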

… DWARF units in .dwp files. This patch is updating the reading capabilities of the LLVM DWARF parser for a llvm-dwp patch llvm#167457 that will emit .dwp files where the compile units are DWARF32 and the .debug_str_offsets tables will be emitted as DWARF64 to allow .debug_str sections that exceed 4GB in size.

clayborg · 2025-11-14T00:39:25Z

> The patches will need to be separated - ideally consumers are fixed before producers (but some producers don't have good ways of testing without the producer - in which case there might be a small window of breakage (well, breakage where an existing user would've got an error on dwp output, doesn't get an error but then some tool might not be able to consume it)).

I separated the first LLVM parsing stuff with a test here:
#167986

…DWARF32 DWARF units in .dwp files in LLDB. This patch is updating the reading capabilities of the LLDB DWARF parser for a llvm-dwp patch llvm#167457 that will emit .dwp files where the compile units are DWARF32 and the .debug_str_offsets tables will be emitted as DWARF64 to allow .debug_str sections that exceed 4GB in size.

clayborg · 2025-11-14T01:54:37Z

Here is the separate patch for the LLDB DWARF parser with a test:

#167997


clayborg · 2025-11-15T00:42:52Z

I merged with upstream to get the changes from the two PRs that add DWARF64 .debug_str_offsets support for .dwo files in LLDB and LLVM.

clayborg · 2025-11-15T00:45:25Z

@dwblaikie do we need a flag in llvm-dwp to enable this? I would almost rather have GDB show an error when retrieving a string whose .debug_str_offsets table has been promoted to a DWARF64 table than show a truncated and invalid string because a 64 bit string offset was truncated to 32 bits. Let me know your thoughts.
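A small sketch of why a truncated offset is worse than an error: the narrowed value is usually still a valid position in .debug_str, so the lookup succeeds and returns an unrelated string. The helper names here are hypothetical, and the tiny section stands in for a real one that would be over 4GB.

```cpp
#include <cstdint>
#include <string>

// Narrowing keeps only the low 32 bits of the offset.
uint32_t truncateOffset(uint64_t offset) {
  return static_cast<uint32_t>(offset);
}

// Looks up the NUL-terminated string at `offset` in a .debug_str-like
// buffer. Returns false when the offset is outside the section.
bool lookupString(const std::string &strSection, uint64_t offset,
                  std::string &out) {
  if (offset >= strSection.size())
    return false;
  out = strSection.c_str() + offset; // copies up to the next NUL
  return true;
}
```

For example, `truncateOffset(0x100000006)` is `6`. In a section holding `"alpha\0beta\0gamma\0"`, looking up the full 64-bit offset fails cleanly, while looking up the truncated offset silently returns `"beta"`.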

…tables for DWARF32 DWARF units in .dwp files in LLDB. (#167997) This patch is updating the reading capabilities of the LLDB DWARF parser for a llvm-dwp patch llvm/llvm-project#167457 that will emit .dwp files where the compile units are DWARF32 and the .debug_str_offsets tables will be emitted as DWARF64 to allow .debug_str sections that exceed 4GB in size.

grigorypas · 2025-11-15T01:27:51Z

> @dwblaikie do we need a flag in llvm-dwp to enable this? I would almost rather have GDB show an error when retrieving a string whose .debug_str_offsets table has been promoted to a DWARF64 table than show a truncated and invalid string because a 64 bit string offset was truncated to 32 bits. Let me know your thoughts.

I think adding a feature flag would be valuable here. It would allow users to gradually roll out and test this change in production with easy rollback if needed. Additionally, other tools beyond debuggers might assume everything is encoded with DWARF32, and a flag gives users the flexibility to monitor for unexpected breakages and maintain backwards compatibility during the transition.

clayborg · 2025-11-15T04:42:39Z

I was thinking I could also add an option, possibly keeping it hidden by default, so that in testing I can force promotion from DWARF32 to DWARF64 so we can test that it works.

Looking at the way this was coded, it seems llvm-dwp would not have noticed the .debug_str section going over the 4GB limit while writing the strings, and would have truncated the new offset. There is some code that uses an OverflowOptValue variable to try to catch such issues or ignore them, but everything I see uses 32-bit offsets, so I am not sure how it would catch this.
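One way to catch the overflow is to keep the pool's running offset in 64 bits and record when a newly added string starts past UINT32_MAX. The sketch below is only an illustration of that idea: `StringPoolSketch` is a hypothetical class, it tracks offsets without storing the string bytes, and its `startOffset` parameter exists purely to simulate an already large section. It is not the DWPStringPool implementation.

```cpp
#include <cstdint>
#include <limits>
#include <string>
#include <unordered_map>

// Hypothetical deduplicating string pool that tracks 64-bit offsets.
class StringPoolSketch {
public:
  // `startOffset` simulates a section that already holds that many bytes.
  explicit StringPoolSketch(uint64_t startOffset = 0)
      : nextOffset(startOffset), sawWideOffset(false) {}

  // Returns the offset of `s`, assigning a new one on first use.
  uint64_t getOffset(const std::string &s) {
    auto it = offsets.find(s);
    if (it != offsets.end())
      return it->second; // already pooled, reuse the earlier offset
    const uint64_t offset = nextOffset;
    offsets.emplace(s, offset);
    nextOffset += s.size() + 1; // string bytes plus the NUL terminator
    if (offset > std::numeric_limits<uint32_t>::max())
      sawWideOffset = true;
    return offset;
  }

  // True once any assigned offset no longer fits in 32 bits.
  bool sawOffsetAbove32Bits() const { return sawWideOffset; }

  // Total size the string section would have.
  uint64_t size() const { return nextOffset; }

private:
  std::unordered_map<std::string, uint64_t> offsets;
  uint64_t nextOffset;
  bool sawWideOffset;
};
```

Because lookups of already pooled strings return their original low offsets, the decision to widen a particular table is better made from the offsets that table actually uses than from the pool-wide flag alone.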

I will add an option so I can test without creating a huge .dwp file, and I will report back on how LLDB and GDB deal with such input. If they fail silently, I will also add an option to enable this.

clayborg requested review from WenleiHe, dwblaikie and jeffreytan81 November 11, 2025 05:29

clayborg requested a review from JDevlieghere as a code owner November 11, 2025 05:29

llvmbot added lldb debuginfo labels Nov 11, 2025

WenleiHe requested a review from grigorypas November 13, 2025 07:01

clayborg mentioned this pull request Nov 14, 2025

Add the ability to load DWARF64 .debug_str_offsets tables for DWARF32 DWARF units in .dwp files. #167986

Merged

clayborg mentioned this pull request Nov 14, 2025

[lldb] Add the ability to load DWARF64 .debug_str_offsets tables for DWARF32 DWARF units in .dwp files in LLDB. #167997

Merged

clayborg added 2 commits November 14, 2025 16:38

Merge with upstream and remove extra code that isn't needed.

98b0ee5

clayborg force-pushed the llvm-dwp-str-offsets-64 branch from a243106 to 98b0ee5 Compare November 15, 2025 00:41
