Skip to content

Conversation

@itrofimow
Copy link
Contributor

After we sort the CUVector, we have to update CU-indices in address map and constant pool

@llvmbot
Copy link
Member

llvmbot commented Aug 4, 2025

@llvm/pr-subscribers-bolt

Author: None (itrofimow)

Changes

After we sort the CUVector, we have to update CU-indices in address map and constant pool


Full diff: https://github.com/llvm/llvm-project/pull/151927.diff

1 Files Affected:

  • (modified) bolt/lib/Core/GDBIndex.cpp (+73-2)
diff --git a/bolt/lib/Core/GDBIndex.cpp b/bolt/lib/Core/GDBIndex.cpp
index c7fb4889646b4..0fe1c5de94138 100644
--- a/bolt/lib/Core/GDBIndex.cpp
+++ b/bolt/lib/Core/GDBIndex.cpp
@@ -130,6 +130,26 @@ void GDBIndex::updateGdbIndexSection(
             [](const MapEntry &E1, const MapEntry &E2) -> bool {
               return E1.second.Offset < E2.second.Offset;
             });
+  // Create the original CU index -> updated CU index mapping,
+  // as the sort above could've changed the order and we have to update
+  // indexes correspondingly in address map and constant pool.
+  std::unordered_map<uint32_t, uint32_t> OriginalCUIndexToUpdatedCUIndexMap;
+  OriginalCUIndexToUpdatedCUIndexMap.reserve(CUVector.size());
+  for (uint32_t I = 0; I < CUVector.size(); ++I) {
+    OriginalCUIndexToUpdatedCUIndexMap[OffsetToIndexMap.at(CUVector[I].first)] =
+        I;
+  }
+  const auto RemapCUIndex =
+      [&OriginalCUIndexToUpdatedCUIndexMap](uint32_t OriginalIndex) {
+        const auto it = OriginalCUIndexToUpdatedCUIndexMap.find(OriginalIndex);
+        if (it == OriginalCUIndexToUpdatedCUIndexMap.end()) {
+          errs() << "BOLT-ERROR: .gdb_index unknown CU index\n";
+          exit(1);
+        }
+
+        return it->second;
+      };
+
   // Writing out CU List <Offset, Size>
   for (auto &CUInfo : CUVector) {
     // Skipping TU for DWARF5 when they are not included in CU list.
@@ -160,12 +180,13 @@ void GDBIndex::updateGdbIndexSection(
   // Generate new address table.
   for (const std::pair<const uint64_t, DebugAddressRangesVector> &CURangesPair :
        ARangesSectionWriter.getCUAddressRanges()) {
-    const uint32_t CUIndex = OffsetToIndexMap[CURangesPair.first];
+    const uint32_t OriginalCUIndex = OffsetToIndexMap[CURangesPair.first];
+    const uint32_t UpdatedCUIndex = RemapCUIndex(OriginalCUIndex);
     const DebugAddressRangesVector &Ranges = CURangesPair.second;
     for (const DebugAddressRange &Range : Ranges) {
       write64le(Buffer, Range.LowPC);
       write64le(Buffer + 8, Range.HighPC);
-      write32le(Buffer + 16, CUIndex);
+      write32le(Buffer + 16, UpdatedCUIndex);
       Buffer += 20;
     }
   }
@@ -178,6 +199,56 @@ void GDBIndex::updateGdbIndexSection(
   // Copy over the rest of the original data.
   memcpy(Buffer, Data, TrailingSize);
 
+  // Fixup CU-indicies in constant pool.
+  const char *const OriginalConstantPoolData =
+      GdbIndexContents.data() + ConstantPoolOffset;
+  uint8_t *const UpdatedConstantPoolData =
+      NewGdbIndexContents + ConstantPoolOffset + Delta;
+
+  const char *OriginalSymbolTableData =
+      GdbIndexContents.data() + SymbolTableOffset;
+  std::set<uint32_t> CUVectorOffsets;
+  // Parse the symbol map and extract constant pool CU offsets from it.
+  while (OriginalSymbolTableData < OriginalConstantPoolData) {
+    const uint32_t NameOffset = read32le(OriginalSymbolTableData);
+    const uint32_t CUVectorOffset = read32le(OriginalSymbolTableData + 4);
+    OriginalSymbolTableData += 8;
+
+    // Iff both are zero, then the slot is considered empty in the hash-map.
+    if (NameOffset || CUVectorOffset) {
+      CUVectorOffsets.insert(CUVectorOffset);
+    }
+  }
+
+  // Update the CU-indicies in the constant pool
+  for (const auto CUVectorOffset : CUVectorOffsets) {
+    const char *CurrentOriginalConstantPoolData =
+        OriginalConstantPoolData + CUVectorOffset;
+    uint8_t *CurrentUpdatedConstantPoolData =
+        UpdatedConstantPoolData + CUVectorOffset;
+
+    const uint32_t Num = read32le(CurrentOriginalConstantPoolData);
+    CurrentOriginalConstantPoolData += 4;
+    CurrentUpdatedConstantPoolData += 4;
+
+    for (uint32_t J = 0; J < Num; ++J) {
+      const uint32_t OriginalCUIndexAndAttributes =
+          read32le(CurrentOriginalConstantPoolData);
+      CurrentOriginalConstantPoolData += 4;
+
+      // We only care for the index, which is the lowest 24 bits, other bits are
+      // left as is.
+      const uint32_t OriginalCUIndex =
+          OriginalCUIndexAndAttributes & ((1 << 24) - 1);
+      const uint32_t Attributes = OriginalCUIndexAndAttributes >> 24;
+      const uint32_t UpdatedCUIndexAndAttributes =
+          RemapCUIndex(OriginalCUIndex) | (Attributes << 24);
+
+      write32le(CurrentUpdatedConstantPoolData, UpdatedCUIndexAndAttributes);
+      CurrentUpdatedConstantPoolData += 4;
+    }
+  }
+
   // Register the new section.
   BC.registerOrUpdateNoteSection(".gdb_index", NewGdbIndexContents,
                                  NewGdbIndexSize);

@itrofimow itrofimow changed the title [BOLT] Fix possibly incorrect CU-indicies in gdb-index [BOLT] Fix incorrect CU-indicies in gdb-index Aug 5, 2025
@itrofimow
Copy link
Contributor Author

Hi @aaupov ! Could you please have a look at this PR?

@itrofimow
Copy link
Contributor Author

Hi @ayermolo !
I see you are listed as a maintainer of DWARF support in BOLT, and although this is not exactly DWARF-related, you are also
author of this change

std::vector<MapEntry> CUVector(CUMap.begin(), CUMap.end());
// Need to sort since we write out all of TUs in .debug_info before CUs.
std::sort(CUVector.begin(), CUVector.end(),
[](const MapEntry &E1, const MapEntry &E2) -> bool {
return E1.second.Offset < E2.second.Offset;
});

(coming from this commit dcfa2ab), which I think introduced the problem, so could you please have a look at this PR?

There's no rush for me to get this landed, but I intend to backport this change into BOLT we use, so I'd appreciate a review

@ayermolo
Copy link
Contributor

Hi, sorry for late reply.
Let me take a look.

@itrofimow
Copy link
Contributor Author

Hi! No worries, thanks

The problem I'm talking about clearly manifests itself in this test, for instance:

# POSTCHECK: CU list offset = 0x18, has 2 entries
# POSTCHECK-NEXT: 0: Offset = 0x80, Length = 0x55
# POSTCHECK-NEXT: 1: Offset = 0xd5, Length = 0x56
# POSTCHECK: Types CU list offset = 0x38, has 0 entries
# POSTCHECK: Address area offset = 0x38, has 2 entries
# POSTCHECK-NEXT: Low/High address = [0x[[#%.4x,ADDR:]],
# POSTCHECK-SAME: 0x[[#ADDR + 0xf]]) (Size: 0xf), CU id = 1
# POSTCHECK-NEXT: Low/High address = [0x[[#%.4x,ADDR1:]],
# POSTCHECK-SAME: 0x[[#ADDR1 + 0xd]]) (Size: 0xd), CU id = 3

where CU id = 3 in line 20 is just out of range

Copy link
Contributor

@ayermolo ayermolo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Sorry it took so long.

@itrofimow
Copy link
Contributor Author

Hi! Thanks
Could you @ayermolo please merge this PR for me? - I don't have write access.

Also, while you are at it, would you be able to give this tiny PR #151923 a look? - It's related fix about BOLT sometimes emitting [0; 0) ranges in address map

@itrofimow
Copy link
Contributor Author

Kindly pinging you @ayermolo about this PR

@itrofimow
Copy link
Contributor Author

Hi @aaupov ! Maybe you could merge this PR for me and give this one #151923 a quick look?

Copy link
Contributor

@maksfb maksfb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@maksfb maksfb merged commit 1685a6a into llvm:main Sep 25, 2025
9 checks passed
mahesh-attarde pushed a commit to mahesh-attarde/llvm-project that referenced this pull request Oct 3, 2025
After we sort the CUVector, we have to update CU-indices in address map
and constant pool
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants