@JDevlieghere JDevlieghere commented Nov 20, 2025

Introduce VirtualDataExtractor, a DataExtractor subclass that enables reading data at virtual addresses by translating them to physical buffer offsets using a lookup table. The lookup table maps virtual address ranges to physical offsets and enforces boundaries to prevent reads from crossing entry limits.

The new class inherits from DataExtractor, overriding GetData and PeekData to provide transparent virtual address translation for most of the DataExtractor methods. The exceptions are the unchecked methods, which bypass GetData and PeekData and are therefore reimplemented in the subclass as well.
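
To make the translation step concrete, here is a minimal standalone sketch of the lookup described above. It is not the LLDB implementation: `Entry` and `Translate` are hypothetical names, the table is a plain sorted `std::vector` rather than a RangeDataVector, and a read that would cross an entry boundary is reported by returning `std::nullopt`, whereas the patch enforces that case with an assertion.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical stand-in for one lookup table entry (not the LLDB type).
struct Entry {
  std::uint64_t base; // first virtual address covered by this entry
  std::uint64_t size; // number of bytes covered by this entry
  std::uint64_t data; // physical offset of `base` in the backing buffer
};

// Translate the virtual read [virtual_addr, virtual_addr + length) into a
// physical offset. `table` must be sorted by base and non-overlapping.
// Returns std::nullopt if no entry contains virtual_addr or if the read
// would run past the end of the containing entry.
std::optional<std::uint64_t> Translate(const std::vector<Entry> &table,
                                       std::uint64_t virtual_addr,
                                       std::uint64_t length) {
  assert(std::is_sorted(
      table.begin(), table.end(),
      [](const Entry &a, const Entry &b) { return a.base < b.base; }));

  // Binary search for the first entry whose base is greater than
  // virtual_addr; the only candidate is the entry just before it.
  auto it = std::upper_bound(
      table.begin(), table.end(), virtual_addr,
      [](std::uint64_t addr, const Entry &e) { return addr < e.base; });
  if (it == table.begin())
    return std::nullopt;
  --it;

  const std::uint64_t delta = virtual_addr - it->base;
  if (delta >= it->size || length > it->size - delta)
    return std::nullopt;
  return it->data + delta;
}
```

With a table of `{0x1000, 4, 0}` and `{0x2000, 4, 4}`, a 2-byte read at virtual address 0x2002 resolves to physical offset 6, while a 4-byte read at 0x1002 is rejected because it would cross the end of the first entry.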

github-actions bot commented Nov 20, 2025

🐧 Linux x64 Test Results

  • 33196 tests passed
  • 495 tests skipped

@JDevlieghere JDevlieghere force-pushed the virtual-data-extractor branch from a225802 to 9512330 Compare November 20, 2025 19:27
@JDevlieghere JDevlieghere changed the title from "[lldb] Add VirtualDataExtractor abstraction" to "[lldb] Add VirtualDataExtractor for virtual address translation" Nov 20, 2025
@JDevlieghere JDevlieghere marked this pull request as ready for review November 20, 2025 19:28
@llvmbot llvmbot added the lldb label Nov 20, 2025
llvmbot commented Nov 20, 2025

@llvm/pr-subscribers-lldb

Author: Jonas Devlieghere (JDevlieghere)

Patch is 39.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/168802.diff

6 Files Affected:

  • (modified) lldb/include/lldb/Utility/DataExtractor.h (+4-2)
  • (added) lldb/include/lldb/Utility/VirtualDataExtractor.h (+82)
  • (modified) lldb/source/Utility/CMakeLists.txt (+1)
  • (added) lldb/source/Utility/VirtualDataExtractor.cpp (+164)
  • (modified) lldb/unittests/Utility/CMakeLists.txt (+1)
  • (added) lldb/unittests/Utility/VirtualDataExtractorTest.cpp (+708)
diff --git a/lldb/include/lldb/Utility/DataExtractor.h b/lldb/include/lldb/Utility/DataExtractor.h
index b4960f5e87c85..fe217795ff3b1 100644
--- a/lldb/include/lldb/Utility/DataExtractor.h
+++ b/lldb/include/lldb/Utility/DataExtractor.h
@@ -334,7 +334,8 @@ class DataExtractor {
   /// \return
   ///     A pointer to the bytes in this object's data if the offset
   ///     and length are valid, or nullptr otherwise.
-  const void *GetData(lldb::offset_t *offset_ptr, lldb::offset_t length) const {
+  virtual const void *GetData(lldb::offset_t *offset_ptr,
+                              lldb::offset_t length) const {
     const uint8_t *ptr = PeekData(*offset_ptr, length);
     if (ptr)
       *offset_ptr += length;
@@ -829,7 +830,8 @@ class DataExtractor {
   ///     A non-nullptr data pointer if \a offset is a valid offset and
   ///     there are \a length bytes available at that offset, nullptr
   ///     otherwise.
-  const uint8_t *PeekData(lldb::offset_t offset, lldb::offset_t length) const {
+  virtual const uint8_t *PeekData(lldb::offset_t offset,
+                                  lldb::offset_t length) const {
     if (ValidOffsetForDataOfSize(offset, length))
       return m_start + offset;
     return nullptr;
diff --git a/lldb/include/lldb/Utility/VirtualDataExtractor.h b/lldb/include/lldb/Utility/VirtualDataExtractor.h
new file mode 100644
index 0000000000000..a57d83dde21be
--- /dev/null
+++ b/lldb/include/lldb/Utility/VirtualDataExtractor.h
@@ -0,0 +1,82 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLDB_UTILITY_VIRTUALDATAEXTRACTOR_H
+#define LLDB_UTILITY_VIRTUALDATAEXTRACTOR_H
+
+#include "lldb/Utility/DataExtractor.h"
+#include "lldb/Utility/RangeMap.h"
+#include "lldb/lldb-types.h"
+
+namespace lldb_private {
+
+/// A DataExtractor subclass that allows reading data at virtual addresses
+/// using a lookup table that maps virtual address ranges to physical offsets.
+///
+/// This class maintains a lookup table where each entry contains:
+/// - base: starting virtual address for this entry
+/// - size: size of this entry in bytes
+/// - data: physical offset in the underlying data buffer
+///
+/// Reads are translated from virtual addresses to physical offsets using
+/// this lookup table. Reads cannot cross entry boundaries and this is
+/// enforced with assertions.
+class VirtualDataExtractor : public DataExtractor {
+public:
+  /// Type alias for the range map used internally.
+  /// Maps virtual addresses (base) to physical offsets (data).
+  using LookupTable =
+      RangeDataVector<lldb::offset_t, lldb::offset_t, lldb::offset_t>;
+
+  VirtualDataExtractor() = default;
+
+  VirtualDataExtractor(const void *data, lldb::offset_t data_length,
+                       lldb::ByteOrder byte_order, uint32_t addr_size,
+                       LookupTable lookup_table);
+
+  VirtualDataExtractor(const lldb::DataBufferSP &data_sp,
+                       lldb::ByteOrder byte_order, uint32_t addr_size,
+                       LookupTable lookup_table);
+
+  const void *GetData(lldb::offset_t *offset_ptr,
+                      lldb::offset_t length) const override;
+
+  const uint8_t *PeekData(lldb::offset_t offset,
+                          lldb::offset_t length) const override;
+
+  uint8_t GetU8_unchecked(lldb::offset_t *offset_ptr) const;
+
+  uint16_t GetU16_unchecked(lldb::offset_t *offset_ptr) const;
+
+  uint32_t GetU32_unchecked(lldb::offset_t *offset_ptr) const;
+
+  uint64_t GetU64_unchecked(lldb::offset_t *offset_ptr) const;
+
+  uint64_t GetMaxU64_unchecked(lldb::offset_t *offset_ptr,
+                               size_t byte_size) const;
+
+  uint64_t GetAddress_unchecked(lldb::offset_t *offset_ptr) const;
+
+  const LookupTable &GetLookupTable() const { return m_lookup_table; }
+
+protected:
+  /// Find the lookup entry that contains the given virtual address.
+  const LookupTable::Entry *FindEntry(lldb::offset_t virtual_addr) const;
+
+  /// Validate that a read at a virtual address is within bounds and
+  /// does not cross entry boundaries.
+  bool ValidateVirtualRead(lldb::offset_t virtual_addr,
+                           lldb::offset_t length) const;
+
+private:
+  LookupTable m_lookup_table;
+};
+
+} // namespace lldb_private
+
+#endif // LLDB_UTILITY_VIRTUALDATAEXTRACTOR_H
diff --git a/lldb/source/Utility/CMakeLists.txt b/lldb/source/Utility/CMakeLists.txt
index 1dd4d63f7016f..4696ed4690d37 100644
--- a/lldb/source/Utility/CMakeLists.txt
+++ b/lldb/source/Utility/CMakeLists.txt
@@ -78,6 +78,7 @@ add_lldb_library(lldbUtility NO_INTERNAL_DEPENDENCIES
   UserIDResolver.cpp
   VASprintf.cpp
   VMRange.cpp
+  VirtualDataExtractor.cpp
   XcodeSDK.cpp
   ZipFile.cpp
 
diff --git a/lldb/source/Utility/VirtualDataExtractor.cpp b/lldb/source/Utility/VirtualDataExtractor.cpp
new file mode 100644
index 0000000000000..537ba3930a91a
--- /dev/null
+++ b/lldb/source/Utility/VirtualDataExtractor.cpp
@@ -0,0 +1,164 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "lldb/Utility/VirtualDataExtractor.h"
+#include <cassert>
+
+using namespace lldb;
+using namespace lldb_private;
+
+VirtualDataExtractor::VirtualDataExtractor(const void *data,
+                                           offset_t data_length,
+                                           ByteOrder byte_order,
+                                           uint32_t addr_size,
+                                           LookupTable lookup_table)
+    : DataExtractor(data, data_length, byte_order, addr_size),
+      m_lookup_table(std::move(lookup_table)) {
+  m_lookup_table.Sort();
+}
+
+VirtualDataExtractor::VirtualDataExtractor(const DataBufferSP &data_sp,
+                                           ByteOrder byte_order,
+                                           uint32_t addr_size,
+                                           LookupTable lookup_table)
+    : DataExtractor(data_sp, byte_order, addr_size),
+      m_lookup_table(std::move(lookup_table)) {
+  m_lookup_table.Sort();
+}
+
+const VirtualDataExtractor::LookupTable::Entry *
+VirtualDataExtractor::FindEntry(offset_t virtual_addr) const {
+  // Use RangeDataVector's binary search instead of linear search.
+  return m_lookup_table.FindEntryThatContains(virtual_addr);
+}
+
+bool VirtualDataExtractor::ValidateVirtualRead(offset_t virtual_addr,
+                                               offset_t length) const {
+  const LookupTable::Entry *entry = FindEntry(virtual_addr);
+  if (!entry)
+    return false;
+
+  // Assert that the read does not cross entry boundaries.
+  // RangeData.Contains() checks if a range is fully contained.
+  assert(entry->Contains(LookupTable::Range(virtual_addr, length)) &&
+         "Read crosses lookup table entry boundary");
+
+  // Also validate that the physical offset is within the data buffer.
+  // RangeData.data contains the physical offset.
+  offset_t physical_offset = entry->data + (virtual_addr - entry->base);
+  return ValidOffsetForDataOfSize(physical_offset, length);
+}
+
+const void *VirtualDataExtractor::GetData(offset_t *offset_ptr,
+                                          offset_t length) const {
+  // Override to treat offset as virtual address.
+  if (!offset_ptr)
+    return nullptr;
+
+  offset_t virtual_addr = *offset_ptr;
+
+  if (!ValidateVirtualRead(virtual_addr, length))
+    return nullptr;
+
+  const LookupTable::Entry *entry = FindEntry(virtual_addr);
+  assert(entry && "ValidateVirtualRead should have found an entry");
+
+  offset_t physical_offset = entry->data + (virtual_addr - entry->base);
+  // Use base class PeekData directly to avoid recursion.
+  const void *result = DataExtractor::PeekData(physical_offset, length);
+
+  if (result) {
+    // Advance the virtual offset pointer.
+    *offset_ptr += length;
+  }
+
+  return result;
+}
+
+const uint8_t *VirtualDataExtractor::PeekData(offset_t offset,
+                                              offset_t length) const {
+  // Override to treat offset as virtual address.
+  if (!ValidateVirtualRead(offset, length))
+    return nullptr;
+
+  const LookupTable::Entry *entry = FindEntry(offset);
+  assert(entry && "ValidateVirtualRead should have found an entry");
+
+  offset_t physical_offset = entry->data + (offset - entry->base);
+  // Use the base class PeekData with the physical offset.
+  return DataExtractor::PeekData(physical_offset, length);
+}
+
+uint8_t VirtualDataExtractor::GetU8_unchecked(offset_t *offset_ptr) const {
+  offset_t virtual_addr = *offset_ptr;
+  const LookupTable::Entry *entry = FindEntry(virtual_addr);
+  assert(entry && "Unchecked methods require valid virtual address");
+
+  offset_t physical_offset = entry->data + (virtual_addr - entry->base);
+  uint8_t result = DataExtractor::GetU8_unchecked(&physical_offset);
+  *offset_ptr += 1;
+  return result;
+}
+
+uint16_t VirtualDataExtractor::GetU16_unchecked(offset_t *offset_ptr) const {
+  offset_t virtual_addr = *offset_ptr;
+  const LookupTable::Entry *entry = FindEntry(virtual_addr);
+  assert(entry && "Unchecked methods require valid virtual address");
+
+  offset_t physical_offset = entry->data + (virtual_addr - entry->base);
+  uint16_t result = DataExtractor::GetU16_unchecked(&physical_offset);
+  *offset_ptr += 2;
+  return result;
+}
+
+uint32_t VirtualDataExtractor::GetU32_unchecked(offset_t *offset_ptr) const {
+  offset_t virtual_addr = *offset_ptr;
+  const LookupTable::Entry *entry = FindEntry(virtual_addr);
+  assert(entry && "Unchecked methods require valid virtual address");
+
+  offset_t physical_offset = entry->data + (virtual_addr - entry->base);
+  uint32_t result = DataExtractor::GetU32_unchecked(&physical_offset);
+  *offset_ptr += 4;
+  return result;
+}
+
+uint64_t VirtualDataExtractor::GetU64_unchecked(offset_t *offset_ptr) const {
+  offset_t virtual_addr = *offset_ptr;
+  const LookupTable::Entry *entry = FindEntry(virtual_addr);
+  assert(entry && "Unchecked methods require valid virtual address");
+
+  offset_t physical_offset = entry->data + (virtual_addr - entry->base);
+  uint64_t result = DataExtractor::GetU64_unchecked(&physical_offset);
+  *offset_ptr += 8;
+  return result;
+}
+
+uint64_t VirtualDataExtractor::GetMaxU64_unchecked(offset_t *offset_ptr,
+                                                   size_t byte_size) const {
+  offset_t virtual_addr = *offset_ptr;
+  const LookupTable::Entry *entry = FindEntry(virtual_addr);
+  assert(entry && "Unchecked methods require valid virtual address");
+
+  offset_t physical_offset = entry->data + (virtual_addr - entry->base);
+  uint64_t result =
+      DataExtractor::GetMaxU64_unchecked(&physical_offset, byte_size);
+  *offset_ptr += byte_size;
+  return result;
+}
+
+uint64_t
+VirtualDataExtractor::GetAddress_unchecked(offset_t *offset_ptr) const {
+  offset_t virtual_addr = *offset_ptr;
+  const LookupTable::Entry *entry = FindEntry(virtual_addr);
+  assert(entry && "Unchecked methods require valid virtual address");
+
+  offset_t physical_offset = entry->data + (virtual_addr - entry->base);
+  uint64_t result = DataExtractor::GetAddress_unchecked(&physical_offset);
+  *offset_ptr += m_addr_size;
+  return result;
+}
diff --git a/lldb/unittests/Utility/CMakeLists.txt b/lldb/unittests/Utility/CMakeLists.txt
index aed4177f5edee..77b52079cf32b 100644
--- a/lldb/unittests/Utility/CMakeLists.txt
+++ b/lldb/unittests/Utility/CMakeLists.txt
@@ -48,6 +48,7 @@ add_lldb_unittest(UtilityTests
   UserIDResolverTest.cpp
   UUIDTest.cpp
   VASprintfTest.cpp
+  VirtualDataExtractorTest.cpp
   VMRangeTest.cpp
   XcodeSDKTest.cpp
 
diff --git a/lldb/unittests/Utility/VirtualDataExtractorTest.cpp b/lldb/unittests/Utility/VirtualDataExtractorTest.cpp
new file mode 100644
index 0000000000000..cb9edbc8950d9
--- /dev/null
+++ b/lldb/unittests/Utility/VirtualDataExtractorTest.cpp
@@ -0,0 +1,708 @@
+//===----------------------------------------------------------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "lldb/Utility/VirtualDataExtractor.h"
+#include "lldb/Utility/DataBufferHeap.h"
+#include "gtest/gtest.h"
+
+using namespace lldb_private;
+using namespace lldb;
+
+TEST(VirtualDataExtractorTest, BasicConstruction) {
+  // Create a simple data buffer.
+  uint8_t buffer[] = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};
+
+  // Create a lookup table that maps virtual addresses to physical offsets.
+  VirtualDataExtractor::LookupTable lookup_table;
+  // Virtual address 0x1000-0x1008 maps to physical offset 0-8.
+  // Entry(base=virtual_offset, size, data=physical_offset).
+  lookup_table.Append(VirtualDataExtractor::LookupTable::Entry(0x1000, 8, 0));
+
+  VirtualDataExtractor extractor(buffer, sizeof(buffer), eByteOrderLittle, 4,
+                                 std::move(lookup_table));
+
+  EXPECT_EQ(extractor.GetByteSize(), 8U);
+}
+
+TEST(VirtualDataExtractorTest, GetDataAtVirtualOffset) {
+  uint8_t buffer[] = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08};
+
+  VirtualDataExtractor::LookupTable lookup_table;
+  lookup_table.Append(VirtualDataExtractor::LookupTable::Entry(0x1000, 8, 0));
+
+  VirtualDataExtractor extractor(buffer, sizeof(buffer), eByteOrderLittle, 4,
+                                 std::move(lookup_table));
+
+  offset_t virtual_offset = 0x1000;
+  const void *data = extractor.GetData(&virtual_offset, 4);
+
+  ASSERT_NE(data, nullptr);
+  EXPECT_EQ(virtual_offset, 0x1004U);
+  EXPECT_EQ(memcmp(data, buffer, 4), 0);
+}
+
+TEST(VirtualDataExtractorTest, GetDataAtVirtualOffsetInvalid) {
+  uint8_t buffer[] = {0x01, 0x02, 0x03, 0x04};
+
+  VirtualDataExtractor::LookupTable lookup_table;
+  lookup_table.Append(VirtualDataExtractor::LookupTable::Entry(0x1000, 4, 0));
+
+  VirtualDataExtractor extractor(buffer, sizeof(buffer), eByteOrderLittle, 4,
+                                 std::move(lookup_table));
+
+  // Try to read from an invalid virtual address.
+  offset_t virtual_offset = 0x2000;
+  const void *data = extractor.GetData(&virtual_offset, 4);
+
+  EXPECT_EQ(data, nullptr);
+}
+
+TEST(VirtualDataExtractorTest, GetU8AtVirtualOffset) {
+  uint8_t buffer[] = {0x12, 0x34, 0x56, 0x78};
+
+  VirtualDataExtractor::LookupTable lookup_table;
+  lookup_table.Append(VirtualDataExtractor::LookupTable::Entry(0x1000, 4, 0));
+
+  VirtualDataExtractor extractor(buffer, sizeof(buffer), eByteOrderLittle, 4,
+                                 std::move(lookup_table));
+
+  offset_t virtual_offset = 0x1000;
+  EXPECT_EQ(extractor.GetU8(&virtual_offset), 0x12U);
+  EXPECT_EQ(virtual_offset, 0x1001U);
+
+  EXPECT_EQ(extractor.GetU8(&virtual_offset), 0x34U);
+  EXPECT_EQ(virtual_offset, 0x1002U);
+}
+
+TEST(VirtualDataExtractorTest, GetU16AtVirtualOffset) {
+  uint8_t buffer[] = {0x12, 0x34, 0x56, 0x78};
+
+  VirtualDataExtractor::LookupTable lookup_table;
+  lookup_table.Append(VirtualDataExtractor::LookupTable::Entry(0x1000, 4, 0));
+
+  VirtualDataExtractor extractor(buffer, sizeof(buffer), eByteOrderLittle, 4,
+                                 std::move(lookup_table));
+
+  offset_t virtual_offset = 0x1000;
+  EXPECT_EQ(extractor.GetU16(&virtual_offset), 0x3412U);
+  EXPECT_EQ(virtual_offset, 0x1002U);
+
+  EXPECT_EQ(extractor.GetU16(&virtual_offset), 0x7856U);
+  EXPECT_EQ(virtual_offset, 0x1004U);
+}
+
+TEST(VirtualDataExtractorTest, GetU32AtVirtualOffset) {
+  uint8_t buffer[] = {0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0};
+
+  VirtualDataExtractor::LookupTable lookup_table;
+  lookup_table.Append(VirtualDataExtractor::LookupTable::Entry(0x1000, 8, 0));
+
+  VirtualDataExtractor extractor(buffer, sizeof(buffer), eByteOrderLittle, 4,
+                                 std::move(lookup_table));
+
+  offset_t virtual_offset = 0x1000;
+  EXPECT_EQ(extractor.GetU32(&virtual_offset), 0x78563412U);
+  EXPECT_EQ(virtual_offset, 0x1004U);
+
+  EXPECT_EQ(extractor.GetU32(&virtual_offset), 0xF0DEBC9AU);
+  EXPECT_EQ(virtual_offset, 0x1008U);
+}
+
+TEST(VirtualDataExtractorTest, GetU64AtVirtualOffset) {
+  uint8_t buffer[] = {0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0};
+
+  VirtualDataExtractor::LookupTable lookup_table;
+  lookup_table.Append(VirtualDataExtractor::LookupTable::Entry(0x1000, 8, 0));
+
+  VirtualDataExtractor extractor(buffer, sizeof(buffer), eByteOrderLittle, 8,
+                                 std::move(lookup_table));
+
+  offset_t virtual_offset = 0x1000;
+  EXPECT_EQ(extractor.GetU64(&virtual_offset), 0xF0DEBC9A78563412ULL);
+  EXPECT_EQ(virtual_offset, 0x1008U);
+}
+
+TEST(VirtualDataExtractorTest, GetAddressAtVirtualOffset) {
+  uint8_t buffer[] = {0x12, 0x34, 0x56, 0x78};
+
+  VirtualDataExtractor::LookupTable lookup_table;
+  lookup_table.Append(VirtualDataExtractor::LookupTable::Entry(0x1000, 4, 0));
+
+  VirtualDataExtractor extractor(buffer, sizeof(buffer), eByteOrderLittle, 4,
+                                 std::move(lookup_table));
+
+  offset_t virtual_offset = 0x1000;
+  EXPECT_EQ(extractor.GetAddress(&virtual_offset), 0x78563412U);
+  EXPECT_EQ(virtual_offset, 0x1004U);
+}
+
+TEST(VirtualDataExtractorTest, BigEndian) {
+  uint8_t buffer[] = {0x12, 0x34, 0x56, 0x78};
+
+  VirtualDataExtractor::LookupTable lookup_table;
+  lookup_table.Append(VirtualDataExtractor::LookupTable::Entry(0x1000, 4, 0));
+
+  VirtualDataExtractor extractor(buffer, sizeof(buffer), eByteOrderBig, 4,
+                                 std::move(lookup_table));
+
+  offset_t virtual_offset = 0x1000;
+  EXPECT_EQ(extractor.GetU16(&virtual_offset), 0x1234U);
+  EXPECT_EQ(virtual_offset, 0x1002U);
+
+  EXPECT_EQ(extractor.GetU16(&virtual_offset), 0x5678U);
+  EXPECT_EQ(virtual_offset, 0x1004U);
+}
+
+TEST(VirtualDataExtractorTest, MultipleEntries) {
+  // Create a buffer with distinct patterns for each section.
+  uint8_t buffer[] = {
+      0x01, 0x02, 0x03, 0x04, // Physical offset 0-3.
+      0x11, 0x12, 0x13, 0x14, // Physical offset 4-7.
+      0x21, 0x22, 0x23, 0x24  // Physical offset 8-11.
+  };
+
+  VirtualDataExtractor::LookupTable lookup_table;
+  // Map different virtual address ranges to different physical offsets.
+  // Entry(base=virtual_offset, size, data=physical_offset).
+  lookup_table.Append(VirtualDataExtractor::LookupTable::Entry(
+      0x1000, 4, 0)); // Virt 0x1000-0x1004 -> phys 0-4.
+  lookup_table.Append(VirtualDataExtractor::LookupTable::Entry(
+      0x2000, 4, 4)); // Virt 0x2000-0x2004 -> phys 4-8.
+  lookup_table.Append(VirtualDataExtractor::LookupTable::Entry(
+      0x3000, 4, 8)); // Virt 0x3000-0x3004 -> phys 8-12.
+
+  VirtualDataExtractor extractor(buffer, sizeof(buffer), eByteOrderLittle, 4,
+                                 std::move(lookup_table));
+
+  // Test reading from first virtual range.
+  offset_t virtual_offset = 0x1000;
+  EXPECT_EQ(extractor.GetU8(&virtual_offset), 0x01U);
+
+  // Test reading from second virtual range.
+  virtual_offset = 0x2000;
+  EXPECT_EQ(extractor.GetU8(&virtual_offset), 0x11U);
+
+  // Test reading from third virtual range.
+  virtual_offset = 0x3000;
+  EXPECT_EQ(extractor.GetU8(&virtual_offset), 0x21U);
+}
+
+TEST(VirtualDataExtractorTest, NonContiguousVirtualAddresses) {
+  uint8_t buffer[] = {0xAA, 0xBB, 0xCC, 0xDD};
+
+  VirtualDataExtractor::LookupTable lookup_table;
+  // Create non-contiguous virtual address mapping.
+  // Entry(base=virtual_offset, size, data=physical_offset).
+  lookup_table.Append(VirtualDataExtractor::LookupTable::Entry(
+      0x1000, 2, 0)); // Virt 0x1000-0x1002 -> phys 0-2.
+  lookup_table.Append(VirtualDataExtractor::LookupTable::Entry(
+      0x5000, 2, 2)); // Virt 0x5000-0x5002 -> phys 2-4.
+
+  VirtualDataExtractor extractor...
[truncated]

@JDevlieghere
Member Author

The motivation for this is the shared cache. A new API will allow us to get our hands on segments that are not laid out the same way they are when mapped into memory. By using the VirtualDataExtractor we can make them look like they are and avoid having to change ObjectFileMachO.

Collaborator

@jasonmolenda jasonmolenda left a comment

Looks good to me, this should do what we'll need for the shared cache segment reordering. Test coverage looks good.

@JDevlieghere JDevlieghere force-pushed the virtual-data-extractor branch from 9512330 to e5f556a Compare November 21, 2025 05:20
@JDevlieghere JDevlieghere force-pushed the virtual-data-extractor branch from e5f556a to da01924 Compare November 21, 2025 05:29
@JDevlieghere
Member Author

I wasn't happy with the repetition in the tests so I started with a helper and then realized that I could eliminate the helper by adding a constructor overload to RangeDataVector that takes an initializer list. I'm pretty happy with how it turned out.
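
As a rough illustration of that idea, the sketch below adds an initializer-list constructor to a simplified container. `SimpleRangeDataVector` is a hypothetical stand-in, not the real RangeDataVector, and both the constructor's exact signature and the choice to sort on construction are assumptions made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <vector>

// Hypothetical, simplified stand-in for a range-to-data table. The real
// RangeDataVector has a larger interface.
template <typename B, typename S, typename T> class SimpleRangeDataVector {
public:
  struct Entry {
    B base; // start of the range
    S size; // length of the range
    T data; // payload associated with the range
  };

  SimpleRangeDataVector() = default;

  // Assumed shape of the convenience constructor: copy the listed entries
  // and sort them by base so lookups can rely on the ordering.
  SimpleRangeDataVector(std::initializer_list<Entry> entries)
      : m_entries(entries) {
    Sort();
  }

  void Append(const Entry &entry) { m_entries.push_back(entry); }

  void Sort() {
    std::stable_sort(
        m_entries.begin(), m_entries.end(),
        [](const Entry &a, const Entry &b) { return a.base < b.base; });
  }

  std::size_t GetSize() const { return m_entries.size(); }

  const Entry &GetEntryRef(std::size_t i) const {
    assert(i < m_entries.size() && "index out of range");
    return m_entries[i];
  }

private:
  std::vector<Entry> m_entries;
};
```

A test can then build its table in one expression, for example `SimpleRangeDataVector<uint64_t, uint64_t, uint64_t> table{{0x1000, 4, 0}, {0x2000, 4, 4}};`, instead of a default construction followed by one `Append` call per entry.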

Comment on lines +27 to +28
/// this lookup table. Reads cannot cross entry boundaries and this is
/// enforced with assertions.
Collaborator

I presume you would otherwise do partial reads of some form, but neither DataExtractor nor the users of this class would be set up to handle that.


VirtualDataExtractor(const void *data, lldb::offset_t data_length,
lldb::ByteOrder byte_order, uint32_t addr_size,
LookupTable lookup_table);
Collaborator

const LookupTable& ? I always forget whether this makes a difference, sometimes it seems to make things worse.

Collaborator

I see you do std::move it later.
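
For readers following the by-value versus `const &` question, here is a small sketch of the sink-parameter idiom the constructor relies on. `Table` and `Holder` are hypothetical names used only for illustration, and `std::vector` stands in for the lookup table type.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical table type standing in for LookupTable.
using Table = std::vector<std::uint64_t>;

class Holder {
public:
  // Sink parameter: taken by value, then moved into the member. A caller
  // that passes an rvalue pays for moves only; a caller that passes an
  // lvalue pays for exactly one copy.
  explicit Holder(Table table) : m_table(std::move(table)) {}

  const Table &table() const { return m_table; }

private:
  Table m_table;
};

// Returns true if handing the table over with std::move reused the original
// heap allocation, meaning the elements were never copied.
bool MoveReusesStorage() {
  Table table = {1, 2, 3};
  const std::uint64_t *storage = table.data();
  Holder holder(std::move(table));
  assert(holder.table().size() == 3); // contents arrive intact
  return holder.table().data() == storage;
}
```

With a `const Table &` parameter the member would always be copy-constructed, even when the caller no longer needs its own table, which is why taking the parameter by value pairs naturally with the `std::move` in the initializer list.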

Collaborator

@DavidSpickett DavidSpickett left a comment

LGTM.

jasonmolenda added a commit to jasonmolenda/llvm-project that referenced this pull request Dec 1, 2025
ObjectFile has an m_data DataExtractor ivar which may be default
constructed initially, or initialized with a DataBuffer passed in
to a ctor.  Subclasses will provide the DataExtractor with a Buffer
source if not.  When a DataBuffer is passed in to the base class
ctor, the DataExtractor only has its buffer initialized; we don't
yet know the address size and endianness to fully initialize the
DataExtractor.

This patch changes ObjectFile to instead have a DataExtractorSP
ivar which is always initialized with at least a default-constructed
DataExtractor object in the base class ctor.  The next patch
I will be writing is to change the ObjectFile ctor which accepts
a DataBuffer to instead accept a DataExtractorSP, so the caller
can initialize it with a DataExtractor subclass -- the
VirtualDataExtractor being added in
llvm#168802

The change is otherwise mechanical; all `m_data.` changed to
`m_data_up->` and all the places where `m_data` was passed in for
a by-ref call were changed to `*m_data_up.get()`.  The unique
pointer is always initialized to contain an object.

I can't remember off hand if I'm making a mistake using a unique_ptr
here, given that the ctor may take a DataExtractor as an argument.
The caller will have to do std::move(extractor_up) when it calls
the ObjectFile ctor for correct behavior.  Even though a unique_ptr
makes sense internal to ObjectFile, given that it can be passed as
an argument, should I use the more straightforward shared_ptr?  An
ObjectFile only has one of them, so the extra storage for the
refcount isn't important.

I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks
for getting the Linux testsuite to run on SME-only systems David).
All of the ObjectFile subclasses I modified compile cleanly, but I
haven't tested them beyond any unit tests they may have (prob breakpad).

rdar://148939795
@JDevlieghere JDevlieghere merged commit 9438b74 into llvm:main Dec 1, 2025
10 checks passed
@JDevlieghere JDevlieghere deleted the virtual-data-extractor branch December 1, 2025 16:27
jasonmolenda added a commit that referenced this pull request Dec 1, 2025
ObjectFile has an m_data DataExtractor ivar which may be default
constructed initially, or initialized with a DataBuffer passed in to its
ctor. If the DataExtractor does not get a DataBuffer source passed in,
the subclass will initialize it with access to the object file's data.
When a DataBuffer is passed in to the base class ctor, the DataExtractor
only has its buffer initialized; ObjectFile doesn't yet know the address
size and endianness to fully initialize the DataExtractor.

This patch changes ObjectFile to instead have a DataExtractorSP ivar
which is always initialized with at least a default-constructed
DataExtractor object in the base class ctor. The next patch I will be
writing is to change the ObjectFile ctor to take an optional
DataExtractorSP, so the caller can pass a DataExtractor subclass -- the
VirtualDataExtractor being added via
#168802
instead of a DataBuffer which is trivially saved into the DataExtractor.

The change is otherwise mechanical; all `m_data.` changed to
`m_data_sp->` and all the places where `m_data` was passed in for a
by-ref call were changed to `*m_data_sp.get()`. The shared pointer is
always initialized to contain an object.

I built & ran the testsuite on macOS and on aarch64-Ubuntu (thanks for
getting the Linux testsuite to run on SME-only systems David). All of
the ObjectFile subclasses I modified compile cleanly, but I haven't
tested them beyond any unit tests they may have (prob breakpad).

rdar://148939795
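
The ownership change described in this commit message can be pictured with the following sketch. All names here (`Extractor`, `VirtualExtractor`, `FileModel`, `Kind`) are hypothetical and do not match the real ObjectFile or DataExtractor interfaces; the point is only that a shared pointer to the base class lets a caller supply a subclass while the holder still guarantees a non-null object.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical base class standing in for DataExtractor.
struct Extractor {
  virtual ~Extractor() = default;
  virtual std::string Kind() const { return "plain"; }
};

// Hypothetical subclass standing in for VirtualDataExtractor.
struct VirtualExtractor : Extractor {
  std::string Kind() const override { return "virtual"; }
};

using ExtractorSP = std::shared_ptr<Extractor>;

// Hypothetical holder standing in for ObjectFile.
class FileModel {
public:
  // If the caller supplies no extractor, fall back to a default-constructed
  // one so that m_data_sp is never null.
  explicit FileModel(ExtractorSP data_sp = nullptr)
      : m_data_sp(data_sp ? std::move(data_sp)
                          : std::make_shared<Extractor>()) {
    assert(m_data_sp && "extractor must always be set");
  }

  // Former `m_data.` accesses become `m_data_sp->` accesses, and virtual
  // dispatch selects the subclass behavior when one was supplied.
  std::string DataKind() const { return m_data_sp->Kind(); }

private:
  ExtractorSP m_data_sp;
};
```

Constructing `FileModel()` yields a holder whose `DataKind()` is "plain", while `FileModel(std::make_shared<VirtualExtractor>())` yields "virtual" without any change to the holder's own code.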
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Dec 1, 2025
…ptr (#170066)

augusto2112 pushed a commit to augusto2112/llvm-project that referenced this pull request Dec 3, 2025
…#168802)

kcloudy0717 pushed a commit to kcloudy0717/llvm-project that referenced this pull request Dec 4, 2025
…#168802)

kcloudy0717 pushed a commit to kcloudy0717/llvm-project that referenced this pull request Dec 4, 2025
…70066)

honeygoyal pushed a commit to honeygoyal/llvm-project that referenced this pull request Dec 9, 2025
…#168802)

honeygoyal pushed a commit to honeygoyal/llvm-project that referenced this pull request Dec 9, 2025
…70066)

jasonmolenda added a commit that referenced this pull request Dec 11, 2025
The ObjectFile plugin interface accepts an optional DataBufferSP
argument. If the caller has the contents of the binary, it can provide
this in that DataBufferSP. The ObjectFile subclasses in their
CreateInstance methods will fill in the DataBufferSP with the actual
binary contents if it is not set.
The ObjectFile base class creates an ivar DataExtractor from the
DataBufferSP passed in.

My next patch will be a caller that creates a VirtualDataExtractor with
the binary data, and needs to pass that in to the ObjectFile plugin,
instead of the bag-of-bytes DataBufferSP. It builds on the previous
patch changing ObjectFile's ivar from DataExtractor to DataExtractorSP
so I could pass in a subclass in the shared ptr. And it will be using
the VirtualDataExtractor that Jonas added in
#168802

No behavior is changed by the patch; we're simply moving the creation of
the DataExtractor to the caller, instead of a DataBuffer that is
immediately used to set up the ObjectFile DataExtractor. The patch is a
bit complicated because all of the ObjectFile subclasses have to
initialize their DataExtractor to pass in to the base class.
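
As a rough illustration of moving extractor creation to the caller, the sketch below has the caller build the extractor (possibly a subclass) and hand it to the object file through a shared pointer. All names are hypothetical stand-ins rather than LLDB's API, and the fallback to a default-constructed extractor on a null argument is an assumption borrowed from the earlier patch description.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for DataExtractor.
class ExtractorSketch {
public:
  virtual ~ExtractorSketch() = default;
  virtual std::string Kind() const { return "plain"; }
};

// Hypothetical stand-in for a subclass such as VirtualDataExtractor.
class VirtualExtractorSketch : public ExtractorSketch {
public:
  std::string Kind() const override { return "virtual"; }
};

// Hypothetical stand-in for ObjectFile: it no longer builds the extractor
// from a raw buffer, it stores whatever the caller passes in.
class ObjectFileSketch {
public:
  explicit ObjectFileSketch(std::shared_ptr<ExtractorSketch> extractor_sp)
      : m_data_sp(std::move(extractor_sp)) {
    // Assumption: fall back to a default-constructed extractor so the
    // pointer is never null.
    if (!m_data_sp)
      m_data_sp = std::make_shared<ExtractorSketch>();
  }

  // Dispatches through the shared pointer, so a subclass passed in by the
  // caller is used transparently.
  std::string ExtractorKind() const { return m_data_sp->Kind(); }

private:
  std::shared_ptr<ExtractorSketch> m_data_sp;
};
```

Holding the extractor behind a base-class shared pointer is what lets a caller substitute a subclass without the object file code needing to know which one it received.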

I ran the testsuite on macOS and on AArch64 Ubuntu. (btw David, I ran it
under qemu on my M4 mac with SME-no-SVE again, Ubuntu 25.10, checked
lshw(1) cpu capabilities, and qemu doesn't seem to be virtualizing the
SME, which explains why the testsuite passes)

rdar://148939795

---------

Co-authored-by: Jonas Devlieghere <[email protected]>
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Dec 11, 2025