Skip to content

Conversation

@apolloww
Copy link
Contributor

The address of the first loadable segment can be zero, so FirstLoadableAddress may actually point to the second loadable segment. Use an optional instead.

@llvmbot llvmbot added the PGO Profile Guided Optimizations label Jul 29, 2025
@llvmbot
Copy link
Member

llvmbot commented Jul 29, 2025

@llvm/pr-subscribers-pgo

Author: Wei Wang (apolloww)

Changes

The address of the first loadable segment can be zero, so FirstLoadableAddress may actually point to the second loadable segment. Use an optional instead.


Full diff: https://github.com/llvm/llvm-project/pull/151257.diff

1 Files Affected:

  • (modified) llvm/tools/llvm-profgen/ProfiledBinary.h (+5-2)
diff --git a/llvm/tools/llvm-profgen/ProfiledBinary.h b/llvm/tools/llvm-profgen/ProfiledBinary.h
index 0588cb48b2af6..8bc7e7b3b0061 100644
--- a/llvm/tools/llvm-profgen/ProfiledBinary.h
+++ b/llvm/tools/llvm-profgen/ProfiledBinary.h
@@ -199,7 +199,7 @@ class ProfiledBinary {
   // The runtime base address that the first executable segment is loaded at.
   uint64_t BaseAddress = 0;
   // The runtime base address that the first loadabe segment is loaded at.
-  uint64_t FirstLoadableAddress = 0;
+  std::optional<uint64_t> FirstLoadableAddress;
   // The preferred load address of each executable segment.
   std::vector<uint64_t> PreferredTextSegmentAddresses;
   // The file offset of each executable segment.
@@ -381,7 +381,10 @@ class ProfiledBinary {
     return PreferredTextSegmentAddresses[0];
   }
   // Return the preferred load address for the first loadable segment.
-  uint64_t getFirstLoadableAddress() const { return FirstLoadableAddress; }
+  uint64_t getFirstLoadableAddress() const {
+    assert(FirstLoadableAddress && "FirstLoadableAddress must be set.");
+    return *FirstLoadableAddress;
+  }
   // Return the file offset for the first executable segment.
   uint64_t getTextSegmentOffset() const { return TextSegmentOffsets[0]; }
   const std::vector<uint64_t> &getPreferredTextSegmentAddresses() const {

@apolloww apolloww requested review from WenleiHe and wlei-llvm July 29, 2025 23:54
@snehasish
Copy link

I think the convention is to leave the address zero unmapped (ie. at least the first page) so that null pointer dereferences cause segmentation faults. Can you describe the situation where you have seen this behaviour?

@apolloww apolloww force-pushed the profgen_fix_firstloadableaddr branch from c389225 to 6b8ca0f Compare July 30, 2025 00:02
@apolloww
Copy link
Contributor Author

I think the convention is to leave the address zero unmapped (ie. at least the first page) so that null pointer dereferences cause segmentation faults. Can you describe the situation where you have seen this behaviour?

I saw FirstLoadableAddress is set twice when loading segments of one ELF binary I am testing with.

Elf file type is DYN (Shared object file)
Entry point 0x1120
There are 13 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000000040 0x0000000000000040
                 0x00000000000002d8 0x00000000000002d8  R      0x8
  INTERP         0x0000000000000318 0x0000000000000318 0x0000000000000318
                 0x000000000000001c 0x000000000000001c  R      0x1
      [Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
  LOAD           0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000ff0 0x0000000000000ff0  R      0x1000
  LOAD           0x0000000000001000 0x0000000000001000 0x0000000000001000
                 0x00000000000008a5 0x00000000000008a5  R E    0x1000
  LOAD           0x0000000000002000 0x0000000000002000 0x0000000000002000
                 0x000000000000041c 0x000000000000041c  R      0x1000
  LOAD           0x0000000000002d80 0x0000000000003d80 0x0000000000003d80
                 0x0000000000000300 0x0000000000000308  RW     0x1000
  DYNAMIC        0x0000000000002da0 0x0000000000003da0 0x0000000000003da0
                 0x0000000000000220 0x0000000000000220  RW     0x8
  NOTE           0x0000000000000338 0x0000000000000338 0x0000000000000338
                 0x0000000000000020 0x0000000000000020  R      0x8
  NOTE           0x0000000000000358 0x0000000000000358 0x0000000000000358
                 0x0000000000000020 0x0000000000000020  R      0x4
  GNU_PROPERTY   0x0000000000000338 0x0000000000000338 0x0000000000000338
                 0x0000000000000020 0x0000000000000020  R      0x8
  GNU_EH_FRAME   0x0000000000002020 0x0000000000002020 0x0000000000002020
                 0x00000000000000bc 0x00000000000000bc  R      0x4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     0x10
  GNU_RELRO      0x0000000000002d80 0x0000000000003d80 0x0000000000003d80
                 0x0000000000000280 0x0000000000000280  R      0x1

The binary has first LOAD segment p_vaddr at 0. If it's a convention to leave the address zero unmapped, then this pr is unnecessary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

PGO Profile Guided Optimizations

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants