Skip to content

Conversation

@alx32
Copy link
Contributor

@alx32 alx32 commented Dec 18, 2024

This patch extends the MachO linker's map file generation to include branch extension thunk symbols. Previously, thunks were omitted from the map file, making it difficult to understand the final layout of the binary, especially when debugging issues related to long branch thunks. This change ensures thunks are included and correctly interleaved with other symbols based on their address, providing an accurate representation of the linked output.

This patch extends the MachO linker's map file generation to include branch extension thunk symbols. Previously, thunks were omitted from the map file, making it difficult to understand the final layout of the binary, especially when debugging issues related to long branch thunks. This change ensures thunks are included and correctly interleaved with other symbols based on their address, providing an accurate representation of the linked output.
@alx32 alx32 marked this pull request as ready for review December 18, 2024 23:54
@llvmbot
Copy link
Member

llvmbot commented Dec 18, 2024

@llvm/pr-subscribers-lld-macho

@llvm/pr-subscribers-lld

Author: None (alx32)

Changes

This patch extends the MachO linker's map file generation to include branch extension thunk symbols. Previously, thunks were omitted from the map file, making it difficult to understand the final layout of the binary, especially when debugging issues related to long branch thunks. This change ensures thunks are included and correctly interleaved with other symbols based on their address, providing an accurate representation of the linked output.


Full diff: https://github.com/llvm/llvm-project/pull/120496.diff

4 Files Affected:

  • (modified) lld/MachO/ConcatOutputSection.h (+11-4)
  • (modified) lld/MachO/MapFile.cpp (+28-1)
  • (modified) lld/MachO/OutputSection.h (+1)
  • (modified) lld/test/MachO/arm64-thunks.s (+25-1)
diff --git a/lld/MachO/ConcatOutputSection.h b/lld/MachO/ConcatOutputSection.h
index 9af661d0ab1e0c..c2449e82b73ad1 100644
--- a/lld/MachO/ConcatOutputSection.h
+++ b/lld/MachO/ConcatOutputSection.h
@@ -25,8 +25,9 @@ class Defined;
 // in the final binary.
 class ConcatOutputSection : public OutputSection {
 public:
-  explicit ConcatOutputSection(StringRef name)
-      : OutputSection(ConcatKind, name) {}
+  explicit ConcatOutputSection(StringRef name,
+                               OutputSection::Kind kind = ConcatKind)
+      : OutputSection(kind, name) {}
 
   const ConcatInputSection *firstSection() const { return inputs.front(); }
   const ConcatInputSection *lastSection() const { return inputs.back(); }
@@ -46,7 +47,7 @@ class ConcatOutputSection : public OutputSection {
   void writeTo(uint8_t *buf) const override;
 
   static bool classof(const OutputSection *sec) {
-    return sec->kind() == ConcatKind;
+    return sec->kind() == ConcatKind || sec->kind() == TextKind;
   }
 
   static ConcatOutputSection *getOrCreateForInput(const InputSection *);
@@ -66,12 +67,18 @@ class ConcatOutputSection : public OutputSection {
 // support thunk insertion.
 class TextOutputSection : public ConcatOutputSection {
 public:
-  explicit TextOutputSection(StringRef name) : ConcatOutputSection(name) {}
+  explicit TextOutputSection(StringRef name)
+      : ConcatOutputSection(name, TextKind) {}
   void finalizeContents() override {}
   void finalize() override;
   bool needsThunks() const;
+  const std::vector<ConcatInputSection *> &getThunks() const { return thunks; }
   void writeTo(uint8_t *buf) const override;
 
+  static bool classof(const OutputSection *sec) {
+    return sec->kind() == TextKind;
+  }
+
 private:
   uint64_t estimateStubsInRangeVA(size_t callIdx) const;
 
diff --git a/lld/MachO/MapFile.cpp b/lld/MachO/MapFile.cpp
index 9c0621622ae2f0..6fdea64cbefa04 100644
--- a/lld/MachO/MapFile.cpp
+++ b/lld/MachO/MapFile.cpp
@@ -161,6 +161,28 @@ static uint64_t getSymSizeForMap(Defined *sym) {
   return sym->size;
 }
 
+// Merges two vectors of input sections in order of their outSecOff values.
+// This approach creates a new (temporary) vector which is not ideal but the
+// ideal approach leads to a lot of code duplication.
+static std::vector<ConcatInputSection *>
+mergeOrderedInputs(const std::vector<ConcatInputSection *> &inputs1,
+                   const std::vector<ConcatInputSection *> &inputs2) {
+  std::vector<ConcatInputSection *> vec;
+  size_t i = 0, ie = inputs1.size();
+  size_t t = 0, te = inputs2.size();
+  while (i < ie || t < te) {
+    while (i < ie &&
+           (t == te || inputs1[i]->outSecOff <= inputs2[t]->outSecOff)) {
+      vec.push_back(inputs1[i++]);
+    }
+    while (t < te &&
+           (i == ie || inputs2[t]->outSecOff < inputs1[i]->outSecOff)) {
+      vec.push_back(inputs2[t++]);
+    }
+  }
+  return vec;
+}
+
 void macho::writeMapFile() {
   if (config->mapFile.empty())
     return;
@@ -220,7 +242,12 @@ void macho::writeMapFile() {
   os << "# Address\tSize    \tFile  Name\n";
   for (const OutputSegment *seg : outputSegments) {
     for (const OutputSection *osec : seg->getSections()) {
-      if (auto *concatOsec = dyn_cast<ConcatOutputSection>(osec)) {
+      const TextOutputSection *textOsec = dyn_cast<TextOutputSection>(osec);
+      if (textOsec && textOsec->getThunks().size()) {
+        auto inputsAndThunks =
+            mergeOrderedInputs(textOsec->inputs, textOsec->getThunks());
+        printIsecArrSyms(inputsAndThunks);
+      } else if (auto *concatOsec = dyn_cast<ConcatOutputSection>(osec)) {
         printIsecArrSyms(concatOsec->inputs);
       } else if (osec == in.cStringSection || osec == in.objcMethnameSection) {
         const auto &liveCStrings = info.liveCStringsForSection.lookup(osec);
diff --git a/lld/MachO/OutputSection.h b/lld/MachO/OutputSection.h
index 5297a03c2cfa7f..9afd3a9eeb1928 100644
--- a/lld/MachO/OutputSection.h
+++ b/lld/MachO/OutputSection.h
@@ -37,6 +37,7 @@ class OutputSection {
   enum Kind {
     ConcatKind,
     SyntheticKind,
+    TextKind,
   };
 
   OutputSection(Kind kind, StringRef name) : name(name), sectionKind(kind) {}
diff --git a/lld/test/MachO/arm64-thunks.s b/lld/test/MachO/arm64-thunks.s
index d887359bbc23e1..8d7bb154379bc4 100644
--- a/lld/test/MachO/arm64-thunks.s
+++ b/lld/test/MachO/arm64-thunks.s
@@ -8,14 +8,38 @@
 ## (4) early calls to a dylib stub use a thunk, and later calls the stub
 ##     directly
 ## (5) Thunks are created for all sections in the text segment with branches.
+## (6) Thunks are in the linker map file.
 ## Notes:
 ## 0x4000000 = 64 Mi = half the magnitude of the forward-branch range
 
 # RUN: rm -rf %t; mkdir %t
 # RUN: llvm-mc -filetype=obj -triple=arm64-apple-darwin %s -o %t/input.o
-# RUN: %lld -arch arm64 -dead_strip -lSystem -U _extern_sym -o %t/thunk %t/input.o
+# RUN: %lld -arch arm64 -dead_strip -lSystem -U _extern_sym -map %t/thunk.map -o %t/thunk %t/input.o
 # RUN: llvm-objdump --no-print-imm-hex -d --no-show-raw-insn %t/thunk | FileCheck %s
 
+## Check that the thunks appear in the map file and that everything is sorted by address
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _b
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _c
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _d.thunk.0
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _e.thunk.0
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _f.thunk.0
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _g.thunk.0
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _h.thunk.0
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] ___nan.thunk.0
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _d
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _e
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _f
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _g
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _a.thunk.0
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _b.thunk.0
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _h
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _main
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _c.thunk.0
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _d.thunk.1
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _e.thunk.1
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _f.thunk.1
+# MAP: [[0x[0-9A-Fa-f]+]] 0x[0-9A-Fa-f]+ \[[0-9]+\] _z
+
 # CHECK: Disassembly of section __TEXT,__text:
 
 # CHECK: [[#%.13x, A_PAGE:]][[#%.3x, A_OFFSET:]] <_a>:

Comment on lines 21 to 23
# Because of the `.space` instructions, there will end up being a lot of dead symbols in the linker map
# so generate a version of the linker map without dead symbols.
# RUN: awk '/# Dead Stripped Symbols:/ {exit} {print}' %t/thunk.map > %t/thunk_no_dead_syms.map
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understand the point of this awk command. The last symbol you check is _z and then presumably there will be dead symbols after that. But the MAP check lines are done, so there is no problem?

Copy link
Contributor Author

@alx32 alx32 Dec 19, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The issue is that because the .space. commands, linker map ends up being 2.7GB in size - with apparently each character added via .space being interpreted as a dead symbol (this might be another unrelated issue).

So the issue is that if the test were to break, then it would take a very very long time to complete (I didn't wait to fully time it) - trying to match the regex against the 2.7GB file.

This is just an optimization to have the test complete in a practical amount of time - even in case of failure.

@alx32
Copy link
Contributor Author

alx32 commented Jan 7, 2025

@carlocab / @ellishg - I think all issues should be fixed in the latest version.

Copy link
Member

@carlocab carlocab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not super familiar with this part of LLD, so would be good to get an approval from someone else. But looks reasonable to me as far as I can tell.

@alx32 alx32 merged commit 162814a into llvm:main Jan 8, 2025
8 checks passed
@tstellar
Copy link
Collaborator

I'm seeing some failures on our i386 builds after this commit:
https://download.copr.fedorainfracloud.org/results/@fedora-llvm-team/llvm-snapshots-big-merge-20250110/fedora-40-i386/08497704-llvm/builder-live.log.gz

Stack trace from the log below, but it's possible that it is running out of memory.

terminate called after throwing an instance of 'std::bad_alloc'
  what():  std::bad_alloc
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
 #0 0xef659802 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/lib/Support/Unix/Signals.inc:723:13
 #1 0xef659e16 PrintStackTraceSignalHandler(void*) /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/lib/Support/Unix/Signals.inc:797:3
 #2 0xef657024 llvm::sys::RunSignalHandlers() /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/lib/Support/Signals.cpp:105:20
 #3 0xef659ffd SignalHandler(int) /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/lib/Support/Unix/Signals.inc:413:1
 #4 0xf7fa15a0 (linux-gate.so.1+0x5a0)
 #5 0xf7fa1589 (linux-gate.so.1+0x589)
 #6 0xeeb0c28f __pthread_kill_implementation (/lib/libc.so.6+0x9628f)
 #7 0xeeab2321 gsignal (/lib/libc.so.6+0x3c321)
 #8 0xeea99258 abort (/lib/libc.so.6+0x23258)
 #9 0xeed28d0c (/lib/libstdc++.so.6+0x7fd0c)
#10 0xeed3fca8 (/lib/libstdc++.so.6+0x96ca8)
#11 0xeed287f2 std::unexpected() (/lib/libstdc++.so.6+0x7f7f2)
#12 0xeed3fff0 __cxa_rethrow (/lib/libstdc++.so.6+0x96ff0)
#13 0xef583d29 lock /usr/bin/../lib/gcc/i686-redhat-linux/14/../../../../include/c++/14/bits/std_mutex.h:117:2
#14 0xef583d29 lock_guard /usr/bin/../lib/gcc/i686-redhat-linux/14/../../../../include/c++/14/bits/std_mutex.h:250:19
#15 0xef583d29 llvm::report_bad_alloc_error(char const*, bool) /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/lib/Support/ErrorHandling.cpp:155:33
#16 0xef5d25ac getNewCapacity<unsigned int> /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/lib/Support/SmallVector.cpp:103:5
#17 0xef5d25ac llvm::SmallVectorBase<unsigned int>::grow_pod(void*, unsigned int, unsigned int) /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/lib/Support/SmallVector.cpp:148:24
#18 0xf7bba94f begin /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/include/llvm/ADT/SmallVector.h:267:45
#19 0xf7bba94f end /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/include/llvm/ADT/SmallVector.h:269:27
#20 0xf7bba94f push_back /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/include/llvm/ADT/SmallVector.h:563:43
#21 0xf7bba94f gatherMapInfo /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/lld/MachO/MapFile.cpp:100:35
#22 0xf7bba94f lld::macho::writeMapFile() /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/lld/MachO/MapFile.cpp:196:18
#23 0xf7c1cb2d _M_ptr /usr/bin/../lib/gcc/i686-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:193:51
#24 0xf7c1cb2d get /usr/bin/../lib/gcc/i686-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:464:21
#25 0xf7c1cb2d operator-> /usr/bin/../lib/gcc/i686-redhat-linux/14/../../../../include/c++/14/bits/unique_ptr.h:457:9
#26 0xf7c1cb2d operator() /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/lld/MachO/Writer.cpp:1361:32
#27 0xf7c1cb2d operator()<(lambda at /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/lld/MachO/Writer.cpp:1357:30) &> /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/include/llvm/Support/thread.h:43:11
#28 0xf7c1cb2d __invoke_impl<void, (lambda at /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/include/llvm/Support/thread.h:42:9), (lambda at /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/lld/MachO/Writer.cpp:1357:30) &> /usr/bin/../lib/gcc/i686-redhat-linux/14/../../../../include/c++/14/bits/invoke.h:61:14
#29 0xf7c1cb2d __invoke<(lambda at /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/include/llvm/Support/thread.h:42:9), (lambda at /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/lld/MachO/Writer.cpp:1357:30) &> /usr/bin/../lib/gcc/i686-redhat-linux/14/../../../../include/c++/14/bits/invoke.h:96:14
#30 0xf7c1cb2d __apply_impl<(lambda at /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/include/llvm/Support/thread.h:42:9), std::tuple<(lambda at /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/lld/MachO/Writer.cpp:1357:30)> &, 0U> /usr/bin/../lib/gcc/i686-redhat-linux/14/../../../../include/c++/14/tuple:2921:14
#31 0xf7c1cb2d apply<(lambda at /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/include/llvm/Support/thread.h:42:9), std::tuple<(lambda at /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/lld/MachO/Writer.cpp:1357:30)> &> /usr/bin/../lib/gcc/i686-redhat-linux/14/../../../../include/c++/14/tuple:2936:14
#32 0xf7c1cb2d GenericThreadProxy<std::tuple<(lambda at /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/lld/MachO/Writer.cpp:1357:30)> > /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/include/llvm/Support/thread.h:41:5
#33 0xf7c1cb2d void* llvm::thread::ThreadProxy<std::tuple<void (anonymous namespace)::Writer::run<lld::macho::LP64>()::'lambda'()>>(void*) /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/include/llvm/Support/thread.h:55:5
#34 0xeeb09e73 start_thread (/lib/libc.so.6+0x93e73)
#35 0xeeb90d38 __GI___clone3 (/lib/libc.so.6+0x11ad38)
/builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/redhat-linux-build/tools/lld/test/MachO/Output/arm64-thunks.s.script: line 6: 605633 Aborted                 (core dumped) ld64.lld -arch x86_64 -platform_version macos 11.0 11.0 -syslibroot /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/lld/test/MachO/Inputs/MacOSX.sdk -lSystem -fatal_warnings -arch arm64 -dead_strip -lSystem -U _extern_sym -map /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/redhat-linux-build/tools/lld/test/MachO/Output/arm64-thunks.s.tmp/thunk.map -o /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/redhat-linux-build/tools/lld/test/MachO/Output/arm64-thunks.s.tmp/thunk /builddir/build/BUILD/llvm-project-0e1c5bfac8574bc8419968279069c76b55e5f270/llvm/redhat-linux-build/tools/lld/test/MachO/Output/arm64-thunks.s.tmp/input.o

@alx32
Copy link
Contributor Author

alx32 commented Jan 11, 2025

With this commit we do add a new temporary array that can be rather large for big binaries. So running out of memory on x86 is a possibility. Does it happen consistently ? Is this just when running arm64-thunks.s test on Linux x86 without any special config ?

@tstellar
Copy link
Collaborator

It's been failing consistently the last 3 days we've run the builds. The CMake options we use are in the log file. I'm not sure if our config is considered special or not.

@tstellar
Copy link
Collaborator

I gathered some stats on my x86_64 machine with /usr/bin/time -v ./lld-build/bin/llvm-lit -v lld/test/MachO/arm64-thunks.s 2>&1| grep -e "Testing Time" -e "Maximum resident set size"

Before 162814a:

Testing Time: 4.56s
        Maximum resident set size (kbytes): 1666472

After 162814a:

Testing Time: 17.61s
        Maximum resident set size (kbytes): 2984488

So a pretty significant increase in memory usage and run time as well. I would consider this kind of run time increase a regression on all targets. Is there anything that can be done to speed up the tests/reduce the memory usage?

@alx32
Copy link
Contributor Author

alx32 commented Jan 13, 2025

#122785 gets rid of the extra memory required by this PR.
I think this should address the crash - though I wasn't able to repro it myself.

alx32 added a commit that referenced this pull request Jan 16, 2025
This commit improves the memory efficiency of the lld-macho linker by
optimizing how thunks are printed in the map file. Previously, merging
vectors of input sections required creating a temporary vector, which
increased memory usage and in some cases caused the linker to run out of
memory as reported in comments on
#120496. The new approach
interleaves the printing of two arrays of ConcatInputSection in sorted
order without allocating additional memory for a merged array.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this pull request Jan 16, 2025
…file (#122785)

This commit improves the memory efficiency of the lld-macho linker by
optimizing how thunks are printed in the map file. Previously, merging
vectors of input sections required creating a temporary vector, which
increased memory usage and in some cases caused the linker to run out of
memory as reported in comments on
llvm/llvm-project#120496. The new approach
interleaves the printing of two arrays of ConcatInputSection in sorted
order without allocating additional memory for a merged array.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants