@kostasalv kostasalv commented May 21, 2025

Scope

This patch is the first part of ongoing work to port BOLT to the PowerPC architecture.
Additional changes will follow to complete the port.

What does this patch do?

This pull request ports the BOLT createPushRegisters method to the PowerPC target, which lets BOLT generate
PowerPC instructions that save registers on the stack.

Motivation

Port BOLT to the PowerPC architecture.

Testing

  • Built with 'ninja'
  • Ran 'ninja check-lit' (all tests pass)
  • Ran clang-format, no changes required

github-actions bot commented May 21, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

maksfb commented May 22, 2025

That's exciting to see BOLT getting PowerPC support!

Could you share more details about the project? Were you able to see real gains or is this still WIP?

@kostasalv

Thank you for the feedback!
This is an initial contribution towards PowerPC support, and the work is still in progress. I plan to follow up with additional changes to complete the port. I don't have performance data available yet, but I plan to collect and share results as the implementation progresses.

@kostasalv kostasalv changed the title BOLT PowerPC Port [BOLT] PowerPC Port May 26, 2025
@kostasalv kostasalv changed the title [BOLT] PowerPC Port [BOLT] [PowerPC] Port May 26, 2025
@kostasalv

Hi @maksfb, I recently force-pushed (git push --force) and noticed that the BOLT label disappeared. Could it be re-applied manually? Thanks!

@maksfb maksfb added the BOLT label May 27, 2025
llvmbot commented May 27, 2025

@llvm/pr-subscribers-bolt

Author: Kostas (kostasalv)

Full diff: https://github.com/llvm/llvm-project/pull/140894.diff

6 Files Affected:

  • (modified) bolt/CMakeLists.txt (+1-1)
  • (modified) bolt/include/bolt/Core/MCPlusBuilder.h (+5)
  • (modified) bolt/lib/Rewrite/RewriteInstance.cpp (+5)
  • (modified) bolt/lib/Target/CMakeLists.txt (+3-1)
  • (added) bolt/lib/Target/PowerPC/CMakeLists.txt (+29)
  • (added) bolt/lib/Target/PowerPC/PPCMCPlusBuilder.cpp (+54)
diff --git a/bolt/CMakeLists.txt b/bolt/CMakeLists.txt
index 52c796518ac05..c18216d760808 100644
--- a/bolt/CMakeLists.txt
+++ b/bolt/CMakeLists.txt
@@ -58,7 +58,7 @@ endif() # standalone
 
 # Determine default set of targets to build -- the intersection of
 # those BOLT supports and those LLVM is targeting.
-set(BOLT_TARGETS_TO_BUILD_all "AArch64;X86;RISCV")
+set(BOLT_TARGETS_TO_BUILD_all "AArch64;X86;RISCV;PowerPC")
 set(BOLT_TARGETS_TO_BUILD_default)
 foreach (tgt ${BOLT_TARGETS_TO_BUILD_all})
   if (tgt IN_LIST LLVM_TARGETS_TO_BUILD)
diff --git a/bolt/include/bolt/Core/MCPlusBuilder.h b/bolt/include/bolt/Core/MCPlusBuilder.h
index 132d58f3f9f79..fa043a49bb87e 100644
--- a/bolt/include/bolt/Core/MCPlusBuilder.h
+++ b/bolt/include/bolt/Core/MCPlusBuilder.h
@@ -2293,6 +2293,11 @@ MCPlusBuilder *createRISCVMCPlusBuilder(const MCInstrAnalysis *,
                                         const MCRegisterInfo *,
                                         const MCSubtargetInfo *);
 
+MCPlusBuilder *createPowerPCMCPlusBuilder(const MCInstrAnalysis *,
+                                          const MCInstrInfo *,
+                                          const MCRegisterInfo *,
+                                          const MCSubtargetInfo *);
+
 } // namespace bolt
 } // namespace llvm
 
diff --git a/bolt/lib/Rewrite/RewriteInstance.cpp b/bolt/lib/Rewrite/RewriteInstance.cpp
index ad062ea3622d1..dbd9ecbaf1c66 100644
--- a/bolt/lib/Rewrite/RewriteInstance.cpp
+++ b/bolt/lib/Rewrite/RewriteInstance.cpp
@@ -304,6 +304,11 @@ MCPlusBuilder *createMCPlusBuilder(const Triple::ArchType Arch,
     return createRISCVMCPlusBuilder(Analysis, Info, RegInfo, STI);
 #endif
 
+#ifdef POWERPC_AVAILABLE
+  if (Arch == Triple::ppc64 || Arch == Triple::ppc64le)
+    return createPowerPCMCPlusBuilder(Analysis, Info, RegInfo, STI);
+#endif
+
   llvm_unreachable("architecture unsupported by MCPlusBuilder");
 }
 
diff --git a/bolt/lib/Target/CMakeLists.txt b/bolt/lib/Target/CMakeLists.txt
index eae8ebdddbf3f..38d423ac9483c 100644
--- a/bolt/lib/Target/CMakeLists.txt
+++ b/bolt/lib/Target/CMakeLists.txt
@@ -1,3 +1,5 @@
 foreach (tgt ${BOLT_TARGETS_TO_BUILD})
   add_subdirectory(${tgt})
-endforeach()
+  string(TOUPPER ${tgt} TGT_UPPER)
+  add_definitions(-D${TGT_UPPER}_AVAILABLE)
+endforeach()
\ No newline at end of file
diff --git a/bolt/lib/Target/PowerPC/CMakeLists.txt b/bolt/lib/Target/PowerPC/CMakeLists.txt
new file mode 100644
index 0000000000000..c1d2a054396d7
--- /dev/null
+++ b/bolt/lib/Target/PowerPC/CMakeLists.txt
@@ -0,0 +1,29 @@
+add_llvm_library(LLVMBOLTTargetPowerPC
+    PPCMCPlusBuilder.cpp
+)
+
+target_include_directories(LLVMBOLTTargetPowerPC PRIVATE
+    ${LLVM_BINARY_DIR}/include
+    ${LLVM_SOURCE_DIR}/include
+)
+
+file(MAKE_DIRECTORY "${LLVM_BINARY_DIR}/include/llvm/Target/PowerPC")
+
+foreach(incfile IN ITEMS
+    PPCGenInstrInfo.inc
+    PPCGenRegisterInfo.inc
+)
+    add_custom_command(
+        OUTPUT "${LLVM_BINARY_DIR}/include/llvm/Target/PowerPC/${incfile}"
+        COMMAND ${CMAKE_COMMAND} -E copy_if_different
+            "${LLVM_BINARY_DIR}/lib/Target/PowerPC/${incfile}"
+            "${LLVM_BINARY_DIR}/include/llvm/Target/PowerPC/${incfile}"
+        DEPENDS "${LLVM_BINARY_DIR}/lib/Target/PowerPC/${incfile}"
+        COMMENT "Copying ${incfile} to include directory"
+    )
+    add_custom_target(
+        "BoltCopy${incfile}" ALL
+        DEPENDS "${LLVM_BINARY_DIR}/include/llvm/Target/PowerPC/${incfile}"
+    )
+    add_dependencies(LLVMBOLTTargetPowerPC "BoltCopy${incfile}")
+endforeach()
\ No newline at end of file
diff --git a/bolt/lib/Target/PowerPC/PPCMCPlusBuilder.cpp b/bolt/lib/Target/PowerPC/PPCMCPlusBuilder.cpp
new file mode 100644
index 0000000000000..39d5ed2d3e36e
--- /dev/null
+++ b/bolt/lib/Target/PowerPC/PPCMCPlusBuilder.cpp
@@ -0,0 +1,54 @@
+//===- bolt/Target/PowerPC/PPCMCPlusBuilder.cpp -----------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file provides PowerPC-specific MCPlus builder.
+//
+//===----------------------------------------------------------------------===//
+
+#include "bolt/Core/MCPlusBuilder.h"
+#include "llvm/MC/MCInst.h"
+#include "llvm/MC/MCRegisterInfo.h"
+#define GET_INSTRINFO_ENUM
+#include "llvm/Target/PowerPC/PPCGenInstrInfo.inc"
+#define GET_REGINFO_ENUM
+#include "llvm/Target/PowerPC/PPCGenRegisterInfo.inc"
+
+namespace llvm {
+namespace bolt {
+
+class PPCMCPlusBuilder : public MCPlusBuilder {
+public:
+  using MCPlusBuilder::MCPlusBuilder;
+
+  // Create instructions to push two registers onto the stack
+  static void createPushRegisters(MCInst &Inst1, MCInst &Inst2, MCPhysReg Reg1,
+                                  MCPhysReg /*Reg2*/) {
+
+    Inst1.clear();
+    Inst1.setOpcode(PPC::STDU);
+    Inst1.addOperand(MCOperand::createReg(PPC::R1)); // destination (SP)
+    Inst1.addOperand(MCOperand::createReg(PPC::R1)); // base (SP)
+    Inst1.addOperand(MCOperand::createImm(-16));     // offset
+
+    Inst2.clear();
+    Inst2.setOpcode(PPC::STD);
+    Inst2.addOperand(MCOperand::createReg(Reg1));    // source register
+    Inst2.addOperand(MCOperand::createReg(PPC::R1)); // base (SP)
+    Inst2.addOperand(MCOperand::createImm(0));       // offset
+  }
+};
+
+MCPlusBuilder *createPowerPCMCPlusBuilder(const MCInstrAnalysis *Analysis,
+                                          const MCInstrInfo *Info,
+                                          const MCRegisterInfo *RegInfo,
+                                          const MCSubtargetInfo *STI) {
+  return new PPCMCPlusBuilder(Analysis, Info, RegInfo, STI);
+}
+
+} // namespace bolt
+} // namespace llvm

kostasalv commented Jun 16, 2025

Gentle ping @aaupov @maksfb @rafaelauler @paschalis-mpeis @yota9 @ayermolo — when you have a moment, could you please take a look at this PR?
Thank you!

yota9 commented Jun 21, 2025

Hello @kostasalv! Thank you for your work. From my side, I would say this port may be too basic: I would expect at least a "hello world" test to be compiled and optimized before submitting this PR. This is just my opinion, and it's ultimately up to @maksfb to decide, but would it be possible to add support for a couple of relocations, etc., so that at least a couple of very basic C (or even asm) tests can run with BOLT? Thank you!

maksfb commented Jun 22, 2025

Hi, @kostasalv,

The PR serves well as a skeleton for the backend. For a minimum implementation, I agree with @yota9 that it's reasonable to expect it to be able to process a simple program such as "Hello world".

I suggest you add a simple symbolic disassembler for PowerPC similar to X86MCSymbolizer and include a couple of PowerPC-specific relocations in the implementation. If you have patches on top of this one, feel free to share so that we can start reviewing them.

Regarding the current PR, what was the reason for copying *.inc files for the new target and including them directly rather than via PPCMCTargetDesc.h?

@kostasalv

Hi @yota9 and @maksfb ,

Thank you very much for your guidance and for the review comments.
I agree that supporting at least a "Hello world" test case, including compilation and optimisation, is a sensible goal for this initial PR. I will work on extending the implementation to cover this.

I also plan to add a basic symbolic disassembler for PowerPC, based on X86MCSymbolizer, and start supporting some PowerPC-specific relocations as suggested.

At this time, I don’t have any additional patches on top of this one, but I will be sure to share any updates as they become available.

Thank you as well for the suggestion regarding PPCMCTargetDesc.h. I will update the code to include this header and remove the copying of *.inc files in the CMakeLists.

maksfb commented Jun 23, 2025

@kostasalv, sounds good. You can take a look at other targets (x86, aarch64, RISC-V), they follow pretty much the same pattern in respective CMakeLists.txt files.

For the "Hello world" test, you can start with building CFG and verifying a couple of symbolic reference resolutions.

x86-64 and AArch64 have their versions of symbolic disassembler (*MCSymbolizer class). I suggest you take a look at those to get an idea of what it takes to restore symbol references. Note that due to the lack of active maintenance, RISC-V does not have its symbolizer and has target-specific code in BinaryFunction.cpp.

@kostasalv kostasalv requested a review from yozhu as a code owner August 26, 2025 17:41
@kostasalv kostasalv force-pushed the bolt-ppc-port branch 6 times, most recently from 3d56cb2 to 802fcb1 Compare August 27, 2025 19:10
Introduce a PowerPC64 implementation of PLT section disassembly in RewriteInstance. This enables BOLT to properly identify and skip PLT stubs on PPC64, mirroring existing handling for the other targets.

With this change, BOLT no longer attempts to treat PPC64 PLT stubs as regular functions, avoiding crashes and mis-optimisation. This lays the groundwork for running BOLT on dynamically linked PPC64 binaries that depend on libc calls (e.g. 'printf').

BOLT relies on treating tail calls as calls to build correct CFGs and preserve interprocedural analysis. On PowerPC, the compiler often emits plain branches (b, ba, bctr) instead of explicit calls when lowering tail calls (tail call optimisation, TCO).

This patch overrides convertJmpToTailCall in PPCMCPlusBuilder so that unconditional branches are converted into their link-bit variants (BL, BLA, BCTRL). This way, BOLT can consistently recognise them as calls/tail calls during disassembly and optimisation, including when skipping PLT (Procedure Linkage Table) stubs.
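
The branch-to-link-variant mapping described above can be sketched in isolation. This is only an illustration: the `Opcode` enum and the `linkVariant` helper are hypothetical stand-ins, not the PR's code, and the real enumerators come from the generated PowerPC instruction tables.

```cpp
#include <optional>

// Hypothetical stand-ins for the PPC:: opcode enumerators (assumption:
// the real values come from the generated PPCGenInstrInfo.inc).
enum class Opcode { B, BA, BCTR, BL, BLA, BCTRL, ADDI };

// Return the link-bit variant of an unconditional branch, or std::nullopt
// when the opcode is not one of the plain branches listed above.
constexpr std::optional<Opcode> linkVariant(Opcode Op) {
  switch (Op) {
  case Opcode::B:
    return Opcode::BL;
  case Opcode::BA:
    return Opcode::BLA;
  case Opcode::BCTR:
    return Opcode::BCTRL;
  default:
    return std::nullopt;
  }
}
```

Returning an empty optional for anything else lets a caller leave non-branch instructions untouched.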
…ion.

This patch introduces initial PowerPC relocation handling in createRelocation() by mapping common fixup kinds to ELF relocation types:

* 24-bit PC-relative branches (B/BL) -> R_PPC64_REL24
* 14-bit conditional branches (BC/BCL) -> R_PPC64_REL14

Unrecognised fixup kinds return std::nullopt.
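
As a rough sketch of that mapping (not the PR's createRelocation() itself): `FixupKind` and `relocTypeFor` are hypothetical names introduced for illustration, while the numeric relocation values 10 and 11 are the ones the 64-bit PowerPC ELF ABI assigns to R_PPC64_REL24 and R_PPC64_REL14.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical fixup kinds standing in for LLVM's PowerPC fixup enumerators.
enum class FixupKind { Branch24, BranchCond14, Other };

// ELF relocation type numbers from the 64-bit PowerPC ELF ABI.
enum class ElfReloc : std::uint32_t {
  REL24 = 10, // R_PPC64_REL24
  REL14 = 11  // R_PPC64_REL14
};

// Map a fixup kind to its ELF relocation type; unknown kinds yield nullopt.
constexpr std::optional<ElfReloc> relocTypeFor(FixupKind Kind) {
  switch (Kind) {
  case FixupKind::Branch24:
    return ElfReloc::REL24;
  case FixupKind::BranchCond14:
    return ElfReloc::REL14;
  default:
    return std::nullopt;
  }
}
```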

Additionally, getPCRelOperandNum() is updated to correctly identify the operand index of branch targets.

This patch adds an isIndirectBranch() implementation so that disassemble() can correctly distinguish between direct and indirect branches. This prevents replaceBranchTarget() from being called on instructions without PC-relative operands and avoids assertion failures.
…ranch target isn't at a fixed operand index. Make getPCRelOperandNum() search for the last MCOperand that is an expression, instead of using hard-coded indices.
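
A minimal sketch of the "last expression operand" search, assuming a simplified operand model: the `Operand` struct and `lastExprOperand` are hypothetical and only mimic the idea of scanning MCInst operands from the back.

```cpp
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical, simplified operand model (the real code inspects MCOperand).
struct Operand {
  enum Kind { Reg, Imm, Expr };
  Kind K;
};

// Scan from the back and return the index of the last expression operand,
// or std::nullopt if the instruction has none.
std::optional<std::size_t> lastExprOperand(const std::vector<Operand> &Ops) {
  for (std::size_t I = Ops.size(); I-- > 0;)
    if (Ops[I].K == Operand::Expr)
      return I;
  return std::nullopt;
}
```

Scanning backwards means a conditional branch with leading condition operands and an unconditional branch with a single operand are both handled without per-opcode index tables.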

Enhance createRelocation() to handle TOC relocations (not just REL24/REL14). This fixes the JITLink assert "NOP should be placed here for restoring r2": JITLink couldn't see TOC-related edges and couldn't patch sequences correctly, because only REL24/REL14 were mapped.
…restore.

Detect calls without a following NOP and inject one during emission to prevent
ppc64 JITLink assertions and preserve TOC (r2) handling.
… the emitter can enter the NOP-insertion block.
For other targets, evaluateBranch() can index the MC operand info and compute the target reliably.

On PPC64, however, a bl often arrives without an explicit MC operand when disassembled from an object file. evaluateBranch() is then called on an instruction shape it does not handle, leading to an assertion failure.

In relocation mode, BOLT already resolves call targets via relocations,
so calling `evaluateBranch` for PPC calls is unnecessary. Branches are
still handled normally.

This change skips evaluateBranch for PPC calls while preserving branch evaluation for conditional/unconditional jumps.
…tection

Provide a PPC implementation of MCPlusBuilder::evaluateMemOperandTarget()
that returns false (PPC64 with -mno-pcrel has no PC-relative memory operands).
This avoids hitting the base-class llvm_unreachable during disassembly.

Also stop reporting BA/BLA (absolute) as having a PC-relative operand. Keep
PC-rel handling only for BL/B/BC where AA=0.
MCPlusBuilder::evaluateMemOperandTarget(...)
BinaryFunction::handlePCRelOperand(...)
BinaryFunction::disassemble()

BOLT was crashing with:

  FATAL BOLT-ERROR: PC-relative operand can't be evaluated

because `PPCMCPlusBuilder::hasPCRelOperand()` returned `true` for BL/BC,
causing BOLT to try to evaluate the target directly from the MCInst.
On PowerPC ELFv2, branch/call targets are typically resolved via relocations
(e.g. R_PPC64_REL24) rather than embedded PC-rel immediates, so evaluation
fails.

This patch makes `hasPCRelOperand()` return `false` for PPC, and guards
`replaceBranchTarget()` accordingly.
…nsert the NOP and avoid crashing llvm_unreachable.
PowerPC branch instructions encode their displacement in different operand positions than other targets. The previous implementation incorrectly assumed operand #0 for all branches.

This patch:
* Corrects getPCRelOperandNum() to return the proper operand index (e.g. operand #2 for 'bc').
* Updates hasPCRelOperand() and evaluateBranch() to use the correct index.
* Clarifies relative vs absolute (AA=0 vs AA=1) branches/calls.

This fixes branch target evaluation and rewriting on PPC64 ELFv2 when BOLT analyses control flow.
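
To illustrate the relative vs absolute distinction at the encoding level, here is a small sketch that decodes an I-form branch word (b, ba, bl, bla) and computes its target. It works on raw instruction words rather than MCInst operands, so it is not how the patch's evaluateBranch() is written; `iFormBranchTarget` is a hypothetical helper name.

```cpp
#include <cstdint>
#include <optional>

// Decode a PowerPC I-form branch (primary opcode 18) and compute its target.
// Returns std::nullopt for any other instruction word.
constexpr std::optional<std::uint64_t> iFormBranchTarget(std::uint32_t Word,
                                                         std::uint64_t PC) {
  if ((Word >> 26) != 18)
    return std::nullopt;
  // The 24-bit LI field with two zero bits appended gives a 26-bit
  // signed byte displacement.
  std::int64_t Disp = Word & 0x03FFFFFCu;
  if (Disp & 0x02000000)
    Disp -= 0x04000000; // sign-extend from 26 bits
  const bool AA = (Word >> 1) & 1; // AA=1: absolute, AA=0: PC-relative
  return AA ? static_cast<std::uint64_t>(Disp)
            : PC + static_cast<std::uint64_t>(Disp);
}
```

For example, the word 0x48000005 is a bl with a displacement of +4, so at address 0x1000 it targets 0x1004, whereas 0x48000102 is a ba to the absolute address 0x100 regardless of where it sits.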
On PPC64 ELFv2, the instruction slot immediately after a 'bl' is reserved for either a NOP or a TOC restore ('ld r2, 24(r1)').

I noticed that my latest code was inserting a NOP in place of the TOC restore. This led to incorrect execution because the restored TOC value was lost.

I implemented a new method, 'isTOCRestoreAfterCall()', to check whether a TOC restore follows the 'bl' and, if so, to skip adding the NOP.
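
The check can be sketched on raw instruction words. This is an assumption-laden illustration: the PR's isTOCRestoreAfterCall() works on BOLT's instruction representation, whereas the hypothetical helpers below (`isTOCRestoreWord`, `needsNopAfterCall`) decode the 32-bit encoding directly.

```cpp
#include <cstdint>

constexpr std::uint32_t NopWord = 0x60000000u; // ori r0, r0, 0

// True if the word encodes 'ld r2, 24(r1)' (DS-form, primary opcode 58).
constexpr bool isTOCRestoreWord(std::uint32_t W) {
  const std::uint32_t PrimaryOpcode = W >> 26;
  const std::uint32_t RT = (W >> 21) & 0x1F; // destination register
  const std::uint32_t RA = (W >> 16) & 0x1F; // base register
  const std::uint32_t Offset = W & 0xFFFC;   // DS field with 0b00 appended
  const std::uint32_t XO = W & 0x3;          // 0 selects ld
  return PrimaryOpcode == 58 && XO == 0 && RT == 2 && RA == 1 && Offset == 24;
}

// A NOP only needs to be added when the post-call slot holds neither a NOP
// nor the TOC restore.
constexpr bool needsNopAfterCall(std::uint32_t NextWord) {
  return NextWord != NopWord && !isTOCRestoreWord(NextWord);
}
```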
Remove my marking code in the disassemble() method, specifically the fallback call to disassembleInstructionAtOffset(NextOff). I 'peeked' at the address following the bl to see whether it held a NOP or a TOC restore, but this inserted an additional 'ld 2, 24(1)' into the internal structure.
…changed for PLT TOC restore.

On PPC64 ELFv2, the linker may restore the TOC after a PLT call with 'ld 2, 24(1)' immediately after the 'bl'. I had previously injected a stack-pointer ('r1') adjustment (with createPushRegisters) between the 'bl' and the restore, which could cause a crash (segmentation fault).

createPushRegisters now returns two NOPs ('ori r0, r0, 0') instead of touching r1, which satisfies the generic two-instruction "save" contract expected by callers.
… to detect correctly TOC-restore.

Fix: expect the immediate (24) in slot 1 and the base register in slot 2. They were flipped before.
Normalise PPC64 post-call slot so LLVM JITLink can apply TOC restore.

PPC64 JITLink expects to see a NOP after 'bl main'. The static linker had already inserted an ld after the bl, which triggered an assertion in PPC64 JITLink
("NOP should be placed here for restoring r2").

Fix: If the instruction immediately after a call is the canonical TOC restore ('ld r2, 24(r1)'), emit a NOP instead.
…DS/LO_DS". Add support for R_PPC64_TOC16_DS and R_PPC64_TOC16_LO_DS.

According to the ELFv2 ABI, the TOC base register (r2) points to the address .got + 0x8000. This patch computes the correct TOC base when processing PPC64 binaries and uses it to anchor TOC-relative relocations.

This resolves warnings such as:
"ignoring symbol .TOC. at 0x10027f00, which lies outside .got"
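
The arithmetic behind the TOC base can be sketched as follows. The helper names `tocBase` and `toc16Offset` are hypothetical, and the .got address used in the usage note is inferred from the warning above (0x10027f00 - 0x8000), not taken from the patch.

```cpp
#include <cstdint>
#include <optional>

// The TOC base sits 0x8000 bytes past the start of .got so that a signed
// 16-bit offset can reach 32 KiB on either side of it.
constexpr std::uint64_t TocBias = 0x8000;

constexpr std::uint64_t tocBase(std::uint64_t GotAddr) {
  return GotAddr + TocBias;
}

// Signed offset of a symbol from the TOC base, if it fits in 16 bits.
constexpr std::optional<std::int16_t> toc16Offset(std::uint64_t SymAddr,
                                                  std::uint64_t GotAddr) {
  const std::int64_t Off =
      static_cast<std::int64_t>(SymAddr - tocBase(GotAddr));
  if (Off < -32768 || Off > 32767)
    return std::nullopt;
  return static_cast<std::int16_t>(Off);
}
```

With .got assumed at 0x1001ff00, tocBase() yields 0x10027f00, which matches the .TOC. address in the warning and explains why the symbol appears to lie "outside .got" when the bias is not accounted for.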
… *' has incompatible initializer of type 'ErrorOr<BinarySection &>", unwrap ErrorOr<BinarySection&> before using it.