[AMDGPU] Add machine-level inliner pass #169476
base: users/rovka/inheritable-legacy-fp-manager
This doesn't do anything interesting yet, but it will enable the machine-level inliner when it becomes available.
Add the necessary infrastructure for the machine-level inliner. The inliner will initially only handle calls to functions with the `amdgpu_gfx_whole_wave` calling convention. Partial inlining is currently not supported - all whole wave functions will be inlined into all their call sites and removed from the module (which should be safe since whole wave functions can't be called indirectly and their address can't be taken). As a consequence, recursive whole wave functions are not supported yet (I'll fix that in a separate patch).

In addition to a MachineFunction pass representing the inliner itself, the patch adds a custom FPPassManager (`AMDGPUInliningPassManager`) which helps manage the inlining process. It does this by suspending the processing of inlined functions when the inliner runs, which means they will have the correct shape when the inliner runs on their callers. After the pass pipeline is run on all the functions in the module, the custom pass manager will finally release the inlined MachineFunctions (in the future, it's easy to update it to run the remainder of the pass pipeline on them instead of just deleting them, making it possible to support partial inlining, and with it recursion). This works because the backend passes already run inside a call graph pass manager, so the callees are always processed before the callers.

The custom pass manager is inserted into the pipeline by another pass, `AMDGPUInliningAnchor`, whose `preparePassManager` method will oust any existing FunctionPass manager and replace it with the inlining pass manager. This makes it possible to use the custom pass manager without any other changes to the pass manager infrastructure.

Support for the new pass manager will be part of a different patch.
Errors out if there's a recursive call.
Teach update_mir_test_checks.py to generate CHECK-NOT for MIR functions that are missing from the output of all run lines. Also look for conflicts where a MIR function is generated for some runs but not others with the same prefixes.
@llvm/pr-subscribers-testing-tools @llvm/pr-subscribers-backend-amdgpu
Author: Diana Picus (rovka)
Finally, make sure the update_mir_test_checks.py script can gracefully handle functions that have been inlined, by checking for conflicts and generating NOT checks when a MIR function is missing for all the run lines.
Patch is 1.35 MiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169476.diff
21 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 5af2a2755cec3..f9849374c62ba 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -114,6 +114,14 @@ FunctionPass *createAMDGPULowerKernelArgumentsPass();
void initializeAMDGPULowerKernelArgumentsPass(PassRegistry &);
extern char &AMDGPULowerKernelArgumentsID;
+FunctionPass *createAMDGPUMachineLevelInlinerPass();
+void initializeAMDGPUMachineLevelInlinerPass(PassRegistry &);
+extern char &AMDGPUMachineLevelInlinerID;
+
+FunctionPass *createAMDGPUInliningAnchorPass();
+void initializeAMDGPUInliningAnchorPass(PassRegistry &);
+extern char &AMDGPUInliningAnchorID;
+
FunctionPass *createAMDGPUPromoteKernelArgumentsPass();
void initializeAMDGPUPromoteKernelArgumentsPass(PassRegistry &);
extern char &AMDGPUPromoteKernelArgumentsID;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMachineLevelInliner.cpp b/llvm/lib/Target/AMDGPU/AMDGPUMachineLevelInliner.cpp
new file mode 100644
index 0000000000000..8a586ddbfdfa5
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMachineLevelInliner.cpp
@@ -0,0 +1,425 @@
+//===-- AMDGPUMachineLevelInliner.cpp - AMDGPU Machine Level Inliner ----===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "AMDGPUMachineLevelInliner.h"
+#include "AMDGPU.h"
+#include "AMDGPUMachineModuleInfo.h"
+#include "AMDGPUSubtarget.h"
+#include "SIInstrInfo.h"
+#include "SIMachineFunctionInfo.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/CodeGen/MachineFrameInfo.h"
+#include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/CodeGen/MachineInstr.h"
+#include "llvm/CodeGen/MachineModuleInfo.h"
+#include "llvm/IR/CallingConv.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/IR/LegacyPassManagers.h"
+#include "llvm/IR/Module.h"
+#include "llvm/IR/PassTimingInfo.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/Support/Debug.h"
+#include "llvm/Support/TimeProfiler.h"
+#include "llvm/Support/raw_ostream.h"
+
+using namespace llvm;
+
+#define DEBUG_TYPE "amdgpu-machine-level-inliner"
+
+namespace {
+class AMDGPUInliningPassManager : public FPPassManager {
+public:
+ static char ID;
+
+ explicit AMDGPUInliningPassManager() : FPPassManager(ID) {}
+
+ bool runOnFunction(Function &F) override;
+ void getAnalysisUsage(AnalysisUsage &AU) const override;
+
+ bool doFinalization(Module &M) override;
+
+ StringRef getPassName() const override {
+ return "AMDGPU Inlining Pass Manager";
+ }
+};
+
+/// AMDGPUInliningAnchor - A machine function pass that serves as an anchor for
+/// setting up the AMDGPU inlining pass manager infrastructure. It makes sure
+/// the inliner is run via an AMDGPUInliningPassManager. It can be run well in
+/// advance of the inliner as long as there are only FunctionPasses in between.
+class AMDGPUInliningAnchor : public MachineFunctionPass {
+public:
+ static char ID; // Pass identification
+
+ AMDGPUInliningAnchor() : MachineFunctionPass(ID) {}
+
+ // We don't really need to process any functions here.
+ bool runOnMachineFunction(MachineFunction &MF) override { return false; }
+
+ void getAnalysisUsage(AnalysisUsage &AU) const override;
+ StringRef getPassName() const override;
+
+ /// Prepare the pass manager stack for the inliner. This will push an
+ /// `AMDGPUInliningPassManager` onto the stack.
+ void preparePassManager(PMStack &Stack) override;
+};
+
+} // end anonymous namespace.
+
+// Pass identification
+char AMDGPUMachineLevelInliner::ID = 0;
+char AMDGPUInliningPassManager::ID = 0;
+char AMDGPUInliningAnchor::ID = 0;
+
+char &llvm::AMDGPUMachineLevelInlinerID = AMDGPUMachineLevelInliner::ID;
+char &llvm::AMDGPUInliningAnchorID = AMDGPUInliningAnchor::ID;
+
+INITIALIZE_PASS_BEGIN(AMDGPUMachineLevelInliner, DEBUG_TYPE,
+ "AMDGPU Machine Level Inliner", false, false)
+INITIALIZE_PASS_DEPENDENCY(MachineModuleInfoWrapperPass)
+INITIALIZE_PASS_DEPENDENCY(AMDGPUInliningAnchor)
+INITIALIZE_PASS_END(AMDGPUMachineLevelInliner, DEBUG_TYPE,
+ "AMDGPU Machine Level Inliner", false, false)
+
+INITIALIZE_PASS_BEGIN(AMDGPUInliningAnchor, "amdgpu-inlining-anchor",
+ "AMDGPU Inlining Anchor", false, true)
+INITIALIZE_PASS_DEPENDENCY(MachineModuleInfoWrapperPass)
+INITIALIZE_PASS_END(AMDGPUInliningAnchor, "amdgpu-inlining-anchor",
+ "AMDGPU Inlining Anchor", false, true)
+
+AMDGPUMachineLevelInliner::AMDGPUMachineLevelInliner()
+ : MachineFunctionPass(ID) {
+ initializeAMDGPUMachineLevelInlinerPass(*PassRegistry::getPassRegistry());
+}
+
+void AMDGPUMachineLevelInliner::getAnalysisUsage(AnalysisUsage &AU) const {
+ AU.addRequired<MachineModuleInfoWrapperPass>();
+ AU.addRequired<AMDGPUInliningAnchor>();
+ AU.addPreserved<MachineModuleInfoWrapperPass>();
+ MachineFunctionPass::getAnalysisUsage(AU);
+}
+
+bool AMDGPUMachineLevelInliner::runOnMachineFunction(MachineFunction &MF) {
+ MachineModuleInfo &MMI = getAnalysis<MachineModuleInfoWrapperPass>().getMMI();
+
+ Function &F = MF.getFunction();
+ if (shouldInlineCallsTo(F)) {
+ // Mark the function as machine-inlined in AMDGPUMachineModuleInfo. This
+ // tells the inlining pass manager to stop processing it.
+ auto &AMMMI = MMI.getObjFileInfo<AMDGPUMachineModuleInfo>();
+ AMMMI.addMachineInlinedFunction(F);
+
+ return false;
+ }
+
+ bool Changed = false;
+
+ // Can't inline anything if there aren't any calls.
+ MachineFrameInfo &MFI = MF.getFrameInfo();
+ if (!MFI.hasCalls() && !MFI.hasTailCall())
+ return false;
+
+ // Collect calls to inline.
+ SmallVector<MachineInstr *, 4> CallsToInline;
+ const SIInstrInfo *TII = MF.getSubtarget<GCNSubtarget>().getInstrInfo();
+
+ for (auto &MBB : MF) {
+ for (auto &MI : MBB) {
+ if (!MI.isCall())
+ continue;
+
+ const MachineOperand *CalleeOp =
+ TII->getNamedOperand(MI, AMDGPU::OpName::callee);
+ if (CalleeOp && CalleeOp->isGlobal()) {
+ if (auto *CalledFunc = dyn_cast<Function>(CalleeOp->getGlobal())) {
+ // Partial inlining is not supported yet, because the inlining pass
+ // manager does not run the rest of the pass pipeline on functions
+ // that get inlined (including outputting code for them).
+ if (CalledFunc == &F)
+ report_fatal_error("Recursive calls in whole wave functions are "
+ "not supported yet");
+
+ if (shouldInlineCallsTo(*CalledFunc)) {
+ CallsToInline.push_back(&MI);
+ }
+ }
+ }
+ }
+ }
+
+ // Perform the actual inlining.
+ for (MachineInstr *CallMI : CallsToInline) {
+ const MachineOperand *CalleeOp =
+ TII->getNamedOperand(*CallMI, AMDGPU::OpName::callee);
+ assert(CalleeOp && CalleeOp->isGlobal() &&
+ isa<Function>(CalleeOp->getGlobal()));
+ auto *Callee = cast<Function>(CalleeOp->getGlobal());
+
+ MachineFunction *CalleeMF = MMI.getMachineFunction(*Callee);
+ assert(CalleeMF && "Couldn't get MachineFunction for callee");
+ assert(!CalleeMF->empty() && "Machine function body is empty");
+
+ LLVM_DEBUG(dbgs() << " Inlining machine call to: " << Callee->getName()
+ << " (" << CalleeMF->size() << " basic blocks)\n");
+
+ inlineMachineFunction(&MF, CallMI, CalleeMF, TII);
+ cleanupAfterInlining(&MF, CallMI, TII);
+ Changed = true;
+ }
+
+ return Changed;
+}
+
+void AMDGPUMachineLevelInliner::inlineMachineFunction(MachineFunction *CallerMF,
+ MachineInstr *CallMI,
+ MachineFunction *CalleeMF,
+ const SIInstrInfo *TII) {
+
+ MachineBasicBlock *CallMBB = CallMI->getParent();
+ MachineBasicBlock *ContinuationMBB =
+ CallMBB->splitAt(*CallMI, /*UpdateLiveIns=*/true);
+
+ // Splitting marks the ContinuationMBB as a successor, but we want to
+ // fallthrough to the body of the inlined function instead.
+ CallMBB->removeSuccessor(ContinuationMBB);
+
+ // First we clone all the blocks and build a map, so we can patch up the
+ // control flow while cloning their content in a second pass.
+ DenseMap<const MachineBasicBlock *, MachineBasicBlock *> ClonedBlocks;
+ for (const MachineBasicBlock &OrigMBB : *CalleeMF) {
+ MachineBasicBlock *ClonedMBB =
+ CallerMF->CreateMachineBasicBlock(OrigMBB.getBasicBlock());
+ CallerMF->insert(ContinuationMBB->getIterator(), ClonedMBB);
+ ClonedBlocks[&OrigMBB] = ClonedMBB;
+ }
+
+ MachineBasicBlock *ClonedEntry = ClonedBlocks[&CalleeMF->front()];
+ CallMBB->addSuccessor(ClonedEntry);
+
+ for (const MachineBasicBlock &OrigMBB : *CalleeMF) {
+ MachineBasicBlock *ClonedMBB = ClonedBlocks[&OrigMBB];
+
+ for (MachineBasicBlock *OrigSucc : OrigMBB.successors())
+ ClonedMBB->addSuccessor(ClonedBlocks[OrigSucc]);
+
+ for (auto &LiveIn : OrigMBB.liveins())
+ ClonedMBB->addLiveIn(LiveIn);
+
+ for (const MachineInstr &OrigMI : OrigMBB) {
+ // Bundled instructions are handled by the bundle header.
+ if (OrigMI.isBundledWithPred())
+ continue;
+
+ if (OrigMI.isReturn()) {
+ assert(!OrigMI.isCall() && "Tail calls not supported yet"); // FIXME
+ TII->insertBranch(*ClonedMBB, ContinuationMBB, nullptr,
+ SmallVector<MachineOperand, 0>(), DebugLoc());
+ ClonedMBB->addSuccessor(ContinuationMBB);
+ } else {
+ MachineInstr &ClonedMI = CallerMF->cloneMachineInstrBundle(
+ *ClonedMBB, ClonedMBB->end(), OrigMI);
+ ClonedMI.dropMemRefs(*CallerMF); // FIXME: Update them instead.
+
+ for (MachineOperand &MO : ClonedMI.operands())
+ if (MO.isMBB())
+ MO.setMBB(ClonedBlocks[MO.getMBB()]);
+ }
+ }
+ }
+}
+
+void AMDGPUMachineLevelInliner::cleanupAfterInlining(
+ MachineFunction *CallerMF, MachineInstr *CallMI,
+ const SIInstrInfo *TII) const {
+ MachineRegisterInfo &MRI = CallerMF->getRegInfo();
+ const TargetRegisterInfo *TRI = CallerMF->getSubtarget().getRegisterInfo();
+
+ // Clean up instructions setting up the callee operand (this is important
+ // because we won't be generating any code for that symbol, so we don't want
+ // references to it dangling around).
+ const MachineOperand *CalleeGlobalOp =
+ TII->getNamedOperand(*CallMI, AMDGPU::OpName::callee);
+ const MachineOperand *CalleeRegOp =
+ TII->getNamedOperand(*CallMI, AMDGPU::OpName::src0);
+
+ assert(CalleeGlobalOp && CalleeRegOp &&
+ "Couldn't get operands for call inst");
+ assert(CalleeGlobalOp->isGlobal() && "Unexpected operand kind");
+ assert(CalleeRegOp->isReg() && "Unexpected operand kind");
+
+ const GlobalValue *CalleeGV = CalleeGlobalOp->getGlobal();
+ Register CalleeReg = CalleeRegOp->getReg();
+
+ SmallVector<MachineInstr *, 4> ToErase;
+ ToErase.push_back(CallMI);
+
+ // Check each subreg of the callee register (e.g., s0 and s1 for s[0:1]).
+ for (MCSubRegIterator SR(CalleeReg, TRI, /*IncludeSelf=*/true); SR.isValid();
+ ++SR) {
+ MCPhysReg SubReg = *SR;
+
+ // Usually the instructions setting up the callee are a S_MOV_B32
+ // referencing the global op. Look for them and remove them. In the general
+ // case, we'll want to check that these instructions have no other uses, but
+ // for now this should be safe because the addresses of whole wave functions
+ // may not be used for anything other than direct calls.
+ for (MachineInstr &DefMI : MRI.def_instructions(SubReg)) {
+ // Check if this def instruction references the callee global
+ for (const MachineOperand &MO : DefMI.operands()) {
+ if (MO.isGlobal() && MO.getGlobal() == CalleeGV) {
+ ToErase.push_back(&DefMI);
+ break;
+ }
+ }
+ }
+ }
+
+ for (MachineInstr *MI : ToErase)
+ MI->eraseFromParent();
+}
+
+FunctionPass *llvm::createAMDGPUMachineLevelInlinerPass() {
+ return new AMDGPUMachineLevelInliner();
+}
+
+// The implementation here follows FPPassManager::runOnFunction but with some
+// simplifications since we know we're not running this on LLVM IR (so the
+// Function itself will never be changed, only its corresponding
+// MachineFunction). It also checks after every pass if the function has been
+// inlined, and stops running passes on it if that's the case.
+bool AMDGPUInliningPassManager::runOnFunction(Function &F) {
+ if (F.isDeclaration())
+ return false;
+
+ MachineModuleInfo &MMI = getAnalysis<MachineModuleInfoWrapperPass>().getMMI();
+ auto &AMMMI = MMI.getObjFileInfo<AMDGPUMachineModuleInfo>();
+
+ // Don't run anything on functions that have already been inlined.
+ if (AMMMI.isMachineInlinedFunction(F))
+ return false;
+
+ bool Changed = false;
+ populateInheritedAnalysis(TPM->activeStack);
+
+ // Store name outside of loop to avoid redundant calls.
+ const StringRef Name = F.getName();
+ llvm::TimeTraceScope FunctionScope("OptFunction", Name);
+
+ for (Pass *P : PassVector) {
+ FunctionPass *FP = static_cast<FunctionPass *>(P);
+ bool LocalChanged = false;
+
+ // Call getPassName only when required. The call itself is fairly cheap, but
+ // still virtual and repeated calling adds unnecessary overhead.
+ llvm::TimeTraceScope PassScope(
+ "RunPass", [FP]() { return std::string(FP->getPassName()); });
+
+ dumpPassInfo(FP, EXECUTION_MSG, ON_FUNCTION_MSG, Name);
+ dumpRequiredSet(FP);
+
+ initializeAnalysisImpl(FP);
+
+ {
+ PassManagerPrettyStackEntry X(FP, F);
+ TimeRegion PassTimer(getPassTimer(FP));
+
+ LocalChanged |= FP->runOnFunction(F);
+ }
+
+ Changed |= LocalChanged;
+ if (LocalChanged)
+ dumpPassInfo(FP, MODIFICATION_MSG, ON_FUNCTION_MSG, Name);
+ dumpPreservedSet(FP);
+ dumpUsedSet(FP);
+
+ // If the pass has marked the function for inlining, skip remaining passes.
+ if (AMMMI.isMachineInlinedFunction(F))
+ break;
+
+ verifyPreservedAnalysis(FP);
+ if (LocalChanged)
+ removeNotPreservedAnalysis(FP);
+ recordAvailableAnalysis(FP);
+ removeDeadPasses(FP, Name, ON_FUNCTION_MSG);
+ }
+
+ return Changed;
+}
+
+bool AMDGPUInliningPassManager::doFinalization(Module &M) {
+ MachineModuleInfo &MMI = getAnalysis<MachineModuleInfoWrapperPass>().getMMI();
+ auto &AMMMI = MMI.getObjFileInfo<AMDGPUMachineModuleInfo>();
+
+ // Free MachineFunction for all inlined functions. Other machine functions are
+ // being freed via the FreeMachineFunction pass which runs at the end of
+ // the pass pipeline.
+ // TODO: This is a good place to run the rest of the pass pipeline for
+ // functions that have been only partially inlined and which still need to be
+ // emitted. This way they can be in their inlining-ready form until we're done
+ // processing all their callers, and then still go through the rest of the
+ // pipeline.
+ for (Function *F : AMMMI.getMachineInlinedFunctions())
+ MMI.deleteMachineFunctionFor(*F);
+
+ return FPPassManager::doFinalization(M);
+}
+
+void AMDGPUInliningPassManager::getAnalysisUsage(AnalysisUsage &AU) const {
+ AU.addRequired<MachineModuleInfoWrapperPass>();
+ AU.addPreserved<MachineModuleInfoWrapperPass>();
+ ModulePass::getAnalysisUsage(AU);
+}
+
+FunctionPass *llvm::createAMDGPUInliningAnchorPass() {
+ return new AMDGPUInliningAnchor();
+}
+
+void AMDGPUInliningAnchor::getAnalysisUsage(AnalysisUsage &AU) const {
+ AU.addRequired<MachineModuleInfoWrapperPass>();
+ AU.setPreservesAll();
+}
+
+StringRef AMDGPUInliningAnchor::getPassName() const {
+ return "AMDGPU Inlining Anchor";
+}
+
+void AMDGPUInliningAnchor::preparePassManager(PMStack &PMS) {
+ // Replace the top FunctionPass manager (if there is one) with an
+ // AMDGPUInliningPassManager.
+ while (!PMS.empty() &&
+ PMS.top()->getPassManagerType() > PMT_FunctionPassManager)
+ PMS.pop();
+
+ assert(!PMS.empty() && "Unable to create AMDGPU Inlining Pass Manager");
+ PMDataManager *PMD = PMS.top();
+
+ // Nothing to do if it's already an AMDGPUInliningPassManager.
+ if (PMD->getAsPass()->getPassID() == &AMDGPUInliningPassManager::ID)
+ return;
+
+ // If we have a different FunctionPass manager, pop it.
+ if (PMD->getPassManagerType() == PMT_FunctionPassManager) {
+ PMS.pop();
+ PMD = PMS.top();
+ }
+
+ // Create and push our custom AMDGPUInliningPassManager.
+ auto *PM = new AMDGPUInliningPassManager();
+ PM->populateInheritedAnalysis(PMS);
+
+ PMTopLevelManager *TPM = PMD->getTopLevelManager();
+ TPM->addIndirectPassManager(PM);
+
+ PM->assignPassManager(PMS, PMD->getPassManagerType());
+
+ PMS.push(PM);
+}
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMachineLevelInliner.h b/llvm/lib/Target/AMDGPU/AMDGPUMachineLevelInliner.h
new file mode 100644
index 0000000000000..ab5ecdc5dbd41
--- /dev/null
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMachineLevelInliner.h
@@ -0,0 +1,59 @@
+//===-- AMDGPUMachineLevelInliner.h - AMDGPU Machine Level Inliner -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file defines the AMDGPUMachineLevelInliner pass, which performs
+// machine-level inlining for AMDGPU targets.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIB_TARGET_AMDGPU_AMDGPUMACHINELEVELINLINER_H
+#define LLVM_LIB_TARGET_AMDGPU_AMDGPUMACHINELEVELINLINER_H
+
+#include "llvm/ADT/SmallSet.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/Analysis/CallGraphSCCPass.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineFunctionPass.h"
+#include "llvm/IR/Function.h"
+#include "llvm/IR/Instructions.h"
+#include "llvm/Pass.h"
+
+namespace llvm {
+
+class SIInstrInfo;
+
+class AMDGPUMachineLevelInliner : public MachineFunctionPass {
+public:
+ static char ID; // Pass identification
+
+ AMDGPUMachineLevelInliner();
+
+ bool runOnMachineFunction(MachineFunction &MF) override;
+ void getAnalysisUsage(AnalysisUsage &AU) const override;
+
+ StringRef getPassName() const override {
+ return "AMDGPU Machine Level Inliner";
+ }
+
+private:
+ bool shouldInlineCallsTo(const Function &Callee) {
+ return Callee.getCallingConv() == CallingConv::AMDGPU_Gfx_WholeWave;
+ }
+
+ void inlineMachineFunction(MachineFunction *CallerMF, MachineInstr *CallMI,
+ MachineFunction *CalleeMF, const SIInstrInfo *TII);
+
+ void cleanupAfterInlining(MachineFunction *CallerMF, MachineInstr *CallMI,
+ const SIInstrInfo *TII) const;
+};
+
+} // end namespace llvm
+
+#endif // LLVM_LIB_TARGET_AMDGPU_AMDGPUMACHINELEVELINLINER_H
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUMachineModuleInfo.h b/llvm/lib/Target/AMDGPU/AMDGPUMachineModuleInfo.h
index bf852bb38376e..f09c99f0f52ec 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUMachineModuleInfo.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPUMachineModuleInfo.h
@@ -88,6 +88,8 @@ class AMDGPUMachineModuleInfo final : public MachineModuleInfoELF {
SSID == getSystemOneAddressSpaceSSID();
}
+ SmallVector<Function *, 8> MachineInlinedFunctions;
+
public:
AMDGPUMachineModuleInfo(const MachineModuleInfo &MMI);
@@ -151,6 +153,19 @@ class AMDGPUMachineModuleInfo final : public MachineModuleInfoELF {
return *AIO >= *BIO &&
(IsAOneAddressSpace == IsBOneAddressSpace || !IsAOneAddressSpace);
}
+
+ void addMachineInlinedFunction(Function &F) {
+ MachineInlinedFunctions.push_back(&F);
+ }
+
+ bool isMachineInlinedFunction(Function &F) const {
+ return llvm::find(MachineInlinedFunctions, &F) !=
+ MachineInlinedFunctions.end();
+ }
+
+ ArrayRef<Function *> getMachineInlinedFunctions() const {
+ return MachineInlinedFunctions;
+ }
};
} // end namespace llvm
d...
[truncated]