[MachineSSAUpdater][AMDGPU] Add faster version of MachineSSAUpdater class. #145722

vpykhtin · 2025-06-25T15:38:52Z

This is a port of SSAUpdaterBulk to machine IR, minus the "bulk" part. PHI deduplication and simplification are not yet implemented but can be added if needed.

When used in AMDGPU to replace MachineSSAUpdater for i1 copy lowering, it reduced compilation time from 417 to 180 seconds for the pass on a large test case (56% improvement).

llvmbot · 2025-06-25T15:39:21Z

@llvm/pr-subscribers-backend-amdgpu

@llvm/pr-subscribers-llvm-globalisel

Author: Valery Pykhtin (vpykhtin)

Patch is 20.16 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/145722.diff

7 Files Affected:

(added) llvm/include/llvm/CodeGen/MachineSSAUpdater2.h (+78)
(modified) llvm/lib/CodeGen/CMakeLists.txt (+1)
(added) llvm/lib/CodeGen/MachineSSAUpdater2.cpp (+185)
(modified) llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp (+13-5)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.mir (+11-11)
(modified) llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-structurizer.mir (+3-3)
(modified) llvm/test/CodeGen/AMDGPU/si-lower-i1-copies-order-of-phi-incomings.mir (+3-3)

diff --git a/llvm/include/llvm/CodeGen/MachineSSAUpdater2.h b/llvm/include/llvm/CodeGen/MachineSSAUpdater2.h
new file mode 100644
index 0000000000000..e3a5c45477ca3
--- /dev/null
+++ b/llvm/include/llvm/CodeGen/MachineSSAUpdater2.h
@@ -0,0 +1,78 @@
+//===- MachineSSAUpdaterBulk.h - Unstructured SSA Update Tool ----------*- C++
+//-*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file declares the MachineSSAUpdater2 class.
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_UTILS_MACHINE_SSAUPDATER2_H
+#define LLVM_TRANSFORMS_UTILS_MACHINE_SSAUPDATER2_H
+
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+
+namespace llvm {
+
+class MachineDominatorTree;
+class MachineInstrBuilder;
+class MachineBasicBlock;
+
+class MachineSSAUpdater2 {
+  struct BBValueInfo {
+    Register LiveInValue;
+    Register LiveOutValue;
+  };
+
+  MachineDominatorTree &DT;
+  MachineRegisterInfo &MRI;
+  const TargetInstrInfo &TII;
+  MachineRegisterInfo::VRegAttrs RegAttrs;
+
+  SmallVector<std::pair<MachineBasicBlock *, Register>, 4> Defines;
+  SmallVector<MachineBasicBlock *, 4> UseBlocks;
+  DenseMap<MachineBasicBlock *, BBValueInfo> BBInfos;
+
+  MachineInstrBuilder CreateInst(unsigned Opc, MachineBasicBlock *BB,
+                                 MachineBasicBlock::iterator I);
+
+  // IsLiveOut indicates whether we are computing live-out values (true) or
+  // live-in values (false).
+  Register ComputeValue(MachineBasicBlock *BB, bool IsLiveOut);
+
+public:
+  MachineSSAUpdater2(MachineDominatorTree &DT, MachineFunction &MF,
+                     const MachineRegisterInfo::VRegAttrs &RegAttr)
+      : DT(DT), MRI(MF.getRegInfo()), TII(*MF.getSubtarget().getInstrInfo()),
+        RegAttrs(RegAttr) {}
+
+  MachineSSAUpdater2(MachineDominatorTree &DT, MachineFunction &MF,
+                     Register Reg)
+      : MachineSSAUpdater2(DT, MF, MF.getRegInfo().getVRegAttrs(Reg)) {}
+
+  /// Indicate that a rewritten value is available in the specified block
+  /// with the specified value. Must be called before invoking Calculate().
+  void AddAvailableValue(MachineBasicBlock *BB, Register V) {
+    Defines.emplace_back(BB, V);
+  }
+
+  /// Record a basic block that uses the value. This method should be called for
+  /// every basic block where the value will be used. Must be called before
+  /// invoking Calculate().
+  void AddUseBlock(MachineBasicBlock *BB) { UseBlocks.push_back(BB); }
+
+  /// Calculate and insert necessary PHI nodes for SSA form.
+  /// Must be called after registering all definitions and uses.
+  void Calculate();
+
+  /// See SSAUpdater::GetValueInMiddleOfBlock description.
+  Register GetValueInMiddleOfBlock(MachineBasicBlock *BB);
+};
+
+} // end namespace llvm
+
+#endif // LLVM_TRANSFORMS_UTILS_MACHINE_SSAUPDATER2_H
diff --git a/llvm/lib/CodeGen/CMakeLists.txt b/llvm/lib/CodeGen/CMakeLists.txt
index f8f9bbba53e43..92912dae1c497 100644
--- a/llvm/lib/CodeGen/CMakeLists.txt
+++ b/llvm/lib/CodeGen/CMakeLists.txt
@@ -148,6 +148,7 @@ add_llvm_component_library(LLVMCodeGen
   MachineSizeOpts.cpp
   MachineSSAContext.cpp
   MachineSSAUpdater.cpp
+  MachineSSAUpdater2.cpp
   MachineStripDebug.cpp
   MachineTraceMetrics.cpp
   MachineUniformityAnalysis.cpp
diff --git a/llvm/lib/CodeGen/MachineSSAUpdater2.cpp b/llvm/lib/CodeGen/MachineSSAUpdater2.cpp
new file mode 100644
index 0000000000000..3a3652627f6ec
--- /dev/null
+++ b/llvm/lib/CodeGen/MachineSSAUpdater2.cpp
@@ -0,0 +1,185 @@
+//===- MachineSSAUpdater2.cpp - Unstructured SSA Update Tool
+//------------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements the MachineSSAUpdater2 class.
+//
+//===----------------------------------------------------------------------===//
+
+#include "llvm/CodeGen/MachineSSAUpdater2.h"
+#include "llvm/ADT/DenseMap.h"
+#include "llvm/Analysis/IteratedDominanceFrontier.h"
+#include "llvm/CodeGen/MachineBasicBlock.h"
+#include "llvm/CodeGen/MachineDominators.h"
+#include "llvm/CodeGen/MachineFunction.h"
+#include "llvm/CodeGen/MachineInstr.h"
+#include "llvm/CodeGen/MachineInstrBuilder.h"
+#include "llvm/CodeGen/MachineOperand.h"
+#include "llvm/CodeGen/MachineRegisterInfo.h"
+#include "llvm/CodeGen/TargetInstrInfo.h"
+#include "llvm/CodeGen/TargetOpcodes.h"
+#include "llvm/IR/DebugLoc.h"
+#include "llvm/Support/Debug.h"
+
+namespace llvm {
+
+template <bool IsPostDom>
+class MachineIDFCalculator final
+    : public IDFCalculatorBase<MachineBasicBlock, IsPostDom> {
+public:
+  using IDFCalculatorBase =
+      typename llvm::IDFCalculatorBase<MachineBasicBlock, IsPostDom>;
+  using ChildrenGetterTy = typename IDFCalculatorBase::ChildrenGetterTy;
+
+  MachineIDFCalculator(DominatorTreeBase<MachineBasicBlock, IsPostDom> &DT)
+      : IDFCalculatorBase(DT) {}
+};
+
+using MachineForwardIDFCalculator = MachineIDFCalculator<false>;
+using MachineReverseIDFCalculator = MachineIDFCalculator<true>;
+
+} // namespace llvm
+
+using namespace llvm;
+
+/// Given sets of UsingBlocks and DefBlocks, compute the set of LiveInBlocks.
+/// This is basically a subgraph limited by DefBlocks and UsingBlocks.
+static void
+ComputeLiveInBlocks(const SmallPtrSetImpl<MachineBasicBlock *> &UsingBlocks,
+                    const SmallPtrSetImpl<MachineBasicBlock *> &DefBlocks,
+                    SmallPtrSetImpl<MachineBasicBlock *> &LiveInBlocks) {
+  // To determine liveness, we must iterate through the predecessors of blocks
+  // where the def is live.  Blocks are added to the worklist if we need to
+  // check their predecessors.  Start with all the using blocks.
+  SmallVector<MachineBasicBlock *, 64> LiveInBlockWorklist(UsingBlocks.begin(),
+                                                           UsingBlocks.end());
+
+  // Now that we have a set of blocks where the phi is live-in, recursively add
+  // their predecessors until we find the full region the value is live.
+  while (!LiveInBlockWorklist.empty()) {
+    MachineBasicBlock *BB = LiveInBlockWorklist.pop_back_val();
+
+    // The block really is live in here, insert it into the set.  If already in
+    // the set, then it has already been processed.
+    if (!LiveInBlocks.insert(BB).second)
+      continue;
+
+    // Since the value is live into BB, it is either defined in a predecessor or
+    // live into it to.  Add the preds to the worklist unless they are a
+    // defining block.
+    for (MachineBasicBlock *P : BB->predecessors()) {
+      // The value is not live into a predecessor if it defines the value.
+      if (DefBlocks.count(P))
+        continue;
+
+      // Otherwise it is, add to the worklist.
+      LiveInBlockWorklist.push_back(P);
+    }
+  }
+}
+
+MachineInstrBuilder
+MachineSSAUpdater2::CreateInst(unsigned Opc, MachineBasicBlock *BB,
+                               MachineBasicBlock::iterator I) {
+  return BuildMI(*BB, I, DebugLoc(), TII.get(Opc),
+                 MRI.createVirtualRegister(RegAttrs));
+}
+
+// IsLiveOut indicates whether we are computing live-out values (true) or
+// live-in values (false).
+Register MachineSSAUpdater2::ComputeValue(MachineBasicBlock *BB,
+                                          bool IsLiveOut) {
+  auto *BBInfo = &BBInfos[BB];
+
+  if (IsLiveOut && BBInfo->LiveOutValue)
+    return BBInfo->LiveOutValue;
+
+  if (BBInfo->LiveInValue)
+    return BBInfo->LiveInValue;
+
+  SmallVector<BBValueInfo *, 4> Stack = {BBInfo};
+  MachineBasicBlock *DomBB = BB;
+  Register V;
+
+  while (DT.isReachableFromEntry(DomBB) && !DomBB->pred_empty() &&
+         (DomBB = DT.getNode(DomBB)->getIDom()->getBlock())) {
+    BBInfo = &BBInfos[DomBB];
+    if (BBInfo->LiveOutValue) {
+      V = BBInfo->LiveOutValue;
+      break;
+    }
+    if (BBInfo->LiveInValue) {
+      V = BBInfo->LiveInValue;
+      break;
+    }
+    Stack.emplace_back(BBInfo);
+  }
+
+  for (auto *BBInfo : Stack)
+    // Loop above can insert new entries into the BBInfos map: assume the
+    // map shouldn't grow due to [1] and BBInfo references are valid.
+    BBInfo->LiveInValue = V;
+
+  if (!V) {
+    V = CreateInst(TargetOpcode::IMPLICIT_DEF, BB,
+                   IsLiveOut ? BB->getFirstTerminator() : BB->getFirstNonPHI())
+            .getReg(0);
+    if (IsLiveOut)
+      BBInfos[BB].LiveOutValue = V;
+    else
+      BBInfos[BB].LiveInValue = V;
+  }
+
+  return V;
+}
+
+/// Perform all the necessary updates, including new PHI-nodes insertion and the
+/// requested uses update.
+void MachineSSAUpdater2::Calculate() {
+  MachineForwardIDFCalculator IDF(DT);
+
+  SmallPtrSet<MachineBasicBlock *, 2> DefBlocks;
+  for (auto [BB, V] : Defines)
+    DefBlocks.insert(BB);
+  IDF.setDefiningBlocks(DefBlocks);
+
+  SmallPtrSet<MachineBasicBlock *, 2> UsingBlocks;
+  for (MachineBasicBlock *BB : UseBlocks)
+    UsingBlocks.insert(BB);
+
+  SmallVector<MachineBasicBlock *, 32> IDFBlocks;
+  SmallPtrSet<MachineBasicBlock *, 32> LiveInBlocks;
+  ComputeLiveInBlocks(UsingBlocks, DefBlocks, LiveInBlocks);
+  IDF.setLiveInBlocks(LiveInBlocks);
+  IDF.calculate(IDFBlocks);
+
+  // Reserve sufficient buckets to prevent map growth. [1]
+  BBInfos.reserve(LiveInBlocks.size() + DefBlocks.size());
+
+  for (auto [BB, V] : Defines)
+    BBInfos[BB].LiveOutValue = V;
+
+  for (auto *FrontierBB : IDFBlocks) {
+    Register NewVR =
+        CreateInst(TargetOpcode::PHI, FrontierBB, FrontierBB->begin())
+            .getReg(0);
+    BBInfos[FrontierBB].LiveInValue = NewVR;
+  }
+
+  for (auto *BB : IDFBlocks) {
+    auto *PHI = &BB->front();
+    assert(PHI->isPHI());
+    MachineInstrBuilder MIB(*BB->getParent(), PHI);
+    for (MachineBasicBlock *Pred : BB->predecessors())
+      MIB.addReg(ComputeValue(Pred, /*IsLiveOut=*/true)).addMBB(Pred);
+  }
+}
+
+Register MachineSSAUpdater2::GetValueInMiddleOfBlock(MachineBasicBlock *BB) {
+  return ComputeValue(BB, /*IsLiveOut=*/false);
+}
diff --git a/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp b/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp
index 96131bd591a17..552669f88cd92 100644
--- a/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp
+++ b/llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp
@@ -24,6 +24,7 @@
 #include "SILowerI1Copies.h"
 #include "AMDGPU.h"
 #include "llvm/CodeGen/MachineSSAUpdater.h"
+#include "llvm/CodeGen/MachineSSAUpdater2.h"
 #include "llvm/InitializePasses.h"
 
 #define DEBUG_TYPE "si-i1-copies"
@@ -275,7 +276,7 @@ class LoopFinder {
   /// Add undef values dominating the loop and the optionally given additional
   /// blocks, so that the SSA updater doesn't have to search all the way to the
   /// function entry.
-  void addLoopEntries(unsigned LoopLevel, MachineSSAUpdater &SSAUpdater,
+  void addLoopEntries(unsigned LoopLevel, MachineSSAUpdater2 &SSAUpdater,
                       MachineRegisterInfo &MRI,
                       MachineRegisterInfo::VRegAttrs LaneMaskRegAttrs,
                       ArrayRef<Incoming> Incomings = {}) {
@@ -469,7 +470,6 @@ PhiLoweringHelper::PhiLoweringHelper(MachineFunction *MF,
 }
 
 bool PhiLoweringHelper::lowerPhis() {
-  MachineSSAUpdater SSAUpdater(*MF);
   LoopFinder LF(*DT, *PDT);
   PhiIncomingAnalysis PIA(*PDT, TII);
   SmallVector<MachineInstr *, 4> Vreg1Phis;
@@ -524,17 +524,21 @@ bool PhiLoweringHelper::lowerPhis() {
     // in practice.
     unsigned FoundLoopLevel = LF.findLoop(PostDomBound);
 
-    SSAUpdater.Initialize(DstReg);
+    MachineSSAUpdater2 SSAUpdater(*DT, *MF, DstReg);
+    SSAUpdater.AddUseBlock(&MBB);
 
     if (FoundLoopLevel) {
       LF.addLoopEntries(FoundLoopLevel, SSAUpdater, *MRI, LaneMaskRegAttrs,
                         Incomings);
 
       for (auto &Incoming : Incomings) {
+        SSAUpdater.AddUseBlock(Incoming.Block);
         Incoming.UpdatedReg = createLaneMaskReg(MRI, LaneMaskRegAttrs);
         SSAUpdater.AddAvailableValue(Incoming.Block, Incoming.UpdatedReg);
       }
 
+      SSAUpdater.Calculate();
+
       for (auto &Incoming : Incomings) {
         MachineBasicBlock &IMBB = *Incoming.Block;
         buildMergeLaneMasks(
@@ -556,11 +560,14 @@ bool PhiLoweringHelper::lowerPhis() {
           constrainAsLaneMask(Incoming);
           SSAUpdater.AddAvailableValue(&IMBB, Incoming.Reg);
         } else {
+          SSAUpdater.AddUseBlock(&IMBB);
           Incoming.UpdatedReg = createLaneMaskReg(MRI, LaneMaskRegAttrs);
           SSAUpdater.AddAvailableValue(&IMBB, Incoming.UpdatedReg);
         }
       }
 
+      SSAUpdater.Calculate();
+
       for (auto &Incoming : Incomings) {
         if (!Incoming.UpdatedReg.isValid())
           continue;
@@ -585,7 +592,6 @@ bool PhiLoweringHelper::lowerPhis() {
 
 bool Vreg1LoweringHelper::lowerCopiesToI1() {
   bool Changed = false;
-  MachineSSAUpdater SSAUpdater(*MF);
   LoopFinder LF(*DT, *PDT);
   SmallVector<MachineInstr *, 4> DeadCopies;
 
@@ -643,10 +649,12 @@ bool Vreg1LoweringHelper::lowerCopiesToI1() {
           PDT->findNearestCommonDominator(DomBlocks);
       unsigned FoundLoopLevel = LF.findLoop(PostDomBound);
       if (FoundLoopLevel) {
-        SSAUpdater.Initialize(DstReg);
+        MachineSSAUpdater2 SSAUpdater(*DT, *MF, DstReg);
+        SSAUpdater.AddUseBlock(&MBB);
         SSAUpdater.AddAvailableValue(&MBB, DstReg);
         LF.addLoopEntries(FoundLoopLevel, SSAUpdater, *MRI, LaneMaskRegAttrs);
 
+        SSAUpdater.Calculate();
         buildMergeLaneMasks(MBB, MI, DL, DstReg,
                             SSAUpdater.GetValueInMiddleOfBlock(&MBB), SrcReg);
         DeadCopies.push_back(&MI);
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.mir
index e800cb2e24a7a..cfef60c66d6a7 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.mir
@@ -101,19 +101,19 @@ body: |
   ; GFX10-NEXT:   successors: %bb.1(0x80000000)
   ; GFX10-NEXT:   liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4
   ; GFX10-NEXT: {{  $}}
-  ; GFX10-NEXT:   [[COPY:%[0-9]+]]:_(s32) = COPY $vgpr1
-  ; GFX10-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr2
-  ; GFX10-NEXT:   [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY]](s32), [[COPY1]](s32)
-  ; GFX10-NEXT:   [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr3
-  ; GFX10-NEXT:   [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr4
-  ; GFX10-NEXT:   [[MV1:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY2]](s32), [[COPY3]](s32)
+  ; GFX10-NEXT:   [[DEF:%[0-9]+]]:sreg_32_xm0_xexec(s1) = IMPLICIT_DEF
+  ; GFX10-NEXT:   [[COPY:%[0-9]+]]:sreg_32_xm0_xexec(s1) = COPY [[DEF]](s1)
+  ; GFX10-NEXT:   [[COPY1:%[0-9]+]]:_(s32) = COPY $vgpr1
+  ; GFX10-NEXT:   [[COPY2:%[0-9]+]]:_(s32) = COPY $vgpr2
+  ; GFX10-NEXT:   [[MV:%[0-9]+]]:_(p1) = G_MERGE_VALUES [[COPY1]](s32), [[COPY2]](s32)
+  ; GFX10-NEXT:   [[COPY3:%[0-9]+]]:_(s32) = COPY $vgpr3
+  ; GFX10-NEXT:   [[COPY4:%[0-9]+]]:_(s32) = COPY $vgpr4
+  ; GFX10-NEXT:   [[MV1:%[0-9]+]]:_(p0) = G_MERGE_VALUES [[COPY3]](s32), [[COPY4]](s32)
   ; GFX10-NEXT:   [[C:%[0-9]+]]:_(s1) = G_CONSTANT i1 true
-  ; GFX10-NEXT:   [[COPY4:%[0-9]+]]:sreg_32_xm0_xexec(s1) = COPY [[C]](s1)
+  ; GFX10-NEXT:   [[COPY5:%[0-9]+]]:sreg_32_xm0_xexec(s1) = COPY [[C]](s1)
   ; GFX10-NEXT:   [[C1:%[0-9]+]]:_(s32) = G_CONSTANT i32 -1
-  ; GFX10-NEXT:   [[DEF:%[0-9]+]]:sreg_32_xm0_xexec(s1) = IMPLICIT_DEF
-  ; GFX10-NEXT:   [[COPY5:%[0-9]+]]:sreg_32_xm0_xexec(s1) = COPY [[DEF]](s1)
-  ; GFX10-NEXT:   [[S_ANDN2_B32_:%[0-9]+]]:sreg_32_xm0_xexec(s1) = S_ANDN2_B32 [[COPY5]](s1), $exec_lo, implicit-def $scc
-  ; GFX10-NEXT:   [[S_AND_B32_:%[0-9]+]]:sreg_32_xm0_xexec(s1) = S_AND_B32 $exec_lo, [[COPY4]](s1), implicit-def $scc
+  ; GFX10-NEXT:   [[S_ANDN2_B32_:%[0-9]+]]:sreg_32_xm0_xexec(s1) = S_ANDN2_B32 [[COPY]](s1), $exec_lo, implicit-def $scc
+  ; GFX10-NEXT:   [[S_AND_B32_:%[0-9]+]]:sreg_32_xm0_xexec(s1) = S_AND_B32 $exec_lo, [[COPY5]](s1), implicit-def $scc
   ; GFX10-NEXT:   [[S_OR_B32_:%[0-9]+]]:sreg_32_xm0_xexec(s1) = S_OR_B32 [[S_ANDN2_B32_]](s1), [[S_AND_B32_]](s1), implicit-def $scc
   ; GFX10-NEXT:   [[DEF1:%[0-9]+]]:sreg_32(s1) = IMPLICIT_DEF
   ; GFX10-NEXT: {{  $}}
diff --git a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-structurizer.mir b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-structurizer.mir
index b76d421c16172..994640e524fc9 100644
--- a/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-structurizer.mir
+++ b/llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-structurizer.mir
@@ -1026,10 +1026,10 @@ body: |
   ; GFX10-NEXT:   [[OR:%[0-9]+]]:_(s1) = G_OR [[ICMP2]], [[XOR1]]
   ; GFX10-NEXT:   [[XOR2:%[0-9]+]]:_(s1) = G_XOR [[OR]], [[C4]]
   ; GFX10-NEXT:   [[COPY17:%[0-9]+]]:sreg_32_xm0_xexec(s1) = COPY [[XOR2]](s1)
-  ; GFX10-NEXT:   [[S_ANDN2_B32_1:%[0-9]+]]:sreg_32(s1) = S_ANDN2_B32 %46(s1), $exec_lo, implicit-def $scc
+  ; GFX10-NEXT:   [[S_ANDN2_B32_1:%[0-9]+]]:sreg_32(s1) = S_ANDN2_B32 %47(s1), $exec_lo, implicit-def $scc
   ; GFX10-NEXT:   [[S_AND_B32_1:%[0-9]+]]:sreg_32(s1) = S_AND_B32 $exec_lo, [[COPY16]](s1), implicit-def $scc
   ; GFX10-NEXT:   [[S_OR_B32_1:%[0-9]+]]:sreg_32(s1) = S_OR_B32 [[S_ANDN2_B32_1]](s1), [[S_AND_B32_1]](s1), implicit-def $scc
-  ; GFX10-NEXT:   [[S_ANDN2_B32_2:%[0-9]+]]:sreg_32_xm0_xexec(s1) = S_ANDN2_B32 %53(s1), $exec_lo, implicit-def $scc
+  ; GFX10-NEXT:   [[S_ANDN2_B32_2:%[0-9]+]]:sreg_32_xm0_xexec(s1) = S_ANDN2_B32 %54(s1), $exec_lo, implicit-def $scc
   ; GFX10-NEXT:   [[S_AND_B32_2:%[0-9]+]]:sreg_32_xm0_xexec(s1) = S_AND_B32 $exec_lo, [[COPY17]](s1), implicit-def $scc
   ; GFX10-NEXT:   [[S_OR_B32_2:%[0-9]+]]:sreg_32_xm0_xexec(s1) = S_OR_B32 [[S_ANDN2_B32_2]](s1), [[S_AND_B32_2]](s1), implicit-def $scc
   ; GFX10-NEXT:   G_BR %bb.1
@@ -1195,7 +1195,7 @@ body: |
   ; GFX10-NEXT: bb.2:
   ; GFX10-NEXT:   successors: %bb.4(0x40000000), %bb.7(0x40000000)
   ; GFX10-NEXT: {{  $}}
-  ; GFX10-NEXT:   [[PHI:%[0-9]+]]:sreg_32_xm0_xexec(s1) = PHI %52(s1), %bb.6, %56(s1), %bb.7
+  ; GFX10-NEXT:   [[PHI:%[0-9]+]]:sreg_32_xm0_xexec(s1) = PHI %52(s1), %bb.6, %57(s1), %bb.7
   ; GFX10-NEXT:   [[PHI1:%[0-9]+]]:sreg_32(s1) = PHI %41(s1), %bb.6, %40(s1), %bb.7
   ; GFX10-NEXT:   [[PHI2:%[0-9]+]]:_(s1) = G_PHI %12(s1), %bb.6, [[DEF]](s1), %bb.7
   ; GFX10-NEXT:   [[COPY7:%[0-9]+]]:sreg_32_xm0_xexec(s1) = COPY [[PHI2]](s1)
diff --git a/llvm/test/CodeGen/AMDGPU/si-lower-i1-copies-order-of-phi-incomings.mir b/llvm/test/CodeGen/AMDGPU/si-lower-i1-copies-order-of-phi-incomings.mir
index ecbd47a9e8d0d..9c27cb3017e95 100644
--- a/llvm/test/CodeGen/AMDGPU/si-lower-i1-copies-order-of-phi-incomings.mir
+++ b/llvm/test/CodeGen/AMDGPU/si-lower-i1-copies-order-of-phi-incomings.mir
@@ -20,21 +20,21 @@ body: |
   ; GCN-NEXT:   successors: %bb.1(0x80000000)
   ; GCN-NEXT:   liveins: $vgpr1, $vgpr2, $vgpr3, $vgpr4
   ; GCN-NEXT: {{  $}}
+  ; GCN-NEXT:   [[DEF:%[0-9]+]]:sreg_32 = IMPLICIT_DEF
   ; GCN-NEXT:   [[COPY:%[0-9]+]]:vgpr_32 = COPY $vgpr4
   ; GCN-NEXT:   [[COPY1:%[0-9]+]]:vgpr_32 = COPY $vgpr3
   ; GCN-NEXT:   [[COPY2:%[0-9]+]]:vgpr_32 = COPY $vgpr2
   ; GCN-NEXT:   [[COPY3:%[0-9]+]]:vgpr_32 = COPY $vgpr1
-  ; GCN-NEXT:   [[DEF:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
   ; GCN-NEXT:   [[DEF1:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
-  ; GCN-NEXT:   [[REG_SEQUENCE:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY]], %subreg.sub1
   ; GCN-NEXT:   [[DEF2:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+  ; GCN-NEXT:   [[REG_SEQUENCE:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[COPY1]], %subreg.sub0, [[COPY]], %subreg.sub1
   ; GCN-NEXT:   [[DEF3:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
+  ; GCN-NEXT:   [[DEF4:%[0-9]+]]:sgpr_32 = IMPLICIT_DEF
   ; GCN-NEXT:   [[REG_SEQUENCE1:%[0-9]+]]:vreg_64 = REG_SEQUENCE [[COPY3]], %subreg.sub0, [[COPY2]], %subreg.sub1
   ; GCN-NEXT:   [[S_MOV_B32_:%[0-9]+]]:sreg_32 = S_MOV_B32 -1
   ; GCN-NEXT:   [[S_MOV_B32_1:%[0-9]+]]:sreg_32 = S_MOV_B32 -1
   ; GCN-NEXT:   [[COPY4:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE]]
   ; GCN-NEXT:   [[COPY5:%[0-9]+]]:vreg_64 = COPY [[REG_SEQUENCE1]]
-  ; GCN-NEXT:   [[DEF4:%[0-9]+]]:s...
[truncated]

Copilot

Pull Request Overview

This PR introduces a new, faster SSA updater (MachineSSAUpdater2) for integration into AMDGPU, aimed at reducing compilation times for i1 copy lowering.

Ported and integrated MachineSSAUpdater2 for better performance.
Updated test cases, build files, and the SILowerI1Copies pass to use the new updater.
Added new implementation files for MachineSSAUpdater2 with adjustments to SSA update logic.

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Show a summary per file

File	Description
llvm/test/CodeGen/AMDGPU/si-lower-i1-copies-order-of-phi-incomings.mir	Updated MIR patterns to reflect new SSA updater behavior.
llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-structurizer.mir	Adjusted instruction immediates and PHI node expectations.
llvm/test/CodeGen/AMDGPU/GlobalISel/divergence-divergent-i1-used-outside-loop.mir	Modified use of copy and merge instructions to align with new SSA updater.
llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp	Replaced MachineSSAUpdater with MachineSSAUpdater2 and updated related logic.
llvm/lib/CodeGen/MachineSSAUpdater2.cpp	New implementation providing the improved SSA updater functionality.
llvm/include/llvm/CodeGen/MachineSSAUpdater2.h	Declares MachineSSAUpdater2; includes minor header comment inconsistency.
llvm/lib/CodeGen/CMakeLists.txt	Updated to compile the new MachineSSAUpdater2 implementation.

Comments suppressed due to low confidence (1)

llvm/include/llvm/CodeGen/MachineSSAUpdater2.h:1

The header comment refers to 'MachineSSAUpdaterBulk.h' instead of 'MachineSSAUpdater2.h'. Please update the comment to match the file name for consistency.

//===- MachineSSAUpdaterBulk.h - Unstructured SSA Update Tool ----------*- C++

llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp

cdevadas · 2025-06-25T15:46:55Z

llvm/include/llvm/CodeGen/MachineSSAUpdater2.h

The filename MachineSSAUpdater2 doesn't look good. Can you give a meaningful suffix?

Sure, but let's finish with the other issues first so as not to mess up the review. Open to suggestions, though.

petar-avramovic · 2025-06-25T16:03:35Z

We also use SSAUpdater in AMDGPUGlobalISelDivergenceLowering.cpp. The test changes look fine; as far as I can see, instructions are just added at a different time. Why not replace the existing one with the fast version?

arsenm · 2025-06-25T23:12:31Z

llvm/include/llvm/CodeGen/MachineSSAUpdater2.h

Bad line break

Still has bad line break

llvm/include/llvm/CodeGen/MachineSSAUpdater2.h

arsenm

Can you templatify the IR one, as is done for other passes, e.g. uniformity info?

llvm/lib/CodeGen/MachineSSAUpdater2.cpp

jmmartinez · 2025-06-26T12:46:42Z

llvm/lib/CodeGen/MachineSSAUpdater2.cpp

What do you think about creating the implicit def in the last basic block of "Stack" so that it dominates all the rest? Then we could loop over the blocks in the "Stack" and use it as the live-in/live-out value.

That was my first version of this patch. I'm a bit worried about elongating the live ranges of the undef value. In theory it should not count as register usage, but that might not be the case. With my latest commit I followed your idea and put the undef in the topmost dominating block.
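
The placement discussed here can be sketched on a toy model. This is not the patch's code: `ToyFunction`, `lookupValue`, the integer block ids and the value ids are all hypothetical, and the sketch ignores the live-in/live-out distinction that `ComputeValue` makes. It only shows the idea of walking the immediate-dominator chain, remembering the blocks visited, and materializing one undef in the topmost block when no dominating definition exists.

```cpp
#include <vector>

// Hypothetical toy model: blocks are indices, values are integer ids.
struct ToyFunction {
  std::vector<int> IDom;    // IDom[B] = immediate dominator of B, -1 for entry
  std::vector<int> Value;   // Value[B] = known value id in B, 0 if unknown
  std::vector<int> UndefIn; // blocks in which an undef was materialized
  int NextValueId = 100;    // fresh ids for synthesized undef values
};

// Return the value reaching BB, caching it in every block walked on the way.
int lookupValue(ToyFunction &F, int BB) {
  std::vector<int> Stack; // blocks visited that have no value yet
  int V = 0;
  for (int Cur = BB; Cur != -1; Cur = F.IDom[Cur]) {
    if (F.Value[Cur] != 0) { // found a dominating definition
      V = F.Value[Cur];
      break;
    }
    Stack.push_back(Cur);
  }
  if (V == 0) {
    // No dominating definition: create a single undef in the topmost
    // block of the walk, so that it dominates every block on the stack.
    int Top = Stack.back();
    V = F.NextValueId++;
    F.UndefIn.push_back(Top);
  }
  for (int B : Stack)
    F.Value[B] = V;
  return V;
}
```

Because the undef is recorded for every block on the stack, a later query from a sibling block stops at the first cached ancestor instead of creating a second undef. The cost, as noted in the reply, is a longer live range for that one value.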

vpykhtin · 2025-08-05T08:18:51Z

We also use SSAUpdater in AMDGPUGlobalISelDivergenceLowering.cpp. The test changes look fine; as far as I can see, instructions are just added at a different time. Why not replace the existing one with the fast version?

Sure, but let's do it gradually.

vpykhtin · 2025-08-05T08:22:41Z

Can you templatify the IR one, as is done for other passes, e.g. uniformity info?

I'm not sure whether it would complicate things more than it helps, but some parts, such as 'computeLiveInBlocks', could possibly be templatified.
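
A minimal sketch of what a templated `computeLiveInBlocks` might look like, assuming only that the block type offers a `predecessors()` member. `ToyBlock` is a hypothetical stand-in, and `std::set` is used instead of LLVM's `SmallPtrSet` to keep the example self-contained. The IR and machine block types may expose predecessors differently, so a real version would likely need a small adapter.

```cpp
#include <set>
#include <vector>

// Hypothetical toy block; the template below only needs predecessors().
struct ToyBlock {
  std::vector<ToyBlock *> Preds;
  const std::vector<ToyBlock *> &predecessors() const { return Preds; }
};

// Walk backwards from the using blocks and collect every block the value
// is live into, stopping at blocks that define the value.
template <typename BlockT>
std::set<BlockT *> computeLiveInBlocks(const std::set<BlockT *> &UsingBlocks,
                                       const std::set<BlockT *> &DefBlocks) {
  std::set<BlockT *> LiveIn;
  std::vector<BlockT *> Worklist(UsingBlocks.begin(), UsingBlocks.end());
  while (!Worklist.empty()) {
    BlockT *BB = Worklist.back();
    Worklist.pop_back();
    // Already in the set means already processed.
    if (!LiveIn.insert(BB).second)
      continue;
    for (BlockT *P : BB->predecessors()) {
      // The value is not live into a predecessor that defines it.
      if (DefBlocks.count(P))
        continue;
      Worklist.push_back(P);
    }
  }
  return LiveIn;
}
```

For a straight-line chain A, B, C, D with a definition in B and a use in D, this yields C and D as live-in blocks: the walk stops at B because it defines the value.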

llvm/include/llvm/CodeGen/MachineSSAUpdater2.h

llvm/lib/CodeGen/MachineSSAUpdater2.cpp

github-actions · 2025-10-06T14:07:13Z

✅ With the latest revision this PR passed the C/C++ code formatter.

vpykhtin · 2025-10-06T14:07:21Z

Renamed MachineSSAUpdater2 -> MachineIDFSSAUpdater. Squashed, rebased and force pushed.

…lass. This is a port of SSAUpdaterBulk to machine IR minus "bulk" part. Phi deduplication and simplification are not yet implemented but can be added if needed. When used in AMDGPU to replace MachineSSAUpdater for i1 copy lowering, it reduced compilation time from 417 to 180 seconds on a large test case (56% improvement).

llvm/lib/CodeGen/MachineIDFSSAUpdater.cpp

arsenm · 2025-10-10T07:28:41Z

llvm/lib/CodeGen/MachineIDFSSAUpdater.cpp

+  }
+
+  if (!V) {
+    V = createInst(TargetOpcode::IMPLICIT_DEF, TopDomBB,


This should use G_IMPLICIT_DEF if it is used with generic vregs. I thought the old code already supported this.

Given that it's a generic class and can potentially be used with generic registers, it's probably a good idea to add a config flag for it.

I haven't found traces of G_IMPLICIT_DEF support in the MachineSSAUpdater class.

Added a parameter to the constructor to select the appropriate opcode. I thought about doing this with CRTP, but that seems like overkill.
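
The shape of such a constructor parameter could look roughly like the sketch below. The actual parameter name and type in the patch are not shown in this thread, so `ToyUpdater`, `ToyOpcode` and `UseGenericUndef` are hypothetical names used purely for illustration.

```cpp
// Hypothetical stand-ins for the two target-independent opcodes discussed.
enum class ToyOpcode { IMPLICIT_DEF, G_IMPLICIT_DEF };

class ToyUpdater {
  ToyOpcode UndefOpc; // opcode used when an undef value must be created

public:
  // Assumed interface: a flag chosen once at construction time.
  explicit ToyUpdater(bool UseGenericUndef = false)
      : UndefOpc(UseGenericUndef ? ToyOpcode::G_IMPLICIT_DEF
                                 : ToyOpcode::IMPLICIT_DEF) {}

  ToyOpcode undefOpcode() const { return UndefOpc; }
};
```

A runtime flag like this keeps a single class for both cases, whereas the CRTP alternative mentioned above would fix the choice at compile time at the cost of an extra template layer.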

gentle ping.

GlobalISel only needs to create new registers using VRegAttrs, but that was not for generic vreg support. What GlobalISel actually needed was 'register class + LLT' because of uniformity analysis. I did not try this patch with G_IMPLICIT_DEF, but it would probably crash something if the register had both a register class and an LLT.

llvm/lib/CodeGen/MachineIDFSSAUpdater.cpp

petar-avramovic · 2025-10-22T15:51:11Z

llvm/include/llvm/CodeGen/MachineIDFSSAUpdater.h

+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_TRANSFORMS_UTILS_MACHINE_SSAUPDATER2_H


SSAUPDATER2 -> IDFSSAUPDATER

arsenm · 2025-10-23T04:04:47Z

llvm/include/llvm/CodeGen/MachineIDFSSAUpdater.h

+#ifndef LLVM_TRANSFORMS_UTILS_MACHINE_SSAUPDATER2_H
+#define LLVM_TRANSFORMS_UTILS_MACHINE_SSAUPDATER2_H
+
+#include "llvm/CodeGen/MachineRegisterInfo.h"


Only need to forward declare MachineRegisterInfo

vpykhtin requested review from cdevadas, Copilot and petar-avramovic June 25, 2025 15:38

llvmbot added backend:AMDGPU llvm:codegen llvm:globalisel labels Jun 25, 2025

Copilot AI reviewed Jun 25, 2025: llvm/lib/Target/AMDGPU/SILowerI1Copies.cpp (outdated, resolved)

cdevadas reviewed Jun 25, 2025

petar-avramovic requested a review from nhaehnle June 25, 2025 16:03

arsenm reviewed Jun 25, 2025: llvm/include/llvm/CodeGen/MachineSSAUpdater2.h (outdated, resolved)

arsenm reviewed Jun 25, 2025: llvm/lib/CodeGen/MachineSSAUpdater2.cpp (outdated, resolved)

jmmartinez reviewed Jun 26, 2025

vpykhtin force-pushed the machine_ssa_updater2 branch from 5c7a4f0 to a62fcb6 August 5, 2025 08:11

arsenm reviewed Aug 5, 2025: llvm/include/llvm/CodeGen/MachineSSAUpdater2.h (outdated, resolved)

arsenm reviewed Aug 5, 2025: llvm/lib/CodeGen/MachineSSAUpdater2.cpp (two threads, both outdated, resolved)

vpykhtin force-pushed the machine_ssa_updater2 branch from a62fcb6 to bc8628a October 6, 2025 14:04

vpykhtin force-pushed the machine_ssa_updater2 branch from bc8628a to 79e3a67 October 6, 2025 14:11

vpykhtin force-pushed the machine_ssa_updater2 branch from 79e3a67 to f1d527a October 6, 2025 14:14

arsenm reviewed Oct 10, 2025

vpykhtin added 3 commits October 10, 2025 13:56

per-review fixes. (6c1f128)
Merge branch 'main' into machine_ssa_updater2 (2c66f67)
Add constructor parameter to switch between G_IMPLICIT_DEF and IMPLICIT_DEF. (72b9a0a)

vpykhtin added 2 commits October 15, 2025 11:58

Merge branch 'main' into machine_ssa_updater2 (f116bef)
fix member initialization order (33dac33)

petar-avramovic reviewed Oct 22, 2025

arsenm reviewed Oct 23, 2025

6 participants
