Conversation

Contributor

@lucas-rami lucas-rami commented Jul 16, 2025

When CombinedVGPRSavings is true, the RPTarget currently assumes that free slots in one VGPR RC can always be used for the other, both when determining whether saving VGPRs is beneficial and when checking whether the target has been reached w.r.t. VGPR usage. Implicitly, this makes the rematerialization stage (the only current user of RPTarget) follow a different occupancy calculation than the "regular" one that the scheduler uses: one that assumes ArchVGPR/AGPR usage can be balanced perfectly and at no cost, which is untrue in general. This ultimately yields suboptimal rematerialization decisions that require cross-VGPR-RC copies unnecessarily.

This fixes that by making the RPTarget's internal model of occupancy consistent with the regular one. The CombinedVGPRSavings flag is removed, and a form of cross-VGPR-RC saving is implemented only for unified RFs, where it makes the most sense. Only when the number of free VGPRs in a given VGPR RC (ArchVGPR or AGPR) is lower than the excess VGPR usage in the other VGPR RC does the RPTarget consider that a pressure reduction in the former will benefit the latter.

Member

llvmbot commented Jul 16, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Lucas Ramirez (lucas-rami)

Changes

The scheduler's rematerialization stage uses GCNRPTarget's CombinedVGPRSavings parameter to control whether free "VGPR slots" in one bank (e.g., ArchVGPRs) can count as free slots for VGPRs of the other bank (e.g., AGPRs) when accounting for the RP savings required to achieve the stage's goal (reducing spilling or increasing occupancy). Enabling the parameter implicitly changes how we compute the achievable occupancy w.r.t. a given VGPR pressure: it assumes a best-case scenario in which ArchVGPR/AGPR usage will be perfectly balanced out during RA, potentially achieving higher occupancies on subtargets with non-unified RFs. We currently only enable this when reducing spilling, to avoid inconsistencies (which cause rollbacks) between the GCNRPTarget's estimated occupancies and the scheduler's "regular" estimated occupancies when increasing occupancy.

This makes the whole rematerialization stage use occupancy estimates influenced by GCNRPTarget::CombinedVGPRSavings, which is now enabled by default for the stage. GCNRegPressure::getOccupancy gains an extra, off-by-default flag to enable VGPR bank balancing in its calculations.

I disabled combined VGPR savings in machine-scheduler-sink-trivial-remats.mir to avoid test churn. machine-scheduler-sink-trivial-remats-vgpr-savings.mir explicitly showcases the new behavior.


Patch is 92.53 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/149224.diff

11 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/GCNRegPressure.cpp (+26-11)
  • (modified) llvm/lib/Target/AMDGPU/GCNRegPressure.h (+26-13)
  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (+93-70)
  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.h (+3-6)
  • (modified) llvm/test/CodeGen/AMDGPU/dbg-value-ends-sched-region.mir (+17-17)
  • (modified) llvm/test/CodeGen/AMDGPU/debug-value-scheduler-crash.mir (+19-19)
  • (modified) llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-attr.mir (+33-32)
  • (modified) llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-debug.mir (+5-4)
  • (added) llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-vgpr-savings.mir (+453)
  • (modified) llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats.mir (+6-6)
  • (modified) llvm/test/CodeGen/AMDGPU/sched-assert-dead-def-subreg-use-other-subreg.mir (+1-1)
diff --git a/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp b/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
index 7d6723a6108be..f3f6c5e845fc6 100644
--- a/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNRegPressure.cpp
@@ -14,6 +14,7 @@
 #include "GCNRegPressure.h"
 #include "AMDGPU.h"
 #include "SIMachineFunctionInfo.h"
+#include "Utils/AMDGPUBaseInfo.h"
 #include "llvm/CodeGen/RegisterPressure.h"
 
 using namespace llvm;
@@ -41,6 +42,18 @@ unsigned GCNRegPressure::getRegKind(const TargetRegisterClass *RC,
   return STI->isSGPRClass(RC) ? SGPR : (STI->isAGPRClass(RC) ? AGPR : VGPR);
 }
 
+unsigned GCNRegPressure::getOccupancy(const GCNSubtarget &ST,
+                                      unsigned DynamicVGPRBlockSize,
+                                      bool BalanceVGPRUsage) const {
+  const bool UnifiedRF = ST.hasGFX90AInsts();
+  unsigned NumVGPRs = (!UnifiedRF && BalanceVGPRUsage)
+                          ? divideCeil(getArchVGPRNum() + getAGPRNum(), 2)
+                          : getVGPRNum(UnifiedRF);
+
+  return std::min(ST.getOccupancyWithNumSGPRs(getSGPRNum()),
+                  ST.getOccupancyWithNumVGPRs(NumVGPRs, DynamicVGPRBlockSize));
+}
+
 void GCNRegPressure::inc(unsigned Reg,
                          LaneBitmask PrevMask,
                          LaneBitmask NewMask,
@@ -366,31 +379,33 @@ static LaneBitmask findUseBetween(unsigned Reg, LaneBitmask LastUseMask,
 
 GCNRPTarget::GCNRPTarget(const MachineFunction &MF, const GCNRegPressure &RP,
                          bool CombineVGPRSavings)
-    : RP(RP), CombineVGPRSavings(CombineVGPRSavings) {
+    : MF(MF), RP(RP) {
   const Function &F = MF.getFunction();
   const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
-  setRegLimits(ST.getMaxNumSGPRs(F), ST.getMaxNumVGPRs(F), MF);
+  setTarget(ST.getMaxNumSGPRs(F), ST.getMaxNumVGPRs(F), CombineVGPRSavings);
 }
 
 GCNRPTarget::GCNRPTarget(unsigned NumSGPRs, unsigned NumVGPRs,
                          const MachineFunction &MF, const GCNRegPressure &RP,
                          bool CombineVGPRSavings)
-    : RP(RP), CombineVGPRSavings(CombineVGPRSavings) {
-  setRegLimits(NumSGPRs, NumVGPRs, MF);
+    : MF(MF), RP(RP) {
+  setTarget(NumSGPRs, NumVGPRs, CombineVGPRSavings);
 }
 
 GCNRPTarget::GCNRPTarget(unsigned Occupancy, const MachineFunction &MF,
                          const GCNRegPressure &RP, bool CombineVGPRSavings)
-    : RP(RP), CombineVGPRSavings(CombineVGPRSavings) {
+    : MF(MF), RP(RP) {
   const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
   unsigned DynamicVGPRBlockSize =
       MF.getInfo<SIMachineFunctionInfo>()->getDynamicVGPRBlockSize();
-  setRegLimits(ST.getMaxNumSGPRs(Occupancy, /*Addressable=*/false),
-               ST.getMaxNumVGPRs(Occupancy, DynamicVGPRBlockSize), MF);
+  setTarget(ST.getMaxNumSGPRs(Occupancy, /*Addressable=*/false),
+            ST.getMaxNumVGPRs(Occupancy, DynamicVGPRBlockSize),
+            CombineVGPRSavings);
 }
 
-void GCNRPTarget::setRegLimits(unsigned NumSGPRs, unsigned NumVGPRs,
-                               const MachineFunction &MF) {
+void GCNRPTarget::setTarget(unsigned NumSGPRs, unsigned NumVGPRs,
+                            bool CombineVGPRSavings) {
+  this->CombineVGPRSavings = CombineVGPRSavings;
   const GCNSubtarget &ST = MF.getSubtarget<GCNSubtarget>();
   unsigned DynamicVGPRBlockSize =
       MF.getInfo<SIMachineFunctionInfo>()->getDynamicVGPRBlockSize();
@@ -402,8 +417,8 @@ void GCNRPTarget::setRegLimits(unsigned NumSGPRs, unsigned NumVGPRs,
           : 0;
 }
 
-bool GCNRPTarget::isSaveBeneficial(Register Reg,
-                                   const MachineRegisterInfo &MRI) const {
+bool GCNRPTarget::isSaveBeneficial(Register Reg) const {
+  const MachineRegisterInfo &MRI = MF.getRegInfo();
   const TargetRegisterClass *RC = MRI.getRegClass(Reg);
   const TargetRegisterInfo *TRI = MRI.getTargetRegisterInfo();
   const SIRegisterInfo *SRI = static_cast<const SIRegisterInfo *>(TRI);
diff --git a/llvm/lib/Target/AMDGPU/GCNRegPressure.h b/llvm/lib/Target/AMDGPU/GCNRegPressure.h
index 3749b6d1efc63..57295a9d15724 100644
--- a/llvm/lib/Target/AMDGPU/GCNRegPressure.h
+++ b/llvm/lib/Target/AMDGPU/GCNRegPressure.h
@@ -18,6 +18,8 @@
 #define LLVM_LIB_TARGET_AMDGPU_GCNREGPRESSURE_H
 
 #include "GCNSubtarget.h"
+#include "SIMachineFunctionInfo.h"
+#include "Utils/AMDGPUBaseInfo.h"
 #include "llvm/CodeGen/LiveIntervals.h"
 #include "llvm/CodeGen/RegisterPressure.h"
 #include <algorithm>
@@ -69,12 +71,12 @@ struct GCNRegPressure {
   }
   unsigned getSGPRTuplesWeight() const { return Value[TOTAL_KINDS + SGPR]; }
 
-  unsigned getOccupancy(const GCNSubtarget &ST,
-                        unsigned DynamicVGPRBlockSize) const {
-    return std::min(ST.getOccupancyWithNumSGPRs(getSGPRNum()),
-                    ST.getOccupancyWithNumVGPRs(getVGPRNum(ST.hasGFX90AInsts()),
-                                                DynamicVGPRBlockSize));
-  }
+  /// Determines the occupancy achievable with the current RP, when \p
+  /// BalanceVGPRUsage is true on subtargets with non-unified RFs, the
+  /// occupancy w.r.t. the number of VGPRs is computed as if we will later be
+  /// able to evenly balance out VGPR usage among ArchVGPR and AGPR banks.
+  unsigned getOccupancy(const GCNSubtarget &ST, unsigned DynamicVGPRBlockSize,
+                        bool BalanceVGPRUsage = false) const;
 
   void inc(unsigned Reg,
            LaneBitmask PrevMask,
@@ -187,22 +189,34 @@ class GCNRPTarget {
   GCNRPTarget(unsigned Occupancy, const MachineFunction &MF,
               const GCNRegPressure &RP, bool CombineVGPRSavings = false);
 
+  /// Changes the target (same semantics as constructor).
+  void setTarget(unsigned NumSGPRs, unsigned NumVGPRs,
+                 bool CombineVGPRSavings = false);
+
   const GCNRegPressure &getCurrentRP() const { return RP; }
 
   void setRP(const GCNRegPressure &NewRP) { RP = NewRP; }
 
   /// Determines whether saving virtual register \p Reg will be beneficial
   /// towards achieving the RP target.
-  bool isSaveBeneficial(Register Reg, const MachineRegisterInfo &MRI) const;
+  bool isSaveBeneficial(Register Reg) const;
 
   /// Saves virtual register \p Reg with lanemask \p Mask.
-  void saveReg(Register Reg, LaneBitmask Mask, const MachineRegisterInfo &MRI) {
-    RP.inc(Reg, Mask, LaneBitmask::getNone(), MRI);
+  void saveReg(Register Reg, LaneBitmask Mask) {
+    RP.inc(Reg, Mask, LaneBitmask::getNone(), MF.getRegInfo());
   }
 
   /// Whether the current RP is at or below the defined pressure target.
   bool satisfied() const;
 
+  /// Computes achievable occupancy with the currently tracked register pressure.
+  unsigned getOccupancy() const {
+    return RP.getOccupancy(
+        MF.getSubtarget<GCNSubtarget>(),
+        MF.getInfo<SIMachineFunctionInfo>()->getDynamicVGPRBlockSize(),
+        /*BalanceVGPRUsage=*/CombineVGPRSavings);
+  }
+
 #if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
   friend raw_ostream &operator<<(raw_ostream &OS, const GCNRPTarget &Target) {
     OS << "Actual/Target: " << Target.RP.getSGPRNum() << '/' << Target.MaxSGPRs
@@ -222,6 +236,8 @@ class GCNRPTarget {
 #endif
 
 private:
+  const MachineFunction &MF;
+
   /// Current register pressure.
   GCNRegPressure RP;
 
@@ -234,7 +250,7 @@ class GCNRPTarget {
   unsigned MaxUnifiedVGPRs;
   /// Whether we consider that the register allocator will be able to swap
   /// between ArchVGPRs and AGPRs by copying them to a super register class.
-  /// Concretely, this allows savings in one of the VGPR banks to help toward
+  /// Concretely, this allows free registers in one VGPR bank to help toward
   /// savings in the other VGPR bank.
   bool CombineVGPRSavings;
 
@@ -252,9 +268,6 @@ class GCNRPTarget {
     return NumVGPRs > MaxVGPRs || !satisfiesUnifiedTarget() ||
            (CombineVGPRSavings && !satisifiesVGPRBanksTarget());
   }
-
-  void setRegLimits(unsigned MaxSGPRs, unsigned MaxVGPRs,
-                    const MachineFunction &MF);
 };
 
 ///////////////////////////////////////////////////////////////////////////////
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index fce8f36d45969..13fd7ac19a607 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -68,6 +68,13 @@ static cl::opt<bool> GCNTrackers(
     cl::desc("Use the AMDGPU specific RPTrackers during scheduling"),
     cl::init(false));
 
+static cl::opt<bool> RematCombineVGPRSavings(
+    "amdgpu-schedule-combine-vgpr-savings", cl::Hidden,
+    cl::desc("Combine ArchVGPR/AGPR savings when computing occupancy estimates "
+             "during rematerialization stage. Set to false for generally more "
+             "aggressive rematerialization."),
+    cl::init(true));
+
 const unsigned ScheduleMetrics::ScaleFactor = 100;
 
 GCNSchedStrategy::GCNSchedStrategy(const MachineSchedContext *C)
@@ -1116,10 +1123,14 @@ bool PreRARematStage::initGCNSchedStage() {
   rematerialize();
   if (GCNTrackers)
     DAG.RegionLiveOuts.buildLiveRegMap();
-  REMAT_DEBUG(
-      dbgs() << "Retrying function scheduling with new min. occupancy of "
-             << AchievedOcc << " from rematerializing (original was "
-             << DAG.MinOccupancy << ", target was " << TargetOcc << ")\n");
+  REMAT_DEBUG({
+    dbgs() << "Retrying function scheduling with new min. occupancy of "
+           << AchievedOcc << " from rematerializing";
+    if (TargetOcc)
+      dbgs() << ", target was " << *TargetOcc;
+    dbgs() << '\n';
+  });
+
   if (AchievedOcc > DAG.MinOccupancy) {
     DAG.MinOccupancy = AchievedOcc;
     SIMachineFunctionInfo &MFI = *MF.getInfo<SIMachineFunctionInfo>();
@@ -1549,9 +1560,19 @@ bool ClusteredLowOccStage::shouldRevertScheduling(unsigned WavesAfter) {
 }
 
 bool PreRARematStage::shouldRevertScheduling(unsigned WavesAfter) {
-  return GCNSchedStage::shouldRevertScheduling(WavesAfter) ||
-         mayCauseSpilling(WavesAfter) ||
-         (IncreaseOccupancy && WavesAfter < TargetOcc);
+  if (mayCauseSpilling(WavesAfter))
+    return true;
+  if (!TargetOcc) {
+    // Never revert when trying to reduce spilling and RP did not increase in
+    // the region.
+    return false;
+  }
+
+  // When trying to increase occupancy, only revert if we failed to achieve the
+  // target occupancy. Compute occupancy using the RP target instead of the
+  // default RP logic to be consistent with the rest of the stage.
+  return PressureAfter.getOccupancy(ST, DAG.MFI.getDynamicVGPRBlockSize(),
+                                    RematCombineVGPRSavings) < *TargetOcc;
 }
 
 bool ILPInitialScheduleStage::shouldRevertScheduling(unsigned WavesAfter) {
@@ -1700,78 +1721,79 @@ bool PreRARematStage::allUsesAvailableAt(const MachineInstr *InstToRemat,
 }
 
 bool PreRARematStage::canIncreaseOccupancyOrReduceSpill() {
-  REMAT_DEBUG({
-    dbgs() << "Collecting rematerializable instructions in ";
-    MF.getFunction().printAsOperand(dbgs(), false);
-    dbgs() << '\n';
-  });
+  const Function &F = MF.getFunction();
 
   // Maps optimizable regions (i.e., regions at minimum and register-limited
   // occupancy, or regions with spilling) to the target RP we would like to
   // reach.
   DenseMap<unsigned, GCNRPTarget> OptRegions;
-  const Function &F = MF.getFunction();
-  unsigned DynamicVGPRBlockSize =
-      MF.getInfo<SIMachineFunctionInfo>()->getDynamicVGPRBlockSize();
 
-  std::pair<unsigned, unsigned> WavesPerEU = ST.getWavesPerEU(F);
+  // Define "spilling targets" for spilling regions. This takes into account any
+  // SGPR/VGPR usage restriction requested through the "amdgpu-num-sgpr" /
+  // "amdgpu-num-vgpr" attributes beyond the limits imposed by the minimum
+  // number of waves per EU. Usage above those restrictions is considered like
+  // spill.
   const unsigned MaxSGPRsNoSpill = ST.getMaxNumSGPRs(F);
   const unsigned MaxVGPRsNoSpill = ST.getMaxNumVGPRs(F);
-  const unsigned MaxSGPRsIncOcc =
-      ST.getMaxNumSGPRs(DAG.MinOccupancy + 1, false);
-  const unsigned MaxVGPRsIncOcc =
-      ST.getMaxNumVGPRs(DAG.MinOccupancy + 1, DynamicVGPRBlockSize);
-  IncreaseOccupancy = WavesPerEU.second > DAG.MinOccupancy;
-
-  // Collect optimizable regions. If there is spilling in any region we will
-  // just try to reduce spilling. Otherwise we will try to increase occupancy by
-  // one in the whole function.
+  unsigned MinOcc = MFI.getMaxWavesPerEU();
   for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
-    GCNRegPressure &RP = DAG.Pressure[I];
-    // We allow ArchVGPR or AGPR savings to count as savings of the other kind
-    // of VGPR only when trying to eliminate spilling. We cannot do this when
-    // trying to increase occupancy since VGPR class swaps only occur later in
-    // the register allocator i.e., the scheduler will not be able to reason
-    // about these savings and will not report an increase in the achievable
-    // occupancy, triggering rollbacks.
+    const GCNRegPressure &RP = DAG.Pressure[I];
     GCNRPTarget Target(MaxSGPRsNoSpill, MaxVGPRsNoSpill, MF, RP,
-                       /*CombineVGPRSavings=*/true);
-    if (!Target.satisfied() && IncreaseOccupancy) {
-      // There is spilling in the region and we were so far trying to increase
-      // occupancy. Strop trying that and focus on reducing spilling.
-      IncreaseOccupancy = false;
-      OptRegions.clear();
-    } else if (IncreaseOccupancy) {
-      // There is no spilling in the region, try to increase occupancy.
-      Target = GCNRPTarget(MaxSGPRsIncOcc, MaxVGPRsIncOcc, MF, RP,
-                           /*CombineVGPRSavings=*/false);
-    }
+                       /*CombineVGPRSavings=*/RematCombineVGPRSavings);
+    MinOcc = std::min(MinOcc, Target.getOccupancy());
     if (!Target.satisfied())
       OptRegions.insert({I, Target});
   }
-  if (OptRegions.empty())
-    return false;
 
-#ifndef NDEBUG
-  if (IncreaseOccupancy) {
-    REMAT_DEBUG(dbgs() << "Occupancy minimal (" << DAG.MinOccupancy
-                       << ") in regions:\n");
+  if (!OptRegions.empty() || MinOcc >= MFI.getMaxWavesPerEU()) {
+    // There is spilling in at least one region, or we are already at maximum
+    // occupancy.
+    TargetOcc = std::nullopt;
   } else {
-    REMAT_DEBUG(dbgs() << "Spilling w.r.t. minimum target occupancy ("
-                       << WavesPerEU.first << ") in regions:\n");
-  }
-  for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
-    if (auto OptIt = OptRegions.find(I); OptIt != OptRegions.end())
-      REMAT_DEBUG(dbgs() << "  [" << I << "] " << OptIt->getSecond() << '\n');
+    // There is no spilling and room to improve occupancy; set up "increased
+    // occupancy" targets for all regions. We further restrict the SGPR/VGPR
+    // limits for increasing occupancy by the "spilling limits" since the latter
+    // may end up smaller due to "amdgpu-num-sgpr" / "amdgpu-num-vgpr"
+    // attributes.
+    TargetOcc = MinOcc + 1;
+
+    unsigned DynamicVGPRBlockSize =
+        MF.getInfo<SIMachineFunctionInfo>()->getDynamicVGPRBlockSize();
+    const unsigned MaxSGPRsIncOcc =
+        std::min(MaxSGPRsNoSpill, ST.getMaxNumSGPRs(*TargetOcc, false));
+    const unsigned MaxVGPRsIncOcc = std::min(
+        MaxVGPRsNoSpill, ST.getMaxNumVGPRs(*TargetOcc, DynamicVGPRBlockSize));
+    for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
+      const GCNRegPressure &RP = DAG.Pressure[I];
+      GCNRPTarget Target(MaxSGPRsIncOcc, MaxVGPRsIncOcc, MF, RP,
+                         /*CombineVGPRSavings=*/RematCombineVGPRSavings);
+      if (!Target.satisfied())
+        OptRegions.insert({I, Target});
+    }
+    assert(!OptRegions.empty() && "there should be at least one target region");
   }
-#endif
 
-  // When we are reducing spilling, the target is the minimum target number of
-  // waves/EU determined by the subtarget. In cases where either one of
-  // "amdgpu-num-sgpr" or "amdgpu-num-vgpr" are set on the function, the current
-  // minimum region occupancy may be higher than the latter.
-  TargetOcc = IncreaseOccupancy ? DAG.MinOccupancy + 1
-                                : std::max(DAG.MinOccupancy, WavesPerEU.first);
+  REMAT_DEBUG({
+    dbgs() << "Analyzing ";
+    MF.getFunction().printAsOperand(dbgs(), false);
+    dbgs() << ": ";
+    if (OptRegions.empty()) {
+      LLVM_DEBUG(dbgs() << "no objective to achieve, occupancy is maximal at "
+                        << MFI.getMaxWavesPerEU() << "\n");
+    } else if (!TargetOcc) {
+      LLVM_DEBUG(dbgs() << "reduce spilling (minimum target occupancy is "
+                        << MFI.getMinWavesPerEU() << ")\n");
+    } else {
+      LLVM_DEBUG(dbgs() << "increase occupancy from " << MinOcc << " to "
+                        << TargetOcc << '\n');
+    }
+    for (unsigned I = 0, E = DAG.Regions.size(); I != E; ++I) {
+      if (auto OptIt = OptRegions.find(I); OptIt != OptRegions.end())
+        REMAT_DEBUG(dbgs() << "  [" << I << "] " << OptIt->getSecond() << '\n');
+    }
+  });
+  if (OptRegions.empty())
+    return false;
 
   // Accounts for a reduction in RP in an optimizable region. Returns whether we
   // estimate that we have identified enough rematerialization opportunities to
@@ -1780,10 +1802,10 @@ bool PreRARematStage::canIncreaseOccupancyOrReduceSpill() {
   auto ReduceRPInRegion = [&](auto OptIt, Register Reg, LaneBitmask Mask,
                               bool &Progress) -> bool {
     GCNRPTarget &Target = OptIt->getSecond();
-    if (!Target.isSaveBeneficial(Reg, DAG.MRI))
+    if (!Target.isSaveBeneficial(Reg))
       return false;
     Progress = true;
-    Target.saveReg(Reg, Mask, DAG.MRI);
+    Target.saveReg(Reg, Mask);
     if (Target.satisfied())
       OptRegions.erase(OptIt->getFirst());
     return OptRegions.empty();
@@ -1889,7 +1911,7 @@ bool PreRARematStage::canIncreaseOccupancyOrReduceSpill() {
     }
   }
 
-  if (IncreaseOccupancy) {
+  if (TargetOcc) {
     // We were trying to increase occupancy but failed, abort the stage.
     REMAT_DEBUG(dbgs() << "Cannot increase occupancy\n");
     Rematerializations.clear();
@@ -1994,7 +2016,7 @@ void PreRARematStage::rematerialize() {
   // All regions impacted by at least one rematerialization must be rescheduled.
   // Maximum pressure must also be recomputed for all regions where it changed
   // non-predictably and checked against the target occupancy.
-  AchievedOcc = TargetOcc;
+  AchievedOcc = MFI.getMaxWavesPerEU();
   for (auto &[I, OriginalRP] : ImpactedRegions) {
     bool IsEmptyRegion = DAG.Regions[I].first == DAG.Regions[I].second;
     RescheduleRegions[I] = !IsEmptyRegion;
@@ -2018,9 +2040,10 @@ void PreRARematStage::rematerialize() {
       }
     }
     DAG.Pressure[I] = RP;
-    AchievedOcc = std::min(
-        AchievedOcc, RP.getOccupancy(ST, MF.getInfo<SIMachineFunctionInfo>()
-                                             ->getDynamicVGPRBlockSize()));
+    unsigned NewOcc = RP.getOccupancy(
+        ST, MF.getInfo<SIMachineFunctionInfo>()->getDynamicVGPRBlockSize(),
+        RematCombineVGPRSavings);
+    AchievedOcc = std::min(AchievedOcc, NewOcc);
   }
   REMAT_DEBUG(dbgs() << "Achieved occupancy " << AchievedOcc << "\n");
 }
@@ -2050,7 +2073,7 @@ void PreRARematStage::finalizeGCNSchedStage() {
   // which case we do not want to rollback either (the rescheduling was already
   // reverted in PreRARematStage::shouldRevertScheduling in such cases).
   unsigned MaxOcc = std::max(AchievedOcc, DAG.MinOccupancy);
-  if (!IncreaseOccupancy || MaxOcc >= TargetOcc)
+  if (!TargetOcc || MaxOcc >= *TargetOcc)
     return;
 
   REMAT_DEBUG(dbgs() << "Rolling back all rematerializations\n");
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
index 94cd795bbc8f6..09ce8751fa95a 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
@@ -473,15 +473,12 @@ class PreRARematStage : public GCNSchedStage {
   /// After successful stage initialization, indicates which regions should be
   /// rescheduled.
   BitVector RescheduleRegions;
-  /// Target occupancy the stage estimates is reachable through
-  /// rematerialization. Greater than or equal to the pre-stage min occupancy.
-  unsigned TargetOcc;
+  /// The target occupancy the stage is trying to achieve. Empty when the
+  /// objective is spilling reduction.
+  std::optional<unsigned> TargetO...
[truncated]

@github-actions

github-actions bot commented Jul 16, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@lucas-rami lucas-rami force-pushed the remat-vgpr-combined-savings branch from 0908c8a to 0f1e38d on July 25, 2025 16:30
Contributor

I think it is only beneficial to reduce the OtherRC if the OtherRC exceeds the addressable limit for that RC and the combined pressure is above the addressable limit for unified RF. In this case we cannot use copies / LiveRange splitting alone to allocate the registers, we must spill.

Since we know the current RC's pressure is below the addressable limit, it may be enough to just check the unified RP against the addressable limit for the unified RF.

By reducing cross RC pressure any time we're over the MaxUnifiedVGPRs, we are telling the rematerializer to issue cross RC copies to increase occupancy.

Contributor Author

By reducing cross RC pressure any time we're over the MaxUnifiedVGPRs, we are telling the rematerializer to issue cross RC copies to increase occupancy.

Apologies, I am not sure I understand.

I guess we agree on the spilling case ($MaxVGPRs=256 \wedge MaxUnifiedVGPRs=512$) since in that case $NumVGPRsInRC \leq MaxVGPRs \wedge RP.getVGPRNum(true) > MaxUnifiedVGPRs \Longrightarrow NumVGPRsInOtherRC > MaxVGPRs$ (modulo the VGPR allocation granule in the unified computation), i.e., we only do cross-RC saves if there are too many excess VGPRs in the other RC to fit through copies in the current RC.

For the occupancy increase case ($0 < MaxVGPRs = MaxUnifiedVGPRs \leq 256$) we always have $NumVGPRsInRC < 256$ and $NumVGPRsInOtherRC < 256$, otherwise the stage would be trying to reduce spilling. If $NumVGPRsInRC \leq MaxVGPRs \wedge RP.getVGPRNum(true) > MaxUnifiedVGPRs$, isn't any VGPR/AGPR save beneficial? Is there a chance we increase the number of cross-RC copies by always saving there?

Contributor

@jrbyrnes jrbyrnes Aug 7, 2025

Yeah I think the way you've implemented it is fine actually.

I was looking at a case where we use the waves-per-eu attribute. Unless amdgpu-agpr-alloc is also specified, we will just split the unified RF in half during RA. In that case, with the current implementation, we would potentially be inserting cross-RC copies to increase occupancy. I'm not sure how concerned we should be with this case, since for targets with a unified RF we should be using pure VGPRs except in the occupancy-1 case.

However, in the default case (no attributes), RA can allocate up to the maxNumVGPRs for each RC, which should encourage a more optimal split while honoring the unified limit.

Contributor Author

Thanks for doing the investigation and for the insights, I'll keep them in mind for my future remat work.

Contributor

@jrbyrnes jrbyrnes left a comment

LGTM

Contributor

Nit: are any of these comments still relevant?

Contributor Author

Yes, these are still relevant.

It's actually possible to hit the assert on a valid IR if both
amdgpu-num-sgprs and amdgpu-num-vgprs are defined and have very low
values. It's also unnecessary to take the minimum between the spilling
and increase occupancy targets, which will be necessarily higher and
therefore automatically satisfied even when these attributes are
defined.
@lucas-rami lucas-rami force-pushed the remat-vgpr-combined-savings branch from 0ea15df to fa1524d on August 8, 2025 11:35
@lucas-rami lucas-rami merged commit 83c308f into llvm:main Aug 8, 2025
9 checks passed
jrbyrnes pushed a commit to jrbyrnes/llvm-project that referenced this pull request Aug 13, 2025
…lization (llvm#149224)

jrbyrnes pushed a commit to jrbyrnes/llvm-project that referenced this pull request Aug 15, 2025
…lization (llvm#149224)

4 participants