kerbowa (Member) commented Jul 9, 2025

Examine instructions in the pending queue when scheduling. This makes instructions visible to scheduling heuristics even when they aren't immediately issuable due to hardware resource constraints.

The scheduler has two hardware resource modeling modes: an in-order mode where instructions must be ready to issue before scheduling, and out-of-order models where instructions are always visible to heuristics. Special handling exists for unbuffered processor resources in out-of-order models. These resources can cause pipeline stalls when used back-to-back, so they're typically avoided. However, for AMDGPU targets, managing register pressure and reducing spilling is critical enough to justify exceptions to this approach.

This change enables examination of instructions that can't be immediately issued because they use an already occupied unbuffered resource. By making these instructions visible to scheduling heuristics anyway, we gain more flexibility in scheduling decisions, potentially allowing better register pressure and hardware resource management.

kerbowa requested a review from jrbyrnes July 9, 2025 05:17
llvmbot (Member) commented Jul 9, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Austin Kerbow (kerbowa)

Changes

Examine instructions in the pending queue when scheduling. This makes instructions visible to scheduling heuristics even when they aren't immediately issuable due to hardware resource constraints.

The scheduler has two hardware resource modeling modes: an in-order mode where instructions must be ready to issue before scheduling, and out-of-order models where instructions are always visible to heuristics. Special handling exists for unbuffered processor resources in out-of-order models. These resources can cause pipeline stalls when used back-to-back, so they're typically avoided. However, for AMDGPU targets, managing register pressure and reducing spilling is critical enough to justify exceptions to this approach.

This change enables examination of instructions that can't be immediately issued because they use an already occupied unbuffered resource. By making these instructions visible to scheduling heuristics anyway, we gain more flexibility in scheduling decisions, potentially allowing better register pressure and hardware resource management.


Patch is 579.73 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/147653.diff

17 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp (+153-17)
  • (modified) llvm/lib/Target/AMDGPU/GCNSchedStrategy.h (+19-2)
  • (modified) llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir (+931-927)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.small.mir (+389-391)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.ll (+164-164)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.single.2b.mir (+4-4)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.single.2c.mir (+2-2)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.mfma.ll (+58-58)
  • (modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.sched.group.barrier.ll (+416-416)
  • (modified) llvm/test/CodeGen/AMDGPU/machine-scheduler-sink-trivial-remats-attr.mir (+3-3)
  • (modified) llvm/test/CodeGen/AMDGPU/memintrinsic-unroll.ll (+1601-1613)
  • (modified) llvm/test/CodeGen/AMDGPU/mfma-no-register-aliasing.ll (+51-51)
  • (modified) llvm/test/CodeGen/AMDGPU/rewrite-vgpr-mfma-to-agpr.ll (+50-51)
  • (modified) llvm/test/CodeGen/AMDGPU/sched-barrier-pre-RA.mir (+15-15)
  • (modified) llvm/test/CodeGen/AMDGPU/sched-group-barrier-pipeline-solver.mir (+8-8)
  • (modified) llvm/test/CodeGen/AMDGPU/sched-group-barrier-pre-RA.mir (+22-22)
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
index fce8f36d45969..35886eb04c711 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
@@ -68,6 +68,14 @@ static cl::opt<bool> GCNTrackers(
     cl::desc("Use the AMDGPU specific RPTrackers during scheduling"),
     cl::init(false));
 
+static cl::opt<bool> ExaminePendingQueue(
+    "amdgpu-examine-pending-queue", cl::Hidden,
+    cl::desc(
+        "Examine instructions in the pending the pending queue when "
+        "scheduling. This makes instructions visible to heuristics that cannot "
+        "immediately be issued due to hardware resource constraints."),
+    cl::init(true));
+
 const unsigned ScheduleMetrics::ScaleFactor = 100;
 
 GCNSchedStrategy::GCNSchedStrategy(const MachineSchedContext *C)
@@ -319,17 +327,45 @@ void GCNSchedStrategy::initCandidate(SchedCandidate &Cand, SUnit *SU,
   }
 }
 
+static bool shouldCheckPending(SchedBoundary &Zone,
+                               const TargetSchedModel *SchedModel) {
+  const unsigned ReadyListLimit = 256;
+  bool HasBufferedModel =
+      SchedModel->hasInstrSchedModel() && SchedModel->getMicroOpBufferSize();
+  return ExaminePendingQueue &&
+         Zone.Available.size() + Zone.Pending.size() <= ReadyListLimit &&
+         HasBufferedModel;
+}
+
+static SUnit *pickOnlyChoice(SchedBoundary &Zone,
+                             const TargetSchedModel *SchedModel) {
+  if (!shouldCheckPending(Zone, SchedModel) || Zone.Pending.empty())
+    return Zone.pickOnlyChoice();
+  return nullptr;
+}
+
+#ifndef NDEBUG
+void GCNSchedStrategy::printCandidateDecision(const SchedCandidate &Current,
+                                              const SchedCandidate &Preferred) {
+  LLVM_DEBUG(dbgs() << "Prefer:\t\t"; DAG->dumpNode(*Preferred.SU));
+  if (Current.SU)
+    LLVM_DEBUG(dbgs() << "Not:\t"; DAG->dumpNode(*Current.SU));
+  LLVM_DEBUG(dbgs() << "Reason:\t\t"; traceCandidate(Preferred));
+}
+#endif
+
 // This function is mostly cut and pasted from
 // GenericScheduler::pickNodeFromQueue()
 void GCNSchedStrategy::pickNodeFromQueue(SchedBoundary &Zone,
                                          const CandPolicy &ZonePolicy,
                                          const RegPressureTracker &RPTracker,
-                                         SchedCandidate &Cand,
+                                         SchedCandidate &Cand, bool &IsPending,
                                          bool IsBottomUp) {
   const SIRegisterInfo *SRI = static_cast<const SIRegisterInfo *>(TRI);
   ArrayRef<unsigned> Pressure = RPTracker.getRegSetPressureAtPos();
   unsigned SGPRPressure = 0;
   unsigned VGPRPressure = 0;
+  IsPending = false;
   if (DAG->isTrackingPressure()) {
     if (!GCNTrackers) {
       SGPRPressure = Pressure[AMDGPU::RegisterPressureSets::SReg_32];
@@ -342,8 +378,9 @@ void GCNSchedStrategy::pickNodeFromQueue(SchedBoundary &Zone,
       VGPRPressure = T->getPressure().getArchVGPRNum();
     }
   }
-  ReadyQueue &Q = Zone.Available;
-  for (SUnit *SU : Q) {
+  LLVM_DEBUG(dbgs() << "Available Q:\n");
+  ReadyQueue &AQ = Zone.Available;
+  for (SUnit *SU : AQ) {
 
     SchedCandidate TryCand(ZonePolicy);
     initCandidate(TryCand, SU, Zone.isTop(), RPTracker, SRI, SGPRPressure,
@@ -355,27 +392,59 @@ void GCNSchedStrategy::pickNodeFromQueue(SchedBoundary &Zone,
       // Initialize resource delta if needed in case future heuristics query it.
       if (TryCand.ResDelta == SchedResourceDelta())
         TryCand.initResourceDelta(Zone.DAG, SchedModel);
+      LLVM_DEBUG(printCandidateDecision(Cand, TryCand));
       Cand.setBest(TryCand);
-      LLVM_DEBUG(traceCandidate(Cand));
     }
+#ifndef NDEBUG
+    else
+      printCandidateDecision(TryCand, Cand);
+#endif
+  }
+
+  if (!shouldCheckPending(Zone, SchedModel))
+    return;
+
+  LLVM_DEBUG(dbgs() << "Pending Q:\n");
+  ReadyQueue &PQ = Zone.Pending;
+  for (SUnit *SU : PQ) {
+
+    SchedCandidate TryCand(ZonePolicy);
+    initCandidate(TryCand, SU, Zone.isTop(), RPTracker, SRI, SGPRPressure,
+                  VGPRPressure, IsBottomUp);
+    // Pass SchedBoundary only when comparing nodes from the same boundary.
+    SchedBoundary *ZoneArg = Cand.AtTop == TryCand.AtTop ? &Zone : nullptr;
+    tryPendingCandidate(Cand, TryCand, ZoneArg);
+    if (TryCand.Reason != NoCand) {
+      // Initialize resource delta if needed in case future heuristics query it.
+      if (TryCand.ResDelta == SchedResourceDelta())
+        TryCand.initResourceDelta(Zone.DAG, SchedModel);
+      LLVM_DEBUG(printCandidateDecision(Cand, TryCand));
+      IsPending = true;
+      Cand.setBest(TryCand);
+    }
+#ifndef NDEBUG
+    else
+      printCandidateDecision(TryCand, Cand);
+#endif
   }
 }
 
 // This function is mostly cut and pasted from
 // GenericScheduler::pickNodeBidirectional()
-SUnit *GCNSchedStrategy::pickNodeBidirectional(bool &IsTopNode) {
+SUnit *GCNSchedStrategy::pickNodeBidirectional(bool &IsTopNode,
+                                               bool &PickedPending) {
   // Schedule as far as possible in the direction of no choice. This is most
   // efficient, but also provides the best heuristics for CriticalPSets.
-  if (SUnit *SU = Bot.pickOnlyChoice()) {
+  if (SUnit *SU = pickOnlyChoice(Bot, SchedModel)) {
     IsTopNode = false;
     return SU;
   }
-  if (SUnit *SU = Top.pickOnlyChoice()) {
+  if (SUnit *SU = pickOnlyChoice(Top, SchedModel)) {
     IsTopNode = true;
     return SU;
   }
-  // Set the bottom-up policy based on the state of the current bottom zone and
-  // the instructions outside the zone, including the top zone.
+  // Set the bottom-up policy based on the state of the current bottom zone
+  // and the instructions outside the zone, including the top zone.
   CandPolicy BotPolicy;
   setPolicy(BotPolicy, /*IsPostRA=*/false, Bot, &Top);
   // Set the top-down policy based on the state of the current top zone and
@@ -383,12 +452,14 @@ SUnit *GCNSchedStrategy::pickNodeBidirectional(bool &IsTopNode) {
   CandPolicy TopPolicy;
   setPolicy(TopPolicy, /*IsPostRA=*/false, Top, &Bot);
 
+  bool BotPending = false;
   // See if BotCand is still valid (because we previously scheduled from Top).
   LLVM_DEBUG(dbgs() << "Picking from Bot:\n");
   if (!BotCand.isValid() || BotCand.SU->isScheduled ||
       BotCand.Policy != BotPolicy) {
     BotCand.reset(CandPolicy());
     pickNodeFromQueue(Bot, BotPolicy, DAG->getBotRPTracker(), BotCand,
+                      BotPending,
                       /*IsBottomUp=*/true);
     assert(BotCand.Reason != NoCand && "failed to find the first candidate");
   } else {
@@ -398,6 +469,7 @@ SUnit *GCNSchedStrategy::pickNodeBidirectional(bool &IsTopNode) {
       SchedCandidate TCand;
       TCand.reset(CandPolicy());
       pickNodeFromQueue(Bot, BotPolicy, DAG->getBotRPTracker(), TCand,
+                        BotPending,
                         /*IsBottomUp=*/true);
       assert(TCand.SU == BotCand.SU &&
              "Last pick result should correspond to re-picking right now");
@@ -405,12 +477,14 @@ SUnit *GCNSchedStrategy::pickNodeBidirectional(bool &IsTopNode) {
 #endif
   }
 
+  bool TopPending = false;
   // Check if the top Q has a better candidate.
   LLVM_DEBUG(dbgs() << "Picking from Top:\n");
   if (!TopCand.isValid() || TopCand.SU->isScheduled ||
       TopCand.Policy != TopPolicy) {
     TopCand.reset(CandPolicy());
     pickNodeFromQueue(Top, TopPolicy, DAG->getTopRPTracker(), TopCand,
+                      TopPending,
                       /*IsBottomUp=*/false);
     assert(TopCand.Reason != NoCand && "failed to find the first candidate");
   } else {
@@ -420,6 +494,7 @@ SUnit *GCNSchedStrategy::pickNodeBidirectional(bool &IsTopNode) {
       SchedCandidate TCand;
       TCand.reset(CandPolicy());
       pickNodeFromQueue(Top, TopPolicy, DAG->getTopRPTracker(), TCand,
+                        TopPending,
                         /*IsBottomUp=*/false);
       assert(TCand.SU == TopCand.SU &&
              "Last pick result should correspond to re-picking right now");
@@ -430,12 +505,21 @@ SUnit *GCNSchedStrategy::pickNodeBidirectional(bool &IsTopNode) {
   // Pick best from BotCand and TopCand.
   LLVM_DEBUG(dbgs() << "Top Cand: "; traceCandidate(TopCand);
              dbgs() << "Bot Cand: "; traceCandidate(BotCand););
-  SchedCandidate Cand = BotCand;
-  TopCand.Reason = NoCand;
-  tryCandidate(Cand, TopCand, nullptr);
-  if (TopCand.Reason != NoCand) {
-    Cand.setBest(TopCand);
+  SchedCandidate Cand = BotPending ? TopCand : BotCand;
+  SchedCandidate TryCand = BotPending ? BotCand : TopCand;
+  PickedPending = BotPending && TopPending;
+
+  TryCand.Reason = NoCand;
+  if (BotPending || TopPending) {
+    PickedPending |= tryPendingCandidate(Cand, TopCand, nullptr);
+  } else {
+    tryCandidate(Cand, TryCand, nullptr);
   }
+
+  if (TryCand.Reason != NoCand) {
+    Cand.setBest(TryCand);
+  }
+
   LLVM_DEBUG(dbgs() << "Picking: "; traceCandidate(Cand););
 
   IsTopNode = Cand.AtTop;
@@ -450,35 +534,46 @@ SUnit *GCNSchedStrategy::pickNode(bool &IsTopNode) {
            Bot.Available.empty() && Bot.Pending.empty() && "ReadyQ garbage");
     return nullptr;
   }
+  bool PickedPending;
   SUnit *SU;
   do {
+    PickedPending = false;
     if (RegionPolicy.OnlyTopDown) {
-      SU = Top.pickOnlyChoice();
+      SU = pickOnlyChoice(Top, SchedModel);
       if (!SU) {
         CandPolicy NoPolicy;
         TopCand.reset(NoPolicy);
         pickNodeFromQueue(Top, NoPolicy, DAG->getTopRPTracker(), TopCand,
+                          PickedPending,
                           /*IsBottomUp=*/false);
         assert(TopCand.Reason != NoCand && "failed to find a candidate");
         SU = TopCand.SU;
       }
       IsTopNode = true;
     } else if (RegionPolicy.OnlyBottomUp) {
-      SU = Bot.pickOnlyChoice();
+      SU = pickOnlyChoice(Bot, SchedModel);
       if (!SU) {
         CandPolicy NoPolicy;
         BotCand.reset(NoPolicy);
         pickNodeFromQueue(Bot, NoPolicy, DAG->getBotRPTracker(), BotCand,
+                          PickedPending,
                           /*IsBottomUp=*/true);
         assert(BotCand.Reason != NoCand && "failed to find a candidate");
         SU = BotCand.SU;
       }
       IsTopNode = false;
     } else {
-      SU = pickNodeBidirectional(IsTopNode);
+      SU = pickNodeBidirectional(IsTopNode, PickedPending);
     }
   } while (SU->isScheduled);
 
+  if (PickedPending) {
+    unsigned ReadyCycle = IsTopNode ? SU->TopReadyCycle : SU->BotReadyCycle;
+    SchedBoundary &Zone = IsTopNode ? Top : Bot;
+    Zone.bumpCycle(ReadyCycle);
+    Zone.releasePending();
+  }
+
   if (SU->isTopReady())
     Top.removeReady(SU);
   if (SU->isBottomReady())
@@ -524,6 +619,47 @@ GCNSchedStageID GCNSchedStrategy::getNextStage() const {
   return *std::next(CurrentStage);
 }
 
+bool GCNSchedStrategy::tryPendingCandidate(SchedCandidate &Cand,
+                                           SchedCandidate &TryCand,
+                                           SchedBoundary *Zone) const {
+  // Initialize the candidate if needed.
+  if (!Cand.isValid()) {
+    TryCand.Reason = NodeOrder;
+    return true;
+  }
+
+  // Bias PhysReg Defs and copies to their uses and defined respectively.
+  if (tryGreater(biasPhysReg(TryCand.SU, TryCand.AtTop),
+                 biasPhysReg(Cand.SU, Cand.AtTop), TryCand, Cand, PhysReg))
+    return TryCand.Reason != NoCand;
+
+  // Avoid exceeding the target's limit.
+  if (DAG->isTrackingPressure() &&
+      tryPressure(TryCand.RPDelta.Excess, Cand.RPDelta.Excess, TryCand, Cand,
+                  RegExcess, TRI, DAG->MF))
+    return TryCand.Reason != NoCand;
+
+  // Avoid increasing the max critical pressure in the scheduled region.
+  if (DAG->isTrackingPressure() &&
+      tryPressure(TryCand.RPDelta.CriticalMax, Cand.RPDelta.CriticalMax,
+                  TryCand, Cand, RegCritical, TRI, DAG->MF))
+    return TryCand.Reason != NoCand;
+
+  bool SameBoundary = Zone != nullptr;
+  if (SameBoundary) {
+    TryCand.initResourceDelta(DAG, SchedModel);
+    if (tryLess(TryCand.ResDelta.CritResources, Cand.ResDelta.CritResources,
+                TryCand, Cand, ResourceReduce))
+      return TryCand.Reason != NoCand;
+    if (tryGreater(TryCand.ResDelta.DemandedResources,
+                   Cand.ResDelta.DemandedResources, TryCand, Cand,
+                   ResourceDemand))
+      return TryCand.Reason != NoCand;
+  }
+
+  return false;
+}
+
 GCNMaxOccupancySchedStrategy::GCNMaxOccupancySchedStrategy(
     const MachineSchedContext *C, bool IsLegacyScheduler)
     : GCNSchedStrategy(C) {
diff --git a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
index 94cd795bbc8f6..c78835c8d5a77 100644
--- a/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
+++ b/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h
@@ -44,17 +44,34 @@ raw_ostream &operator<<(raw_ostream &OS, const GCNSchedStageID &StageID);
 /// heuristics to determine excess/critical pressure sets.
 class GCNSchedStrategy : public GenericScheduler {
 protected:
-  SUnit *pickNodeBidirectional(bool &IsTopNode);
+  SUnit *pickNodeBidirectional(bool &IsTopNode, bool &PickedPending);
 
   void pickNodeFromQueue(SchedBoundary &Zone, const CandPolicy &ZonePolicy,
                          const RegPressureTracker &RPTracker,
-                         SchedCandidate &Cand, bool IsBottomUp);
+                         SchedCandidate &Cand, bool &IsPending,
+                         bool IsBottomUp);
 
   void initCandidate(SchedCandidate &Cand, SUnit *SU, bool AtTop,
                      const RegPressureTracker &RPTracker,
                      const SIRegisterInfo *SRI, unsigned SGPRPressure,
                      unsigned VGPRPressure, bool IsBottomUp);
 
+  /// Evaluates instructions in the pending queue using a subset of scheduling
+  /// heuristics.
+  ///
+  /// Instructions that cannot be issued due to hardware constraints are placed
+  /// in the pending queue rather than the available queue, making them normally
+  /// invisible to scheduling heuristics. However, in certain scenarios (such as
+  /// avoiding register spilling), it may be beneficial to consider scheduling
+  /// these not-yet-ready instructions.
+  bool tryPendingCandidate(SchedCandidate &Cand, SchedCandidate &TryCand,
+                           SchedBoundary *Zone) const;
+
+#ifndef NDEBUG
+  void printCandidateDecision(const SchedCandidate &Current,
+                              const SchedCandidate &Preferred);
+#endif
+
   std::vector<unsigned> Pressure;
 
   std::vector<unsigned> MaxPressure;
diff --git a/llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll b/llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
index 668219875db72..86505107587f1 100644
--- a/llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
+++ b/llvm/test/CodeGen/AMDGPU/gfx-callable-return-types.ll
@@ -947,6 +947,7 @@ define amdgpu_gfx <512 x i32> @return_512xi32() #0 {
 ; GFX9-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX9-NEXT:    v_mov_b32_e32 v1, 0
 ; GFX9-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1020
+; GFX9-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1028
 ; GFX9-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:2044
 ; GFX9-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:2040
 ; GFX9-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:2036
@@ -1201,7 +1202,6 @@ define amdgpu_gfx <512 x i32> @return_512xi32() #0 {
 ; GFX9-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1040
 ; GFX9-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1036
 ; GFX9-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1032
-; GFX9-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1028
 ; GFX9-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1024
 ; GFX9-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1016
 ; GFX9-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1012
@@ -1466,6 +1466,7 @@ define amdgpu_gfx <512 x i32> @return_512xi32() #0 {
 ; GFX10-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
 ; GFX10-NEXT:    v_mov_b32_e32 v1, 0
 ; GFX10-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1020
+; GFX10-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1028
 ; GFX10-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:2044
 ; GFX10-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:2040
 ; GFX10-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:2036
@@ -1720,7 +1721,6 @@ define amdgpu_gfx <512 x i32> @return_512xi32() #0 {
 ; GFX10-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1040
 ; GFX10-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1036
 ; GFX10-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1032
-; GFX10-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1028
 ; GFX10-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1024
 ; GFX10-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1016
 ; GFX10-NEXT:    buffer_store_dword v1, v0, s[0:3], 0 offen offset:1012
diff --git a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir
index aad6e031aa9ed..ac91dadc07995 100644
--- a/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir
+++ b/llvm/test/CodeGen/AMDGPU/llvm.amdgcn.iglp.opt.exp.large.mir
@@ -6,1145 +6,1149 @@
   define amdgpu_kernel void @largeInterleave() #0 { ret void }
   ; GCN-LABEL: largeInterleave:
   ; GCN:       ; %bb.0:
+  ; GCN-NEXT:    ; implicit-def: $sgpr17
+  ; GCN-NEXT:    ; implicit-def: $vgpr64
+  ; GCN-NEXT:    ; implicit-def: $vgpr66
   ; GCN-NEXT:    ; implicit-def: $sgpr0_sgpr1_sgpr2_sgpr3_sgpr4_sgpr5_sgpr6_sgpr7_sgpr8_sgpr9_sgpr10_sgpr11_sgpr12_sgpr13_sgpr14_sgpr15
-  ; GCN-NEXT:    ; implicit-def: $vgpr0
-  ; GCN-NEXT:    ; implicit-def: $vgpr2
-  ; GCN-NEXT:    ; implicit-def: $vgpr1
-  ; GCN-NEXT:    ; implicit-def: $vgpr8
+  ; GCN-NEXT:    ; implicit-def: $vgpr65
+  ; GCN-NEXT:    ; implicit-def: $vgpr72
+  ; GCN-NEXT:    ; implicit-def: $vgpr238
+  ; GCN-NEXT:    ; implicit-def: $vgpr152_vgpr153_vgpr154_vgpr155
+  ; GCN-NEXT:    ; implicit-def: $vgpr80
+  ; GCN-NEXT:    ; implicit-def: $vgpr81
+  ; GCN-NEXT:    ; implicit-def: $vgpr82
+  ; GCN-NEXT:    ; implicit-def: $vgpr83
+  ; GCN-NEXT:    ; implicit-def: $vgpr84
+  ; GCN-NEXT:    ; implicit-def: $vgpr85
+  ; GCN-NEXT:    ; implicit-def: $vgpr86
+  ; GCN-NEXT:    ; implicit-def: $vgpr87
+  ; GCN-NEXT:    ; implicit-def: $vgpr88
+  ; GCN-NEXT:    ; implicit-def: $vgpr89
+  ; GCN-NEXT:    ; implicit-def: $vgpr90
+  ; GCN-NEXT:    ; implicit-def: $vgpr91
+  ; GCN-NEXT:    ; implicit-def: $vgpr92
+  ; GCN-NEXT:    ; implicit-def: $vgpr93
   ; GCN-NEXT:    ; implicit-def: $vgpr94
-  ; GCN-NEXT:    ; implicit-def: $vgpr76_vgpr77_vgpr78_vgpr79
-  ; GCN-NEXT:    ; implicit-def: $vgpr106
-  ; GCN-NEXT:    ; implicit-def: $vgpr132
-  ; GCN-NEXT:    ; implicit-def: $vgpr133
-  ; GCN-NEXT:    ; implicit-def: $vgpr139
-  ; GCN-NEXT:    ; implicit-def: $vgpr112_vgpr113_vgpr114_vgpr115_vgpr116_vgpr117_vgpr118_vgpr119_vgpr120_vgpr121_vgpr122_vgpr123_vgpr124_vgpr125_vgpr126_vgpr127
-  ; GCN-NEXT:    ; iglp_opt mask(0x00000002)
-  ; GCN-NEXT:    ; implicit-def: $sgpr0
+  ; GCN-NEXT:    ; implicit-def: $vgpr73
   ; GCN-NEXT:    s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
-  ; GCN-NEXT:    v_readfirstlane_b32 s7, v0
+  ; GCN-NEXT:    v_add_u32_e32 v232, v73, v80
+  ; GCN-NEXT:    v_readfirstlane_b32 s17, v64
+  ; GCN-NEXT:    ; implicit-def: $sgpr15
   ; GCN-NEXT:    ; implicit-def: $sgpr8_sgpr9_sgpr10_sgpr11
-  ; GCN-NEXT:    ; kill: killed $sgpr8_sgpr9_sgpr10_sgpr11
-  ; GCN-NEXT:    ; implicit-def: $sgpr5
-  ; GCN-NEXT:    s_nop 1
-  ; GCN-NEXT:    v_lshl_add_u32 v0, s7, 4, v2
-  ; GCN-NEXT:    v_mul_lo_u32 v0, v0, s6
-  ; GCN-NEXT:    v_add_lshl_u32 v92, v0, v1, 1
-  ; GCN-NEXT:    v_add_u32_e32 v93, s0, v92
-  ; GCN-NEXT:    buffer_load_dwordx4 v[0:3], v92, s[8:11], 0 offen sc0 sc1
+  ; GCN-NEXT:    v_add_u32_e32 v234, v73, v81
+  ; GCN-NEXT:    v_add_u32_e32 v235, v73, v82
+  ; GCN-NEXT:    v_lshl_add_u32 v64, s17, 4, v66
+  ; GCN-NEXT:    v_mul_lo_u32 v64, v64, s6
+  ; GCN-NEXT:    v_add_lshl_u32 v222, v64, v65, 1
+  ; GCN-NEXT:    v_add_u32_e32 v95, s15, v222
+  ; GCN-NEXT:    buffer_load_dwordx4 v[64:67], v222, s[8:11], 0 offen sc0 sc1
   ; GCN...
[truncated]

Comment on lines 350 to 353
LLVM_DEBUG(dbgs() << "Prefer:\t\t"; DAG->dumpNode(*Preferred.SU));
if (Current.SU)
  LLVM_DEBUG(dbgs() << "Not:\t"; DAG->dumpNode(*Current.SU));
LLVM_DEBUG(dbgs() << "Reason:\t\t"; traceCandidate(Preferred));
Contributor:
Use one debug LLVM_DEBUG({})
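
A minimal sketch of the suggested pattern, for illustration (same members as in the patch, not the committed code); folding everything into one block also keeps the Current.SU check itself inside the debug macro:

LLVM_DEBUG({
  dbgs() << "Prefer:\t\t";
  DAG->dumpNode(*Preferred.SU);
  if (Current.SU) {
    dbgs() << "Not:\t";
    DAG->dumpNode(*Current.SU);
  }
  dbgs() << "Reason:\t\t";
  traceCandidate(Preferred);
});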

Comment on lines 425 to 428
#ifndef NDEBUG
    else
      printCandidateDecision(TryCand, Cand);
#endif
Contributor:
Don't have a macro conditional else

return nullptr;
}

#ifndef NDEBUG
Contributor:
Just leave it in? The body will be empty in release build anyway


static bool shouldCheckPending(SchedBoundary &Zone,
                               const TargetSchedModel *SchedModel) {
  const unsigned ReadyListLimit = 256;
Contributor:
Can you replace the bool flag with a value for this limit? Disable will be implied by 0
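
A sketch of what that could look like, with a hypothetical option name (0 disables examining the pending queue and otherwise caps the combined queue size):

static cl::opt<unsigned> PendingQueueLimit(
    "amdgpu-scheduler-pending-queue-limit", cl::Hidden,
    cl::desc("Max (Available+Pending) size to examine the pending queue in "
             "scheduling; 0 disables examining the pending queue"),
    cl::init(256));

static bool shouldCheckPending(SchedBoundary &Zone,
                               const TargetSchedModel *SchedModel) {
  bool HasBufferedModel =
      SchedModel->hasInstrSchedModel() && SchedModel->getMicroOpBufferSize();
  return PendingQueueLimit != 0 &&
         Zone.Available.size() + Zone.Pending.size() <= PendingQueueLimit &&
         HasBufferedModel;
}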

static cl::opt<bool> ExaminePendingQueue(
    "amdgpu-examine-pending-queue", cl::Hidden,
    cl::desc(
        "Examine instructions in the pending the pending queue when "
Contributor:
Suggested change:
-        "Examine instructions in the pending the pending queue when "
+        "Examine instructions in the pending queue when "

                             const TargetSchedModel *SchedModel) {
  if (!shouldCheckPending(Zone, SchedModel) || Zone.Pending.empty())
    return Zone.pickOnlyChoice();
  return nullptr;
Contributor:
We can probably pick the only pending instruction so long as we handle the cycle.

Member Author:
I added then removed this since it requires a bunch of bookkeeping from the default pickOnlyChoice() to be reimplemented here for a very small gain in rare instances.

cl::desc("Use the AMDGPU specific RPTrackers during scheduling"),
cl::init(false));

static cl::opt<bool> ExaminePendingQueue(
Contributor:
Curious why we want a flag for this?

    return TryCand.Reason != NoCand;
  }

  return false;
Contributor:
Seems like there may be some cases which would benefit from using a stall cycle heuristic. Maybe as a follow-up.
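
For reference, GenericScheduler::tryCandidate() already has such a check; a follow-up could mirror it inside the SameBoundary block of tryPendingCandidate() (a sketch, not part of this patch):

// Prefer the candidate that stalls the pipeline for fewer cycles.
if (SameBoundary &&
    tryLess(Zone->getLatencyStallCycles(TryCand.SU),
            Zone->getLatencyStallCycles(Cand.SU), TryCand, Cand, Stall))
  return TryCand.Reason != NoCand;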

if (PickedPending) {
  unsigned ReadyCycle = IsTopNode ? SU->TopReadyCycle : SU->BotReadyCycle;
  SchedBoundary &Zone = IsTopNode ? Top : Bot;
  Zone.bumpCycle(ReadyCycle);
Contributor:
We're making a mess of this test

llc -mtriple=amdgcn -mcpu=gfx908 --run-pass=machine-scheduler --misched-prera-direction=topdown <filename.mir>

--- |
  define amdgpu_kernel void @pending_queue_ready_cycle_bug(ptr addrspace(1) %arg) {
  bb:
    unreachable
  }

...
---
name:            pending_queue_ready_cycle_bug
tracksRegLiveness: true
body:             |
  bb.0.bb:
    liveins: $sgpr4_sgpr5

    %2:sgpr_128 = IMPLICIT_DEF
    %14:vgpr_32 = IMPLICIT_DEF
    %15:vgpr_32 = IMPLICIT_DEF
    %18:areg_512 = IMPLICIT_DEF
    %18:areg_512 = V_MFMA_F32_16X16X1F32_mac_e64 %15, %14, %18, 0, 0, 0, implicit $mode, implicit $exec
    %5:vreg_128 = BUFFER_LOAD_DWORDX4_OFFSET %2, 0, 0, 0, 0, implicit $exec
    %18:areg_512 = V_MFMA_F32_16X16X1F32_mac_e64 %15, %14, %18, 0, 0, 0, implicit $mode, implicit $exec
    undef %84.sub0:vreg_128_align2 = V_ADD_U32_e32 %5.sub0, %14, implicit $exec
    %7:vreg_512 = COPY %18
    SCHED_BARRIER 0
    S_NOP 0, implicit %18, implicit %7, %84
    S_ENDPGM 0
...

We schedule the first MFMA, then the LOAD, then the ADD, then the second MFMA. After scheduling the ADD, the Top zone has a current cycle of 83 due to the long latency of the load. I'm not sure why, but I am seeing that the second MFMA is still in the pending queue at this point. Then, we call bumpCycle with cycle 9. This issue is probably more due to not releasing the MFMA from pending than it is due to improper cycle management.

Contributor:
Is there a plan for this? It's possible that the issue is independent from this PR, but this PR exposes us to corrupting the cycle count.

Member Author:
Fixed. Thanks for finding this. There were two issues with the previous implementation.

  1. The scheduler doesn't keep an SU's next ready cycle in sync with possible hazards, and checkHazard(SU) needs to be queried to confirm whether the instruction can be issued in the current cycle (see the sketch below). There is currently no mechanism to identify exactly which cycle the hazard will resolve.
  2. There is additional bookkeeping in the base implementation of pickOnlyChoice() that really should exist in other functions, but for our purposes it makes the most sense to always call pickOnlyChoice() before selecting any candidate SU. This matches the behavior of the generic scheduler.
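
To illustrate point 1, a rough sketch (not the patch code) of why the ready cycle alone can't be trusted for a pending SU:

// A pending SU can have ReadyCycle <= CurrCycle and still be unissuable
// because an unbuffered-resource hazard hasn't resolved, so both checks
// are needed before treating it as issuable in the current cycle.
for (SUnit *SU : Zone.Pending) {
  unsigned ReadyCycle = Zone.isTop() ? SU->TopReadyCycle : SU->BotReadyCycle;
  if (ReadyCycle > Zone.getCurrCycle() || Zone.checkHazard(SU))
    continue; // Still blocked; leave it in the pending queue.
  // ... otherwise the SU could be released to the available queue ...
}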

jayfoad (Contributor) commented Aug 18, 2025

> The scheduler has two hardware resource modeling modes: an in-order mode where instructions must be ready to issue before scheduling, and out-of-order models where instructions are always visible to heuristics. Special handling exists for unbuffered processor resources in out-of-order models. These resources can cause pipeline stalls when used back-to-back, so they're typically avoided. However, for AMDGPU targets, managing register pressure and reducing spilling is critical enough to justify exceptions to this approach.

To clarify: the case you want to improve is the handling of unbuffered resources (BufferSize == 0) in an out-of-order model (MicroOpBufferSize >= 1), right? What exactly is the "special handling" for this case that you mention?

My gut feeling is that "examining instructions in the pending queue when scheduling" destroys the distinction between the available and pending queues. Is there an alternative approach that puts those instructions in the available queue instead? Or can you simply use a buffered resource instead of an unbuffered one - or does that change the behavior in other ways?

kerbowa (Member, Author) commented Aug 21, 2025

> The scheduler has two hardware resource modeling modes: an in-order mode where instructions must be ready to issue before scheduling, and out-of-order models where instructions are always visible to heuristics. Special handling exists for unbuffered processor resources in out-of-order models. These resources can cause pipeline stalls when used back-to-back, so they're typically avoided. However, for AMDGPU targets, managing register pressure and reducing spilling is critical enough to justify exceptions to this approach.
>
> To clarify: the case you want to improve is the handling of unbuffered resources (BufferSize == 0) in an out-of-order model (MicroOpBufferSize >= 1), right? What exactly is the "special handling" for this case that you mention?
>
> My gut feeling is that "examining instructions in the pending queue when scheduling" destroys the distinction between the available and pending queues. Is there an alternative approach that puts those instructions in the available queue instead? Or can you simply use a buffered resource instead of an unbuffered one - or does that change the behavior in other ways?

The scheduler only considers instructions that are in the available queue. In an in-order model (MicroOpBufferSize == 0) instructions are moved into the available queue only when they're ready to issue. In an out-of-order model instructions are placed into the available queue when all dependencies are scheduled, except for the "special case" with unbuffered resources (BufferSize == 0).
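
For context, the generic SchedBoundary decides between the two queues roughly like this when a node's predecessors are all scheduled (paraphrased from SchedBoundary::releaseNode(); the actual code also caps the available queue size):

bool IsBuffered = SchedModel->getMicroOpBufferSize() != 0;
// In-order (unbuffered) models wait for the ready cycle; out-of-order models
// release immediately unless checkHazard() fires, e.g. for a back-to-back use
// of an occupied unbuffered (BufferSize == 0) resource.
if ((!IsBuffered && ReadyCycle > CurrCycle) || checkHazard(SU))
  Pending.push(SU);
else
  Available.push(SU);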

What I'm aiming for is twofold:

  1. Preserve the current behavior where issue latency is captured by the scheduler.
  2. Allow register-pressure heuristics to take precedence over issue latency in some circumstances.

If we moved pending instructions to the available queue, we'd ignore the penalty for stalling the pipeline. The alternative is to repurpose a heuristic to capture this aspect. As far as I can tell, STALL, RES-DEMAND, and RES-REDUCE don't currently do this, and we would need a dedicated heuristic while modeling all instructions with BufferSize >= 1, or to rework the generic SchedModel.

I think the current behavior is an intended tradeoff. You can find some old forum discussions and GitHub issues along these lines.

The approach in this patch suits our needs because it allows us to consider custom heuristics for pending (non-ready) instructions in the future, and avoids target-independent changes. Like you, I’d prefer if issue latency could be modeled while still letting register pressure dominate, but I was not able to find a better way given how MachineScheduler uses different queues to model pipeline stalls.

kerbowa force-pushed the users/kerbowa/reserved-resource-hazard branch from b67a6dd to 31327a4 August 25, 2025 16:27
kerbowa (Member, Author) commented Aug 25, 2025

This stack of pull requests is managed by Graphite.

github-actions bot commented Aug 25, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

kerbowa (Member, Author) commented Sep 3, 2025

ping

bcahoon requested a review from doru1004 September 3, 2025 22:43
kerbowa force-pushed the users/kerbowa/reserved-resource-hazard branch from 31327a4 to 156419d September 7, 2025 05:08
jrbyrnes (Contributor) left a comment

LGTM - please wait a little while to see if Jay still has an outstanding concern.

It feels like there's a deeper flaw in the interaction between the schedmodel and the scheduler. Probably the pending queue should be replaced with a stall / resource availability heuristic.

That said, it seems like this PR bridges the gap in that the scheduler makes sensible decisions without the deeper rework. All the alternate ways I could think of to do this broke the intent of various other mechanisms.


static bool shouldCheckPending(SchedBoundary &Zone,
                               const TargetSchedModel *SchedModel) {
  bool HasBufferedModel =
Contributor:
Do we need this check?

kerbowa force-pushed the users/kerbowa/reserved-resource-hazard branch from 156419d to bdef634 October 16, 2025 04:10
kerbowa merged commit d4b1ab7 into main Oct 16, 2025
10 checks passed
kerbowa deleted the users/kerbowa/reserved-resource-hazard branch October 16, 2025 15:46
llvm-ci (Collaborator) commented Oct 16, 2025

LLVM Buildbot has detected a new failure on builder sanitizer-aarch64-linux-fuzzer running on sanitizer-buildbot11 while building llvm at step 2 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/159/builds/33233

Here is the relevant piece of the build log for reference:
Step 2 (annotate) failure: 'python ../sanitizer_buildbot/sanitizers/zorg/buildbot/builders/sanitizers/buildbot_selector.py' (failure)
...
[3/145] Generating VCSRevision.h
[4/66] Linking CXX shared module unittests/Support/DynamicLibrary/SecondLib.so
[5/66] Linking CXX shared module unittests/Analysis/InlineOrderPlugin.so
[6/66] Generating VCSVersion.inc
[7/64] Linking CXX shared module unittests/Support/DynamicLibrary/PipSqueak.so
[8/64] Linking CXX shared module unittests/Passes/Plugins/DoublerPlugin.so
[9/64] Linking CXX shared module unittests/Passes/Plugins/TestPlugin.so
[10/64] Linking CXX shared module unittests/Analysis/InlineAdvisorPlugin.so
[11/64] Linking CXX executable bin/llvm-config
[12/64] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o
FAILED: lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o 
CCACHE_CPP2=yes CCACHE_HASHDIR=yes CCACHE_SLOPPINESS=pch_defines,time_macros /usr/bin/ccache /usr/bin/clang++ -DGTEST_HAS_RTTI=0 -D_GLIBCXX_USE_CXX11_ABI=1 -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I/home/b/sanitizer-aarch64-linux-fuzzer/build/llvm_build0/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux-fuzzer/build/llvm-project/llvm/lib/Target/AMDGPU -I/home/b/sanitizer-aarch64-linux-fuzzer/build/llvm_build0/include -I/home/b/sanitizer-aarch64-linux-fuzzer/build/llvm-project/llvm/include -fPIC -fno-semantic-interposition -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wc++98-compat-extra-semi -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wsuggest-override -Wstring-conversion -Wno-pass-failed -Wmisleading-indentation -Wctad-maybe-unsupported -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -DNDEBUG -std=c++17 -fvisibility=hidden  -fno-exceptions -funwind-tables -fno-rtti -MD -MT lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o -MF lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o.d -o lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNSchedStrategy.cpp.o -c /home/b/sanitizer-aarch64-linux-fuzzer/build/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp
/home/b/sanitizer-aarch64-linux-fuzzer/build/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.cpp:362:24: error: out-of-line definition of 'printCandidateDecision' does not match any declaration in 'llvm::GCNSchedStrategy'
  362 | void GCNSchedStrategy::printCandidateDecision(const SchedCandidate &Current,
      |                        ^~~~~~~~~~~~~~~~~~~~~~
/home/b/sanitizer-aarch64-linux-fuzzer/build/llvm-project/llvm/lib/Target/AMDGPU/GCNSchedStrategy.h:45:7: note: GCNSchedStrategy defined here
   45 | class GCNSchedStrategy : public GenericScheduler {
      |       ^~~~~~~~~~~~~~~~
1 error generated.
[13/64] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/GCNIterativeScheduler.cpp.o
[14/64] Building CXX object lib/Target/AArch64/CMakeFiles/LLVMAArch64CodeGen.dir/AArch64InstrInfo.cpp.o
[15/64] Building CXX object lib/Target/AMDGPU/CMakeFiles/LLVMAMDGPUCodeGen.dir/AMDGPUTargetMachine.cpp.o
[16/64] Building CXX object lib/Target/RISCV/CMakeFiles/LLVMRISCVCodeGen.dir/RISCVISelLowering.cpp.o
ninja: build stopped: subcommand failed.
Step 7 (stage1 build all) failure: stage1 build all (failure) — same GCNSchedStrategy.cpp error as above.
program finished with exit code 1
elapsedTime=42.873133
