[AMDGPU][SIInsertWaitCnts] Gfx12.5 - Refactor xcnt optimization #164357

RyanRio · 2025-10-21T04:56:48Z

Refactor the XCnt optimization checks so that they can be checked when applying a pre-existing waitcnt. This removes unnecessary xcnt waits when taking a loop backedge.

github-actions · 2025-10-21T04:57:11Z

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

llvmbot · 2025-10-21T04:58:06Z

@llvm/pr-subscribers-backend-amdgpu

Author: Ryan Mitchell (RyanRio)

Changes

Refactor the XCnt optimization checks so that they can be checked when applying a pre-existing waitcnt. This removes unnecessary xcnt waits when taking a loop backedge.

Patch is 26.44 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/164357.diff

5 Files Affected:

(modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+19-6)
(modified) llvm/test/CodeGen/AMDGPU/flat-saddr-load.ll (+3-4)
(modified) llvm/test/CodeGen/AMDGPU/fmin3.ll (-2)
(modified) llvm/test/CodeGen/AMDGPU/global-atomicrmw-fadd.ll (+20-40)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.is.shared.ll (-1)

diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index 6dcbced010a5a..1aeb25248436c 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -646,6 +646,8 @@ class WaitcntBrackets {
 
   void applyWaitcnt(const AMDGPU::Waitcnt &Wait);
   void applyWaitcnt(InstCounterType T, unsigned Count);
+  bool hasRedundantXCntWithKmCnt(const AMDGPU::Waitcnt &Wait);
+  bool canOptimizeXCntWithLoadCnt(const AMDGPU::Waitcnt &Wait);
   void applyXcnt(const AMDGPU::Waitcnt &Wait);
   void updateByEvent(WaitEventType E, MachineInstr &MI);
 
@@ -1287,20 +1289,25 @@ void WaitcntBrackets::applyWaitcnt(InstCounterType T, unsigned Count) {
   }
 }
 
-void WaitcntBrackets::applyXcnt(const AMDGPU::Waitcnt &Wait) {
+bool WaitcntBrackets::hasRedundantXCntWithKmCnt(const AMDGPU::Waitcnt &Wait) {
   // Wait on XCNT is redundant if we are already waiting for a load to complete.
   // SMEM can return out of order, so only omit XCNT wait if we are waiting till
   // zero.
-  if (Wait.KmCnt == 0 && hasPendingEvent(SMEM_GROUP))
-    return applyWaitcnt(X_CNT, 0);
+  return Wait.KmCnt == 0 && hasPendingEvent(SMEM_GROUP);
+}
 
+bool WaitcntBrackets::canOptimizeXCntWithLoadCnt(const AMDGPU::Waitcnt &Wait) {
   // If we have pending store we cannot optimize XCnt because we do not wait for
   // stores. VMEM loads retun in order, so if we only have loads XCnt is
   // decremented to the same number as LOADCnt.
-  if (Wait.LoadCnt != ~0u && hasPendingEvent(VMEM_GROUP) &&
-      !hasPendingEvent(STORE_CNT))
-    return applyWaitcnt(X_CNT, std::min(Wait.XCnt, Wait.LoadCnt));
+  return Wait.LoadCnt != ~0u && hasPendingEvent(VMEM_GROUP) && !hasPendingEvent(STORE_CNT);
+}
 
+void WaitcntBrackets::applyXcnt(const AMDGPU::Waitcnt &Wait) {
+  if (hasRedundantXCntWithKmCnt(Wait))
+    return applyWaitcnt(X_CNT, 0);
+  if (canOptimizeXCntWithLoadCnt(Wait))
+    return applyWaitcnt(X_CNT, std::min(Wait.XCnt, Wait.LoadCnt));
   applyWaitcnt(X_CNT, Wait.XCnt);
 }
 
@@ -1729,6 +1736,12 @@ bool WaitcntGeneratorGFX12Plus::applyPreexistingWaitcnt(
     if (!WaitInstrs[CT])
       continue;
 
+    if ((CT == KM_CNT && ScoreBrackets.hasRedundantXCntWithKmCnt(Wait)) ||
+        (CT == LOAD_CNT && ScoreBrackets.canOptimizeXCntWithLoadCnt(Wait)))
+      // Xcnt may need to be updated depending on a pre-existing KM/LOAD_CNT
+      // due to taking the backedge of a block.
+      ScoreBrackets.applyXcnt(Wait);
+
     unsigned NewCnt = getWait(Wait, CT);
     if (NewCnt != ~0u) {
       Modified |= updateOperandIfDifferent(*WaitInstrs[CT],
diff --git a/llvm/test/CodeGen/AMDGPU/flat-saddr-load.ll b/llvm/test/CodeGen/AMDGPU/flat-saddr-load.ll
index b5b2655246c3f..fd1e8807885a5 100644
--- a/llvm/test/CodeGen/AMDGPU/flat-saddr-load.ll
+++ b/llvm/test/CodeGen/AMDGPU/flat-saddr-load.ll
@@ -2115,7 +2115,7 @@ define amdgpu_ps void @flat_addr_64bit_lsr_iv(ptr inreg %arg) {
 ; GFX1250-SDAG-NEXT:    s_mov_b64 s[0:1], 0
 ; GFX1250-SDAG-NEXT:  .LBB116_1: ; %bb3
 ; GFX1250-SDAG-NEXT:    ; =>This Inner Loop Header: Depth=1
-; GFX1250-SDAG-NEXT:    s_wait_xcnt 0x0
+; GFX1250-SDAG-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-SDAG-NEXT:    s_add_nc_u64 s[4:5], s[2:3], s[0:1]
 ; GFX1250-SDAG-NEXT:    s_add_nc_u64 s[0:1], s[0:1], 4
 ; GFX1250-SDAG-NEXT:    s_wait_dscnt 0x0
@@ -2134,7 +2134,6 @@ define amdgpu_ps void @flat_addr_64bit_lsr_iv(ptr inreg %arg) {
 ; GFX1250-GISEL-NEXT:  .LBB116_1: ; %bb3
 ; GFX1250-GISEL-NEXT:    ; =>This Inner Loop Header: Depth=1
 ; GFX1250-GISEL-NEXT:    s_wait_dscnt 0x0
-; GFX1250-GISEL-NEXT:    s_wait_xcnt 0x0
 ; GFX1250-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
 ; GFX1250-GISEL-NEXT:    v_add_co_u32 v4, vcc_lo, v0, v2
 ; GFX1250-GISEL-NEXT:    v_add_co_ci_u32_e64 v5, null, v1, v3, vcc_lo
@@ -2170,7 +2169,7 @@ define amdgpu_ps void @flat_addr_64bit_lsr_iv_multiload(ptr inreg %arg, ptr inre
 ; GFX1250-SDAG-NEXT:    s_mov_b64 s[0:1], 0
 ; GFX1250-SDAG-NEXT:  .LBB117_1: ; %bb3
 ; GFX1250-SDAG-NEXT:    ; =>This Inner Loop Header: Depth=1
-; GFX1250-SDAG-NEXT:    s_wait_xcnt 0x0
+; GFX1250-SDAG-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-SDAG-NEXT:    s_add_nc_u64 s[4:5], s[2:3], s[0:1]
 ; GFX1250-SDAG-NEXT:    s_add_nc_u64 s[0:1], s[0:1], 4
 ; GFX1250-SDAG-NEXT:    s_wait_dscnt 0x0
@@ -2191,7 +2190,7 @@ define amdgpu_ps void @flat_addr_64bit_lsr_iv_multiload(ptr inreg %arg, ptr inre
 ; GFX1250-GISEL-NEXT:    v_mov_b64_e32 v[2:3], s[0:1]
 ; GFX1250-GISEL-NEXT:  .LBB117_1: ; %bb3
 ; GFX1250-GISEL-NEXT:    ; =>This Inner Loop Header: Depth=1
-; GFX1250-GISEL-NEXT:    s_wait_xcnt 0x0
+; GFX1250-GISEL-NEXT:    s_wait_dscnt 0x0
 ; GFX1250-GISEL-NEXT:    s_delay_alu instid0(VALU_DEP_1) | instskip(NEXT) | instid1(VALU_DEP_1)
 ; GFX1250-GISEL-NEXT:    v_add_co_u32 v4, vcc_lo, v0, v2
 ; GFX1250-GISEL-NEXT:    v_add_co_ci_u32_e64 v5, null, v1, v3, vcc_lo
diff --git a/llvm/test/CodeGen/AMDGPU/fmin3.ll b/llvm/test/CodeGen/AMDGPU/fmin3.ll
index 6a6f232c55e24..2756472652bc9 100644
--- a/llvm/test/CodeGen/AMDGPU/fmin3.ll
+++ b/llvm/test/CodeGen/AMDGPU/fmin3.ll
@@ -1233,7 +1233,6 @@ define amdgpu_kernel void @test_fmin3_olt_0_f64(ptr addrspace(1) %out, ptr addrs
 ; GFX1250-NEXT:    s_wait_loadcnt 0x0
 ; GFX1250-NEXT:    buffer_load_b64 v[2:3], off, s[16:19], null scope:SCOPE_SYS
 ; GFX1250-NEXT:    s_wait_loadcnt 0x0
-; GFX1250-NEXT:    s_wait_xcnt 0x1
 ; GFX1250-NEXT:    s_mov_b32 s4, s14
 ; GFX1250-NEXT:    s_mov_b32 s5, s15
 ; GFX1250-NEXT:    s_mov_b32 s0, s8
@@ -1443,7 +1442,6 @@ define amdgpu_kernel void @test_fmin3_olt_1_f64(ptr addrspace(1) %out, ptr addrs
 ; GFX1250-NEXT:    s_wait_loadcnt 0x0
 ; GFX1250-NEXT:    buffer_load_b64 v[2:3], off, s[16:19], null scope:SCOPE_SYS
 ; GFX1250-NEXT:    s_wait_loadcnt 0x0
-; GFX1250-NEXT:    s_wait_xcnt 0x1
 ; GFX1250-NEXT:    s_mov_b32 s4, s14
 ; GFX1250-NEXT:    s_mov_b32 s5, s15
 ; GFX1250-NEXT:    s_mov_b32 s0, s8
diff --git a/llvm/test/CodeGen/AMDGPU/global-atomicrmw-fadd.ll b/llvm/test/CodeGen/AMDGPU/global-atomicrmw-fadd.ll
index a50791e10f5a2..ed565ca43f9a3 100644
--- a/llvm/test/CodeGen/AMDGPU/global-atomicrmw-fadd.ll
+++ b/llvm/test/CodeGen/AMDGPU/global-atomicrmw-fadd.ll
@@ -8814,7 +8814,7 @@ define half @global_agent_atomic_fadd_ret_f16__amdgpu_no_fine_grained_memory(ptr
 ; GFX1250-TRUE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-TRUE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v5, v7
 ; GFX1250-TRUE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-TRUE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-TRUE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-TRUE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-TRUE16-NEXT:    s_cbranch_execnz .LBB44_1
 ; GFX1250-TRUE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -8857,7 +8857,7 @@ define half @global_agent_atomic_fadd_ret_f16__amdgpu_no_fine_grained_memory(ptr
 ; GFX1250-FAKE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-FAKE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v5, v7
 ; GFX1250-FAKE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-FAKE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-FAKE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-FAKE16-NEXT:    s_cbranch_execnz .LBB44_1
 ; GFX1250-FAKE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -9322,7 +9322,7 @@ define half @global_agent_atomic_fadd_ret_f16__offset12b_pos__amdgpu_no_fine_gra
 ; GFX1250-TRUE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-TRUE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v5, v7
 ; GFX1250-TRUE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-TRUE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-TRUE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-TRUE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-TRUE16-NEXT:    s_cbranch_execnz .LBB45_1
 ; GFX1250-TRUE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -9365,7 +9365,7 @@ define half @global_agent_atomic_fadd_ret_f16__offset12b_pos__amdgpu_no_fine_gra
 ; GFX1250-FAKE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-FAKE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v5, v7
 ; GFX1250-FAKE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-FAKE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-FAKE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-FAKE16-NEXT:    s_cbranch_execnz .LBB45_1
 ; GFX1250-FAKE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -9844,7 +9844,7 @@ define half @global_agent_atomic_fadd_ret_f16__offset12b_neg__amdgpu_no_fine_gra
 ; GFX1250-TRUE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-TRUE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v5, v7
 ; GFX1250-TRUE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-TRUE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-TRUE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-TRUE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-TRUE16-NEXT:    s_cbranch_execnz .LBB46_1
 ; GFX1250-TRUE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -9888,7 +9888,7 @@ define half @global_agent_atomic_fadd_ret_f16__offset12b_neg__amdgpu_no_fine_gra
 ; GFX1250-FAKE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-FAKE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v5, v7
 ; GFX1250-FAKE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-FAKE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-FAKE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-FAKE16-NEXT:    s_cbranch_execnz .LBB46_1
 ; GFX1250-FAKE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -10365,7 +10365,6 @@ define void @global_agent_atomic_fadd_noret_f16__amdgpu_no_fine_grained_memory(p
 ; GFX1250-TRUE16-NEXT:    s_wait_loadcnt 0x0
 ; GFX1250-TRUE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-TRUE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v4, v5
-; GFX1250-TRUE16-NEXT:    s_wait_xcnt 0x0
 ; GFX1250-TRUE16-NEXT:    v_mov_b32_e32 v5, v4
 ; GFX1250-TRUE16-NEXT:    s_or_b32 s0, vcc_lo, s0
 ; GFX1250-TRUE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
@@ -10407,7 +10406,6 @@ define void @global_agent_atomic_fadd_noret_f16__amdgpu_no_fine_grained_memory(p
 ; GFX1250-FAKE16-NEXT:    s_wait_loadcnt 0x0
 ; GFX1250-FAKE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-FAKE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v4, v5
-; GFX1250-FAKE16-NEXT:    s_wait_xcnt 0x0
 ; GFX1250-FAKE16-NEXT:    v_mov_b32_e32 v5, v4
 ; GFX1250-FAKE16-NEXT:    s_or_b32 s0, vcc_lo, s0
 ; GFX1250-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
@@ -10857,7 +10855,6 @@ define void @global_agent_atomic_fadd_noret_f16__offset12b_pos__amdgpu_no_fine_g
 ; GFX1250-TRUE16-NEXT:    s_wait_loadcnt 0x0
 ; GFX1250-TRUE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-TRUE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v4, v5
-; GFX1250-TRUE16-NEXT:    s_wait_xcnt 0x0
 ; GFX1250-TRUE16-NEXT:    v_mov_b32_e32 v5, v4
 ; GFX1250-TRUE16-NEXT:    s_or_b32 s0, vcc_lo, s0
 ; GFX1250-TRUE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
@@ -10899,7 +10896,6 @@ define void @global_agent_atomic_fadd_noret_f16__offset12b_pos__amdgpu_no_fine_g
 ; GFX1250-FAKE16-NEXT:    s_wait_loadcnt 0x0
 ; GFX1250-FAKE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-FAKE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v4, v5
-; GFX1250-FAKE16-NEXT:    s_wait_xcnt 0x0
 ; GFX1250-FAKE16-NEXT:    v_mov_b32_e32 v5, v4
 ; GFX1250-FAKE16-NEXT:    s_or_b32 s0, vcc_lo, s0
 ; GFX1250-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
@@ -11363,7 +11359,6 @@ define void @global_agent_atomic_fadd_noret_f16__offset12b_neg__amdgpu_no_fine_g
 ; GFX1250-TRUE16-NEXT:    s_wait_loadcnt 0x0
 ; GFX1250-TRUE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-TRUE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v4, v5
-; GFX1250-TRUE16-NEXT:    s_wait_xcnt 0x0
 ; GFX1250-TRUE16-NEXT:    v_mov_b32_e32 v5, v4
 ; GFX1250-TRUE16-NEXT:    s_or_b32 s0, vcc_lo, s0
 ; GFX1250-TRUE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
@@ -11406,7 +11401,6 @@ define void @global_agent_atomic_fadd_noret_f16__offset12b_neg__amdgpu_no_fine_g
 ; GFX1250-FAKE16-NEXT:    s_wait_loadcnt 0x0
 ; GFX1250-FAKE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-FAKE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v4, v5
-; GFX1250-FAKE16-NEXT:    s_wait_xcnt 0x0
 ; GFX1250-FAKE16-NEXT:    v_mov_b32_e32 v5, v4
 ; GFX1250-FAKE16-NEXT:    s_or_b32 s0, vcc_lo, s0
 ; GFX1250-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
@@ -11861,7 +11855,7 @@ define half @global_agent_atomic_fadd_ret_f16__offset12b_pos__align4__amdgpu_no_
 ; GFX1250-TRUE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-TRUE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v3, v5
 ; GFX1250-TRUE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-TRUE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-TRUE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-TRUE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-TRUE16-NEXT:    s_cbranch_execnz .LBB50_1
 ; GFX1250-TRUE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -11893,7 +11887,7 @@ define half @global_agent_atomic_fadd_ret_f16__offset12b_pos__align4__amdgpu_no_
 ; GFX1250-FAKE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-FAKE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v3, v5
 ; GFX1250-FAKE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-FAKE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-FAKE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-FAKE16-NEXT:    s_cbranch_execnz .LBB50_1
 ; GFX1250-FAKE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -12245,7 +12239,6 @@ define void @global_agent_atomic_fadd_noret_f16__offset12b__align4_pos__amdgpu_n
 ; GFX1250-TRUE16-NEXT:    s_wait_loadcnt 0x0
 ; GFX1250-TRUE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-TRUE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v3, v5
-; GFX1250-TRUE16-NEXT:    s_wait_xcnt 0x0
 ; GFX1250-TRUE16-NEXT:    v_mov_b32_e32 v5, v3
 ; GFX1250-TRUE16-NEXT:    s_or_b32 s0, vcc_lo, s0
 ; GFX1250-TRUE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
@@ -12276,7 +12269,6 @@ define void @global_agent_atomic_fadd_noret_f16__offset12b__align4_pos__amdgpu_n
 ; GFX1250-FAKE16-NEXT:    s_wait_loadcnt 0x0
 ; GFX1250-FAKE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-FAKE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v3, v5
-; GFX1250-FAKE16-NEXT:    s_wait_xcnt 0x0
 ; GFX1250-FAKE16-NEXT:    v_mov_b32_e32 v5, v3
 ; GFX1250-FAKE16-NEXT:    s_or_b32 s0, vcc_lo, s0
 ; GFX1250-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
@@ -12631,7 +12623,7 @@ define half @global_system_atomic_fadd_ret_f16__offset12b_pos__amdgpu_no_fine_gr
 ; GFX1250-TRUE16-NEXT:    global_inv scope:SCOPE_SYS
 ; GFX1250-TRUE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v5, v7
 ; GFX1250-TRUE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-TRUE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-TRUE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-TRUE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-TRUE16-NEXT:    s_cbranch_execnz .LBB52_1
 ; GFX1250-TRUE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -12674,7 +12666,7 @@ define half @global_system_atomic_fadd_ret_f16__offset12b_pos__amdgpu_no_fine_gr
 ; GFX1250-FAKE16-NEXT:    global_inv scope:SCOPE_SYS
 ; GFX1250-FAKE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v5, v7
 ; GFX1250-FAKE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-FAKE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-FAKE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-FAKE16-NEXT:    s_cbranch_execnz .LBB52_1
 ; GFX1250-FAKE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -13154,7 +13146,6 @@ define void @global_system_atomic_fadd_noret_f16__offset12b_pos__amdgpu_no_fine_
 ; GFX1250-TRUE16-NEXT:    s_wait_loadcnt 0x0
 ; GFX1250-TRUE16-NEXT:    global_inv scope:SCOPE_SYS
 ; GFX1250-TRUE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v4, v5
-; GFX1250-TRUE16-NEXT:    s_wait_xcnt 0x0
 ; GFX1250-TRUE16-NEXT:    v_mov_b32_e32 v5, v4
 ; GFX1250-TRUE16-NEXT:    s_or_b32 s0, vcc_lo, s0
 ; GFX1250-TRUE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
@@ -13196,7 +13187,6 @@ define void @global_system_atomic_fadd_noret_f16__offset12b_pos__amdgpu_no_fine_
 ; GFX1250-FAKE16-NEXT:    s_wait_loadcnt 0x0
 ; GFX1250-FAKE16-NEXT:    global_inv scope:SCOPE_SYS
 ; GFX1250-FAKE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v4, v5
-; GFX1250-FAKE16-NEXT:    s_wait_xcnt 0x0
 ; GFX1250-FAKE16-NEXT:    v_mov_b32_e32 v5, v4
 ; GFX1250-FAKE16-NEXT:    s_or_b32 s0, vcc_lo, s0
 ; GFX1250-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
@@ -13676,7 +13666,7 @@ define bfloat @global_agent_atomic_fadd_ret_bf16__amdgpu_no_fine_grained_memory(
 ; GFX1250-TRUE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-TRUE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v5, v7
 ; GFX1250-TRUE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-TRUE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-TRUE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-TRUE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-TRUE16-NEXT:    s_cbranch_execnz .LBB54_1
 ; GFX1250-TRUE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -13722,7 +13712,7 @@ define bfloat @global_agent_atomic_fadd_ret_bf16__amdgpu_no_fine_grained_memory(
 ; GFX1250-FAKE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-FAKE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v5, v7
 ; GFX1250-FAKE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-FAKE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-FAKE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-FAKE16-NEXT:    s_cbranch_execnz .LBB54_1
 ; GFX1250-FAKE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -14273,7 +14263,7 @@ define bfloat @global_agent_atomic_fadd_ret_bf16__offset12b_pos__amdgpu_no_fine_
 ; GFX1250-TRUE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-TRUE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v5, v7
 ; GFX1250-TRUE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-TRUE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-TRUE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-TRUE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-TRUE16-NEXT:    s_cbranch_execnz .LBB55_1
 ; GFX1250-TRUE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -14319,7 +14309,7 @@ define bfloat @global_agent_atomic_fadd_ret_bf16__offset12b_pos__amdgpu_no_fine_
 ; GFX1250-FAKE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-FAKE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v5, v7
 ; GFX1250-FAKE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-FAKE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-FAKE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-FAKE16-NEXT:    s_cbranch_execnz .LBB55_1
 ; GFX1250-FAKE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -14888,7 +14878,7 @@ define bfloat @global_agent_atomic_fadd_ret_bf16__offset12b_neg__amdgpu_no_fine_
 ; GFX1250-TRUE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-TRUE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v5, v7
 ; GFX1250-TRUE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-TRUE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-TRUE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-TRUE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-TRUE16-NEXT:    s_cbranch_execnz .LBB56_1
 ; GFX1250-TRUE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -14936,7 +14926,7 @@ define bfloat @global_agent_atomic_fadd_ret_bf16__offset12b_neg__amdgpu_no_fine_
 ; GFX1250-FAKE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-FAKE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v5, v7
 ; GFX1250-FAKE16-NEXT:    s_or_b32 s0, vcc_lo, s0
-; GFX1250-FAKE16-NEXT:    s_wait_xcnt 0x0
+; GFX1250-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
 ; GFX1250-FAKE16-NEXT:    s_and_not1_b32 exec_lo, exec_lo, s0
 ; GFX1250-FAKE16-NEXT:    s_cbranch_execnz .LBB56_1
 ; GFX1250-FAKE16-NEXT:  ; %bb.2: ; %atomicrmw.end
@@ -15502,7 +15492,6 @@ define void @global_agent_atomic_fadd_noret_bf16__amdgpu_no_fine_grained_memory(
 ; GFX1250-TRUE16-NEXT:    s_wait_loadcnt 0x0
 ; GFX1250-TRUE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-TRUE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v4, v5
-; GFX1250-TRUE16-NEXT:    s_wait_xcnt 0x0
 ; GFX1250-TRUE16-NEXT:    v_mov_b32_e32 v5, v4
 ; GFX1250-TRUE16-NEXT:    s_or_b32 s0, vcc_lo, s0
 ; GFX1250-TRUE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
@@ -15547,7 +15536,6 @@ define void @global_agent_atomic_fadd_noret_bf16__amdgpu_no_fine_grained_memory(
 ; GFX1250-FAKE16-NEXT:    s_wait_loadcnt 0x0
 ; GFX1250-FAKE16-NEXT:    global_inv scope:SCOPE_DEV
 ; GFX1250-FAKE16-NEXT:    v_cmp_eq_u32_e32 vcc_lo, v4, v5
-; GFX1250-FAKE16-NEXT:    s_wait_xcnt 0x0
 ; GFX1250-FAKE16-NEXT:    v_mov_b32_e32 v5, v4
 ; GFX1250-FAKE16-NEXT:    s_or_b32 s0, vcc_lo, s0
 ; GFX1250-FAKE16-NEXT:    s_delay_alu instid0(SALU_CYCLE_1)
@@ -16...
[truncated]

nhaehnle

Seems reasonable to me.

RyanRio · 2025-11-03T17:43:56Z

@jayfoad @easyonaadit Since my PR also makes xcnt changes, it would be great if we could collaborate on the required changes from #166154 #165201 here. I've pushed what I think is a fix, although more conservative in the LoadCnt case.

The issue your test points out, Jay, is that xcnt can only be optimized to the min of loadcnt/xcnt, and so the event can't just be fully cleared unless it's 0, correct?

Edit: I've added the test, all the prior tests remain the same and Jay's is now fixed. So I think mine is correct.

github-actions · 2025-11-03T21:52:16Z

✅ With the latest revision this PR passed the C/C++ code formatter.

shiltian

some format nits for code style consistency

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp

RyanRio · 2025-11-03T22:10:28Z

@shiltian Addressed your nits, feel free to re-trigger the workflows, thanks.

RyanRio · 2025-11-04T00:17:37Z

@jayfoad Feel free to merge if you agree with my changes.

llvm/test/CodeGen/AMDGPU/wait-xcnt.mir

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp

jayfoad · 2025-11-04T11:52:56Z

Refactor the XCnt optimization checks so that they can be checked when applying a pre-existing waitcnt.

Hi @RyanRio, it looks to me like all the optimizations done in applyXcnt (called from applyWaitcnt) should instead be done in simplifyWaitcnt, which is already called on both paths (generating new waitcnts and applying pre-existing waitcnts). Can you try that please?

(There are also some Xcnt optimizations in generateWaitcnt that I don't fully understand, but perhaps they could also be subsumed by simplifyWaitcnt?)

RyanRio · 2025-11-04T19:19:10Z

@jayfoad retrigger and merge if acceptable, please.

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp

RyanRio · 2025-11-06T14:56:08Z

Is an xcnt necessary here?

; GFX1250-LABEL: flat_atomic_fadd_f32_noret_pat:
; GFX1250:       ; %bb.0:
; GFX1250-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
; GFX1250-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 4.0
; GFX1250-NEXT:    global_wb scope:SCOPE_SYS
; GFX1250-NEXT:    s_wait_storecnt 0x0
; GFX1250-NEXT:    s_wait_xcnt 0x0
; GFX1250-NEXT:    s_wait_kmcnt 0x0
; GFX1250-NEXT:    flat_atomic_add_f32 v0, v1, s[0:1] scope:SCOPE_SYS
; GFX1250-NEXT:    s_wait_storecnt_dscnt 0x0
; GFX1250-NEXT:    global_inv scope:SCOPE_SYS
; GFX1250-NEXT:    s_endpgm

jayfoad · 2025-11-06T15:12:42Z

Is an xcnt necessary here?

; GFX1250-LABEL: flat_atomic_fadd_f32_noret_pat:
; GFX1250:       ; %bb.0:
; GFX1250-NEXT:    s_load_b64 s[0:1], s[4:5], 0x24
; GFX1250-NEXT:    v_dual_mov_b32 v0, 0 :: v_dual_mov_b32 v1, 4.0
; GFX1250-NEXT:    global_wb scope:SCOPE_SYS
; GFX1250-NEXT:    s_wait_storecnt 0x0
; GFX1250-NEXT:    s_wait_xcnt 0x0
; GFX1250-NEXT:    s_wait_kmcnt 0x0
; GFX1250-NEXT:    flat_atomic_add_f32 v0, v1, s[0:1] scope:SCOPE_SYS
; GFX1250-NEXT:    s_wait_storecnt_dscnt 0x0
; GFX1250-NEXT:    global_inv scope:SCOPE_SYS
; GFX1250-NEXT:    s_endpgm

Dunno! :)

I assume that global_wb does not increment xcnt since it has no address to translate. So the only pending translation would be for the s_load_b64, so I think you would only need an s_wait_xcnt if you are about to clobber s[4:5], which we're not. And also the s_wait_kmcnt would make it redundant. But maybe I'm missing something.

Refactor the XCnt optimization checks so that they can be checked when applying a pre-existing waitcnt. This has the effect of removing unnecessary xcnt waits when taking a loop backedge.

This reverts commit d3ed9c9.

RyanRio · 2025-11-10T16:39:56Z

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp

-  // XCnt may be already consumed by a load wait.
-  if (Wait.XCnt != ~0u) {
-    if (Wait.KmCnt == 0 && !ScoreBrackets.hasPendingEvent(SMEM_GROUP))
-      Wait.XCnt = ~0u;
-
-    if (Wait.LoadCnt == 0 && !ScoreBrackets.hasPendingEvent(VMEM_GROUP))
-      Wait.XCnt = ~0u;
-
-    // Since the translation for VMEM addresses occur in-order, we can skip the
-    // XCnt if the current instruction is of VMEM type and has a memory
-    // dependency with another VMEM instruction in flight.
-    if (isVmemAccess(*It))
-      Wait.XCnt = ~0u;
-  }
-


@nhaehnle @jayfoad Thoughts on this case (see changes in bf16.ll)? -

; GFX1250-LABEL: test_load_store_f32_to_bf16: ; GFX1250: ; %bb.0: ; GFX1250-NEXT: s_wait_loadcnt_dscnt 0x0 ; GFX1250-NEXT: s_wait_kmcnt 0x0 ; GFX1250-NEXT: global_load_b32 v0, v[0:1], off Debug output before the next instruction: LOAD_CNT(1): 0:v0 0:v1 DS_CNT(0): EXP_CNT(0): STORE_CNT(63): SAMPLE_CNT(0): BVH_CNT(0): KM_CNT(0): X_CNT(1): 0:v0 0:v1 0:v2 0:v3 0:s126 0:s127 Pending Events: VMEM_READ_ACCESS, VMEM_WRITE_ACCESS, SCRATCH_WRITE_ACCESS, VMEM_GROUP ; GFX1250-NEXT: s_wait_loadcnt 0x0 ; GFX1250-NEXT: s_wait_xcnt 0x0 ; GFX1250-NEXT: v_cvt_pk_bf16_f32 v0, v0, s0 ; GFX1250-NEXT: global_store_b16 v[2:3], v0, off ; GFX1250-NEXT: s_set_pc_i64 s[30:31] %val = load float, ptr addrspace(1) %in %val.bf16 = fptrunc float %val to bfloat store bfloat %val.bf16, ptr addrspace(1) %out ret void }

Previously the xcnt was not waited on because of the check on L2168, but now it also requires !hasPendingEvent(STORE_CNT). I think my change is correct because only VMEM loads return in order.

This is waiting for #166779 though (currently just includes Jay's fix in this PR.)

We could also just have this one supersede it, up to you Jay.

Apparently there was a late change to gfx1250 that made it so every s_wait_loadcnt (and storecnt) also waits for the equivalent xcnt value. We need to get some clarity on that internally.

Interesting. I also see that hardware waits for xcnt==0 at every branch/call, which probably means that SIInsertWaitcnts does not after all have to handle a mixture of pending SMEM Xcnts and VMEM Xcnts.

Added a todo to revisit.

github-actions · 2025-11-13T18:43:32Z

@RyanRio Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

llvmbot added the backend:AMDGPU label Oct 21, 2025

RyanRio force-pushed the users/rmitchel/update_insertwaitcnt_xcnt branch from 8ad5479 to 921c950 Compare October 30, 2025 18:43

nhaehnle approved these changes Oct 30, 2025

View reviewed changes

RyanRio force-pushed the users/rmitchel/update_insertwaitcnt_xcnt branch from 921c950 to 0f781b6 Compare November 3, 2025 17:38

shiltian reviewed Nov 3, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp Outdated Show resolved Hide resolved

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp Outdated Show resolved Hide resolved

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp Outdated Show resolved Hide resolved

easyonaadit reviewed Nov 4, 2025

View reviewed changes

llvm/test/CodeGen/AMDGPU/wait-xcnt.mir Show resolved Hide resolved

easyonaadit reviewed Nov 4, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp Outdated Show resolved Hide resolved

RyanRio commented Nov 4, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp Outdated Show resolved Hide resolved

RyanRio commented Nov 6, 2025

View reviewed changes

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp Show resolved Hide resolved

RyanRio added 9 commits November 9, 2025 22:11

[AMDGPU][SIInsertWaitCnts] Refactor xcnt optimization

4df30af

Refactor the XCnt optimization checks so that they can be checked when applying a pre-existing waitcnt. This has the effect of removing unnecessary xcnt waits when taking a loop backedge.

[AMDGPU][SIInsertWaitCnts] Add mixed pending event xcnt test

0ac18c1

[AMDGPU][SIInsertWaitCnts] format

d87068d

[AMDGPU][SIInsertWaitCnts] format braces

cca7a3e

[AMDGPU][SIInsertWaitCnts] use simplifyWaitcnt code path

44466c2

[AMDGPU][SIInsertWaitCnts] move vmemaccess check

4770a1e

Revert "[AMDGPU][SIInsertWaitCnts] move vmemaccess check"

64fb73e

This reverts commit d3ed9c9.

[AMDGPU][SIInsertWaitCnts] move vmem access check and apply waitcnt

2683aa7

[AMDGPU][SIInsertWaitCnts] format

a1ee5b8

RyanRio force-pushed the users/rmitchel/update_insertwaitcnt_xcnt branch from f4a5085 to a1ee5b8 Compare November 10, 2025 06:12

[AMDGPU][SIInsertWaitCnts] test changes, use jayfoad patch

7708bb6

RyanRio commented Nov 10, 2025

View reviewed changes

RyanRio added 2 commits November 13, 2025 10:07

[AMDGPU][SIInsertWaitCnts] Add todos

02bfb8c

Merge branch 'main' into users/rmitchel/update_insertwaitcnt_xcnt

935fba6

nhaehnle approved these changes Nov 13, 2025

View reviewed changes

nhaehnle enabled auto-merge (squash) November 13, 2025 18:18

nhaehnle merged commit 5e4505d into llvm:main Nov 13, 2025
9 of 10 checks passed

[AMDGPU][SIInsertWaitCnts] Gfx12.5 - Refactor xcnt optimization #164357

[AMDGPU][SIInsertWaitCnts] Gfx12.5 - Refactor xcnt optimization #164357

Uh oh!

Conversation

RyanRio commented Oct 21, 2025

Uh oh!

github-actions bot commented Oct 21, 2025

Uh oh!

llvmbot commented Oct 21, 2025

Uh oh!

nhaehnle left a comment

Choose a reason for hiding this comment

Uh oh!

RyanRio commented Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions bot commented Nov 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

shiltian left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

RyanRio commented Nov 3, 2025

Uh oh!

RyanRio commented Nov 4, 2025

Uh oh!

Uh oh!

Uh oh!

jayfoad commented Nov 4, 2025

Uh oh!

RyanRio commented Nov 4, 2025

Uh oh!

Uh oh!

Uh oh!

RyanRio commented Nov 6, 2025

Uh oh!

jayfoad commented Nov 6, 2025

Uh oh!

RyanRio Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

RyanRio Nov 10, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

RyanRio Nov 10, 2025

Choose a reason for hiding this comment

Uh oh!

nhaehnle Nov 11, 2025

Choose a reason for hiding this comment

Uh oh!

jayfoad Nov 12, 2025

Choose a reason for hiding this comment

Uh oh!

RyanRio Nov 13, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

github-actions bot commented Nov 13, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

RyanRio commented Nov 3, 2025 •

edited

Loading

github-actions bot commented Nov 3, 2025 •

edited

Loading

RyanRio Nov 10, 2025 •

edited

Loading