Skip to content

Conversation

@jayfoad
Copy link
Contributor

@jayfoad jayfoad commented Nov 6, 2025

No description provided.

Using different offsets for the two global loads just makes it more
obvious that the second one could fail address translation even if the
first one succeeds.
This is just stylistic. It allows using the identical condition in the
SMEM and VMEM cases and potentially guards against more X_CNT event
types being added in future.
@llvmbot
Copy link
Member

llvmbot commented Nov 6, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: Jay Foad (jayfoad)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/166779.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+6-12)
  • (modified) llvm/test/CodeGen/AMDGPU/wait-xcnt.mir (+5-5)
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index b7fa899678ec7..306d59d0867cd 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1291,21 +1291,15 @@ void WaitcntBrackets::applyXcnt(const AMDGPU::Waitcnt &Wait) {
   // On entry to a block with multiple predescessors, there may
   // be pending SMEM and VMEM events active at the same time.
   // In such cases, only clear one active event at a time.
-  auto applyPendingXcntGroup = [this](unsigned E) {
-    unsigned LowerBound = getScoreLB(X_CNT);
-    applyWaitcnt(X_CNT, 0);
-    PendingEvents |= (1 << E);
-    setScoreLB(X_CNT, LowerBound);
-  };
 
   // Wait on XCNT is redundant if we are already waiting for a load to complete.
   // SMEM can return out of order, so only omit XCNT wait if we are waiting till
   // zero.
   if (Wait.KmCnt == 0 && hasPendingEvent(SMEM_GROUP)) {
-    if (hasPendingEvent(VMEM_GROUP))
-      applyPendingXcntGroup(VMEM_GROUP);
-    else
+    if (!hasMixedPendingEvents(X_CNT))
       applyWaitcnt(X_CNT, 0);
+    else
+      PendingEvents &= ~(1 << SMEM_GROUP);
     return;
   }
 
@@ -1314,10 +1308,10 @@ void WaitcntBrackets::applyXcnt(const AMDGPU::Waitcnt &Wait) {
   // decremented to the same number as LOADCnt.
   if (Wait.LoadCnt != ~0u && hasPendingEvent(VMEM_GROUP) &&
       !hasPendingEvent(STORE_CNT)) {
-    if (hasPendingEvent(SMEM_GROUP))
-      applyPendingXcntGroup(SMEM_GROUP);
-    else
+    if (!hasMixedPendingEvents(X_CNT))
       applyWaitcnt(X_CNT, std::min(Wait.XCnt, Wait.LoadCnt));
+    else if (Wait.LoadCnt == 0)
+      PendingEvents &= ~(1 << VMEM_GROUP);
     return;
   }
 
diff --git a/llvm/test/CodeGen/AMDGPU/wait-xcnt.mir b/llvm/test/CodeGen/AMDGPU/wait-xcnt.mir
index f964480dcc633..fe16f0d44dd1c 100644
--- a/llvm/test/CodeGen/AMDGPU/wait-xcnt.mir
+++ b/llvm/test/CodeGen/AMDGPU/wait-xcnt.mir
@@ -1069,7 +1069,6 @@ body: |
     $sgpr0 = S_MOV_B32 $sgpr0
 ...
 
-# FIXME: Missing S_WAIT_XCNT before overwriting vgpr0.
 ---
 name: mixed_pending_events
 tracksRegLiveness: true
@@ -1088,8 +1087,8 @@ body: |
   ; GCN-NEXT:   successors: %bb.2(0x80000000)
   ; GCN-NEXT:   liveins: $vgpr0_vgpr1, $sgpr2
   ; GCN-NEXT: {{  $}}
-  ; GCN-NEXT:   $vgpr2 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 0, 0, implicit $exec
-  ; GCN-NEXT:   $vgpr3 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 0, 0, implicit $exec
+  ; GCN-NEXT:   $vgpr2 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 100, 0, implicit $exec
+  ; GCN-NEXT:   $vgpr3 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 200, 0, implicit $exec
   ; GCN-NEXT: {{  $}}
   ; GCN-NEXT: bb.2:
   ; GCN-NEXT:   liveins: $sgpr2, $vgpr2
@@ -1098,6 +1097,7 @@ body: |
   ; GCN-NEXT:   $vgpr2 = V_MOV_B32_e32 $vgpr2, implicit $exec
   ; GCN-NEXT:   S_WAIT_KMCNT 0
   ; GCN-NEXT:   $sgpr2 = S_MOV_B32 $sgpr2
+  ; GCN-NEXT:   S_WAIT_XCNT 0
   ; GCN-NEXT:   $vgpr0 = V_MOV_B32_e32 0, implicit $exec
   bb.0:
     liveins: $vgpr0_vgpr1, $sgpr0_sgpr1, $scc
@@ -1105,8 +1105,8 @@ body: |
     S_CBRANCH_SCC1 %bb.2, implicit $scc
   bb.1:
     liveins: $vgpr0_vgpr1, $sgpr2
-    $vgpr2 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 0, 0, implicit $exec
-    $vgpr3 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 0, 0, implicit $exec
+    $vgpr2 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 100, 0, implicit $exec
+    $vgpr3 = GLOBAL_LOAD_DWORD $vgpr0_vgpr1, 200, 0, implicit $exec
   bb.2:
     liveins: $sgpr2, $vgpr2
     $vgpr2 = V_MOV_B32_e32 $vgpr2, implicit $exec

@jayfoad
Copy link
Contributor Author

jayfoad commented Nov 6, 2025

@RyanRio FYI

@jayfoad
Copy link
Contributor Author

jayfoad commented Nov 6, 2025

Split into multiple commits for ease of review and to make the final bug-fixing commit as small as possible.

@jayfoad
Copy link
Contributor Author

jayfoad commented Nov 11, 2025

Eager ping - I tried hard to make this easy to review. The topmost commit should be an obvious bug fix, only clear pending VMEM events if LoadCnt == 0. The other commits are all NFC refactorings leading up to the fix.

Copy link
Collaborator

@nhaehnle nhaehnle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me.

@jayfoad jayfoad merged commit 5e4f177 into llvm:main Nov 12, 2025
12 checks passed
@jayfoad jayfoad deleted the fix-missing-xcnt branch November 12, 2025 09:44
@llvm-ci
Copy link
Collaborator

llvm-ci commented Nov 12, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-aarch64-darwin running on doug-worker-5 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/190/builds/30802

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll' FAILED ********************
Exit Code: 2

Command Output (stdout):
--
# RUN: at line 1
/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli -jit-kind=orc-lazy -compile-threads=2 -thread-entry hello /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll | /Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# executed command: /Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli -jit-kind=orc-lazy -compile-threads=2 -thread-entry hello /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# .---command stderr------------
# | PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace and instructions to reproduce the bug.
# | Stack dump:
# | 0.	Program arguments: /Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli -jit-kind=orc-lazy -compile-threads=2 -thread-entry hello /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# |  #0 0x00000001050c4f48 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100f38f48)
# |  #1 0x00000001050c2cf8 llvm::sys::RunSignalHandlers() (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100f36cf8)
# |  #2 0x00000001050c5a48 SignalHandler(int, __siginfo*, void*) (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100f39a48)
# |  #3 0x00000001860c3584 (/usr/lib/system/libsystem_platform.dylib+0x18047b584)
# |  #4 0x0000010104c177e0
# |  #5 0x0000000104c2153c llvm::orc::ExecutionSession::removeJITDylibs(std::__1::vector<llvm::IntrusiveRefCntPtr<llvm::orc::JITDylib>, std::__1::allocator<llvm::IntrusiveRefCntPtr<llvm::orc::JITDylib>>>) (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100a9553c)
# |  #6 0x0000000104c212ec llvm::orc::ExecutionSession::endSession() (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100a952ec)
# |  #7 0x0000000104cadf5c llvm::orc::LLJIT::~LLJIT() (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100b21f5c)
# |  #8 0x0000000104cb28e8 llvm::orc::LLLazyJIT::~LLLazyJIT() (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100b268e8)
# |  #9 0x0000000104195134 runOrcJIT(char const*) (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100009134)
# | #10 0x0000000104190620 main (/Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/lli+0x100004620)
# | #11 0x0000000185d07154
# `-----------------------------
# error: command failed with exit status: -11
# executed command: /Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# .---command stderr------------
# | FileCheck error: '<stdin>' is empty.
# | FileCheck command line:  /Volumes/ExternalSSD/buildbot-root/aarch64-darwin/build/bin/FileCheck /Users/buildbot/buildbot-root2/aarch64-darwin/llvm-project/llvm/test/ExecutionEngine/OrcLazy/multiple-compile-threads-basic.ll
# `-----------------------------
# error: command failed with exit status: 2

--

********************


git-crd pushed a commit to git-crd/crd-llvm-project that referenced this pull request Nov 13, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants