@choikwa choikwa commented Sep 10, 2025

This change was motivated by CK, where many VMCNT(0) waits were generated because instructions lacked !alias.scope metadata. There were two causes:

  1. LowerLDSModule not attaching scope metadata to a single LDS variable.
  2. The IPSCCP pass, running before the inliner, replacing a noalias
    pointer derivative with a global value, which left the inliner unable
    to trace it back to the noalias pointer argument.

However, it turns out that IPSCCP losing the scope information mattered little, because ScopedNoAliasAA can handle the asymmetric case, where one MemLoc is missing its scope, and still return a NoAlias result.

AMDGPU, however, checked for the existence of a scope in SIInsertWaitcnts. When the scope was missing, it conservatively treated the access as aliasing everything and inserted VMCNT(0) before DS_READs, forcing them to wait for all previous LDS DMA instructions.

Since ScopedNoAliasAA can handle this asymmetry, SIInsertWaitcnts should also let the AA query determine whether two MIs may alias.
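
To illustrate the asymmetric case, here is a minimal, self-contained sketch of a scoped-noalias check. It is a simplified model written for this discussion, not the ScopedNoAliasAA implementation: the `Scope`, `AAInfo`, `mayAliasInScopes`, `mayAlias`, and `selfCheck` names are assumptions chosen for the example. The scope and domain names are taken from the metadata in the test added by this PR.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>

// A scope is modeled as (scope name, domain name), mirroring metadata such as
// !{!"lds1.scope", !{!"lds1.domain"}}. Illustrative model only.
using Scope = std::pair<std::string, std::string>;
using ScopeSet = std::set<Scope>;

// Models one access: its !alias.scope list and its !noalias list.
struct AAInfo {
  ScopeSet scope;
  ScopeSet noAlias;
};

// Returns false (no alias proven) if, within some domain, every scope of one
// access appears in the other access's noalias list.
bool mayAliasInScopes(const ScopeSet &scopes, const ScopeSet &noAlias) {
  if (scopes.empty() || noAlias.empty())
    return true; // nothing to compare in this direction: stay conservative

  std::set<std::string> domains;
  for (const Scope &s : noAlias)
    domains.insert(s.second);

  for (const std::string &domain : domains) {
    bool sawScopeInDomain = false;
    bool allCovered = true;
    for (const Scope &s : scopes) {
      if (s.second != domain)
        continue;
      sawScopeInDomain = true;
      if (noAlias.count(s) == 0)
        allCovered = false;
    }
    if (sawScopeInDomain && allCovered)
      return false;
  }
  return true;
}

// The query runs in both directions, so one direction can prove NoAlias even
// when the other access carries no !alias.scope at all.
bool mayAlias(const AAInfo &a, const AAInfo &b) {
  return mayAliasInScopes(a.scope, b.noAlias) &&
         mayAliasInScopes(b.scope, a.noAlias);
}

// Usage: the DS_READ and the LDS DMA access from the test in this PR.
void selfCheck() {
  const Scope lds1{"lds1.scope", "lds1.domain"};
  const Scope lds2{"lds2.scope", "lds2.domain"};
  const Scope lds1Off4{"lds1_off4.scope", "lds1_off4.domain"};

  AAInfo dsRead{ScopeSet{}, ScopeSet{lds2, lds1Off4}}; // !noalias !12 only
  AAInfo ldsDma{ScopeSet{lds1Off4}, ScopeSet{lds1, lds2}}; // !9 and !14

  assert(!mayAlias(dsRead, ldsDma)); // proven NoAlias despite missing scope
}
```

In this model the read has no scope of its own, yet the DMA access's scope is fully covered by the read's noalias list, so the second direction of the query is enough to rule out aliasing.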

Passed PSDB.

A previous attempt to address the issue in IPSCCP has likely stalled: #154522
This solution may be preferable, as the issue only affects AMDGPU.

@llvmbot

llvmbot commented Sep 10, 2025

@llvm/pr-subscribers-backend-amdgpu

Author: choikwa (choikwa)

Changes

Full diff: https://github.com/llvm/llvm-project/pull/157821.diff

2 Files Affected:

  • (modified) llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp (+4-6)
  • (added) llvm/test/CodeGen/AMDGPU/waitcnt-unscoped.ll (+65)
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index e3a2efdd3856f..68e7bdc2b5ca8 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -1943,12 +1943,10 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,
         // LOAD_CNT is only relevant to vgpr or LDS.
         unsigned RegNo = FIRST_LDS_VGPR;
         // Only objects with alias scope info were added to LDSDMAScopes array.
-        // In the absense of the scope info we will not be able to disambiguate
-        // aliasing here. There is no need to try searching for a corresponding
-        // store slot. This is conservatively correct because in that case we
-        // will produce a wait using the first (general) LDS DMA wait slot which
-        // will wait on all of them anyway.
-        if (Ptr && Memop->getAAInfo() && Memop->getAAInfo().Scope) {
+        // AliasAnalysis query can determine aliasing even if Memop's Scope is
+        // missing. ScopedNoAlias allows for alias query on MemLoc without a
+        // scope.
+        if (Ptr && Memop->getAAInfo()) {
           const auto &LDSDMAStores = ScoreBrackets.getLDSDMAStores();
           for (unsigned I = 0, E = LDSDMAStores.size(); I != E; ++I) {
             if (MI.mayAlias(AA, *LDSDMAStores[I], true))
diff --git a/llvm/test/CodeGen/AMDGPU/waitcnt-unscoped.ll b/llvm/test/CodeGen/AMDGPU/waitcnt-unscoped.ll
new file mode 100644
index 0000000000000..6d24de85a8ad8
--- /dev/null
+++ b/llvm/test/CodeGen/AMDGPU/waitcnt-unscoped.ll
@@ -0,0 +1,65 @@
+; RUN: llc -mtriple=amdgcn--amdhsa -mcpu=gfx950 < %s | FileCheck -check-prefix=CHECK %s
+
+declare void @llvm.amdgcn.sched.barrier(i32 %mask) #0
+declare void @llvm.amdgcn.load.to.lds(ptr %in, ptr addrspace(3) %lds_out, i32 %size, i32 %offset, i32 %aux) #0
+
+define amdgpu_kernel void @test_waitcnt(ptr addrspace(1) %global_buffer, ptr addrspace(3) %lds_buffer1, ptr addrspace(3) %lds_buffer2) {
+; This test checks if SIInsertWaitcnts pass inserts S_WAITCNT VMCNT(0) before DS_READ
+; CHECK-NOT: s_waitcnt vmcnt(0)
+; CHECK: ds_read_b32
+entry:
+  ; VMEM accesses with alias.scope
+  %vmem_load = load i32, ptr addrspace(1) %global_buffer
+  %gepvmem = getelementptr i32, ptr addrspace(1) %global_buffer, i32 16
+  store i32 %vmem_load, ptr addrspace(1) %gepvmem, align 4, !alias.scope !0
+
+  ; Global to LDS load
+  %gepvmem.ascast = addrspacecast ptr addrspace(1) %gepvmem to ptr
+  call void @llvm.amdgcn.load.to.lds(ptr %gepvmem.ascast, ptr addrspace(3) %lds_buffer1, i32 4, i32 4, i32 0), !alias.scope !9, !noalias !14
+  
+  ; Insert scheduling barrier
+  call void @llvm.amdgcn.sched.barrier(i32 0)
+
+  ; DS_WRITEs with alias.scope and noalias
+  store i32 %vmem_load, ptr addrspace(3) %lds_buffer1, align 4, !alias.scope !1, !noalias !12
+  store i32 %vmem_load, ptr addrspace(3) %lds_buffer2, align 4, !alias.scope !6, !noalias !13
+
+  ; Insert scheduling barrier
+  call void @llvm.amdgcn.sched.barrier(i32 0)
+  
+  ; DS_READ with alias.scope missing
+  %lds_load = load i32, ptr addrspace(3) %lds_buffer1, align 4, !noalias !12
+
+  ; VMEM write
+  %gep = getelementptr i32, ptr addrspace(1) %global_buffer, i32 4
+  %gep2 = getelementptr i32, ptr addrspace(1) %global_buffer, i32 8
+  store i32 %lds_load, ptr addrspace(1) %gep, align 4, !alias.scope !0
+  store i32 %vmem_load, ptr addrspace(1) %gep2, align 4, !alias.scope !0
+
+  ret void
+}
+
+; VMEM alias domain and scope
+!5 = !{!"vmem.domain"}
+!4 = !{!"vmem.scope", !5}
+!0 = !{!4}
+
+; LDS alias domains and scopes
+!3 = !{!"lds1.domain"}
+!2 = !{!"lds1.scope", !3}
+!1 = !{!2}
+
+!8 = !{!"lds2.domain"}
+!7 = !{!"lds2.scope", !8}
+!6 = !{!7}
+
+!11 = !{!"lds1_off4.domain"}
+!10 = !{!"lds1_off4.scope", !11}
+!9 = !{!10}
+
+; Noalias lists
+!12 = !{!7, !10}
+!13 = !{!2, !10}
+!14 = !{!2, !7}
+
+attributes #0 = { nounwind }
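
The effect of the SIInsertWaitcnts change above can be sketched as a small decision function. This is a hypothetical model, not the pass's code: `slotsToWaitOn`, its parameters, and the slot numbering (slot 0 as the general LDS DMA wait slot, slot `i + 1` for the i-th tracked LDS DMA store) are assumptions made for illustration. `mayAliasStore[i]` stands in for the result of the `MI.mayAlias(AA, *LDSDMAStores[I], true)` query, and `requireScope` toggles between the old and new condition.

```cpp
#include <cassert>
#include <vector>

// Returns the wait slots an LDS read would have to wait on.
// Assumed numbering: 0 = general slot (covers every LDS DMA store),
// i + 1 = slot of the i-th tracked LDS DMA store.
std::vector<unsigned> slotsToWaitOn(bool hasPtr, bool hasAAInfo, bool hasScope,
                                    const std::vector<bool> &mayAliasStore,
                                    bool requireScope) {
  // Old condition: Ptr && AAInfo && AAInfo.Scope. New condition: Ptr && AAInfo.
  const bool canDisambiguate =
      hasPtr && hasAAInfo && (!requireScope || hasScope);
  if (!canDisambiguate)
    return {0}; // conservative: wait on the general slot

  std::vector<unsigned> slots;
  for (unsigned i = 0; i < mayAliasStore.size(); ++i)
    if (mayAliasStore[i])
      slots.push_back(i + 1); // wait only on stores that may alias
  return slots;
}

// Usage: a read with !noalias but no !alias.scope, one tracked LDS DMA store
// that the alias query reports as not aliasing.
void selfCheck() {
  const std::vector<bool> noAlias{false};
  // Old behavior: the missing scope forces the general slot.
  assert(slotsToWaitOn(true, true, false, noAlias, true) ==
         std::vector<unsigned>{0});
  // New behavior: the alias query decides, so no wait is needed.
  assert(slotsToWaitOn(true, true, false, noAlias, false).empty());
}
```

Under these assumptions, the only behavioral difference is for accesses that carry AA info without a scope, which is the situation the new test exercises.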

@arsenm arsenm requested a review from Pierre-vh September 10, 2025 09:56

@rampitec rampitec left a comment

Could you please pre-commit the test as NFC so we can see the changes?

choikwa added a commit to choikwa/llvm-project that referenced this pull request Sep 10, 2025
choikwa added a commit that referenced this pull request Sep 10, 2025
…efore

Remove confusing comments, redundant check-prefix, move function attribute

turn into positive checks via script
@choikwa choikwa force-pushed the siinsertwaitcnts-remove-scope branch from 436fb0e to 55b34c2 Compare September 10, 2025 20:49

@rampitec rampitec left a comment

LGTM

llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Sep 10, 2025
@choikwa choikwa merged commit ef7de8d into llvm:main Sep 12, 2025
9 checks passed
@llvm-ci

llvm-ci commented Sep 12, 2025

LLVM Buildbot has detected a new failure on builder lldb-aarch64-ubuntu running on linaro-lldb-aarch64-ubuntu while building llvm at step 6 "test".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/59/builds/24122

Here is the relevant piece of the build log for the reference
Step 6 (test) failure: build (failure)
...
PASS: lldb-api :: commands/memory/write/TestMemoryWrite.py (194 of 2320)
PASS: lldb-api :: commands/platform/file/read/TestPlatformFileRead.py (195 of 2320)
PASS: lldb-api :: commands/platform/file/close/TestPlatformFileClose.py (196 of 2320)
PASS: lldb-api :: commands/memory/read/TestMemoryRead.py (197 of 2320)
UNSUPPORTED: lldb-api :: commands/platform/sdk/TestPlatformSDK.py (198 of 2320)
PASS: lldb-api :: commands/platform/connect/TestPlatformConnect.py (199 of 2320)
PASS: lldb-api :: commands/plugin/TestPlugin.py (200 of 2320)
PASS: lldb-api :: commands/platform/process/launch/TestPlatformProcessLaunch.py (201 of 2320)
PASS: lldb-api :: commands/platform/process/list/TestProcessList.py (202 of 2320)
UNRESOLVED: lldb-api :: commands/gui/spawn-threads/TestGuiSpawnThreads.py (203 of 2320)
******************** TEST 'lldb-api :: commands/gui/spawn-threads/TestGuiSpawnThreads.py' FAILED ********************
Script:
--
/usr/bin/python3.10 /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/dotest.py -u CXXFLAGS -u CFLAGS --env LLVM_LIBS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --env LLVM_INCLUDE_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/include --env LLVM_TOOLS_DIR=/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --arch aarch64 --build-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex --lldb-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api --clang-module-cache-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-clang/lldb-api --executable /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/lldb --compiler /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/clang --dsymutil /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin/dsymutil --make /usr/bin/gmake --llvm-tools-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./bin --lldb-obj-root /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/tools/lldb --lldb-libs-dir /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/./lib --cmake-build-type Release /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/commands/gui/spawn-threads -p TestGuiSpawnThreads.py
--
Exit Code: 1

Command Output (stdout):
--
lldb version 22.0.0git (https://github.com/llvm/llvm-project.git revision ef7de8d1447c822dec72d685d85053216936b895)
  clang revision ef7de8d1447c822dec72d685d85053216936b895
  llvm revision ef7de8d1447c822dec72d685d85053216936b895
Skipping the following test categories: ['libc++', 'msvcstl', 'dsym', 'gmodules', 'debugserver', 'objc']

--
Command Output (stderr):
--
FAIL: LLDB (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/clang-aarch64) :: test_gui (TestGuiSpawnThreads.TestGuiSpawnThreadsTest)
======================================================================
ERROR: test_gui (TestGuiSpawnThreads.TestGuiSpawnThreadsTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/packages/Python/lldbsuite/test/decorators.py", line 155, in wrapper
    return func(*args, **kwargs)
  File "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/llvm-project/lldb/test/API/commands/gui/spawn-threads/TestGuiSpawnThreads.py", line 44, in test_gui
    self.child.expect_exact(f"thread #{i + 2}: tid =")
  File "/usr/local/lib/python3.10/dist-packages/pexpect/spawnbase.py", line 432, in expect_exact
    return exp.expect_loop(timeout)
  File "/usr/local/lib/python3.10/dist-packages/pexpect/expect.py", line 179, in expect_loop
    return self.eof(e)
  File "/usr/local/lib/python3.10/dist-packages/pexpect/expect.py", line 122, in eof
    raise exc
pexpect.exceptions.EOF: End Of File (EOF). Exception style platform.
<pexpect.pty_spawn.spawn object at 0xf32c2ea81ab0>
command: /home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/lldb
args: ['/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/lldb', '--no-lldbinit', '--no-use-colors', '-O', 'settings clear --all', '-O', 'settings set symbols.enable-external-lookup false', '-O', 'settings set target.inherit-tcc true', '-O', 'settings set target.disable-aslr false', '-O', 'settings set target.detach-on-error false', '-O', 'settings set target.auto-apply-fixits false', '-O', 'settings set plugin.process.gdb-remote.packet-timeout 60', '-O', 'settings set symbols.clang-modules-cache-path "/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/module-cache-lldb/lldb-api"', '-O', 'settings set use-color false', '-O', 'settings set show-statusline false', '--file', '/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/lldb-test-build.noindex/commands/gui/spawn-threads/TestGuiSpawnThreads.test_gui/a.out']
buffer (last 100 chars): b''
before (last 100 chars): b'9 0x0000aaeab21a4b30 _start (/home/tcwg-buildbot/worker/lldb-aarch64-ubuntu/build/bin/lldb+0x44b30)\n'
after: <class 'pexpect.exceptions.EOF'>

searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Sep 25, 2025
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Sep 25, 2025
…efore (llvm#157821)

Cherry-picked from 8ae3aea and ef7de8d
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Sep 25, 2025
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Oct 9, 2025
searlmc1 pushed a commit to ROCm/llvm-project that referenced this pull request Oct 9, 2025
…efore (llvm#157821)