Commit 3fded30

choikwadsalinas_amdeng
authored and committed
[AMDGPU] Remove scope check in SIInsertWaitcnts::generateWaitcntInstBefore (llvm#157821)
This change was motivated by CK, where many VMCNT(0) waits were generated because instructions lacked !alias.scope metadata. There were two causes:
1) LowerLDSModule did not attach scope metadata on a single LDS variable.
2) The IPSCCP pass, which runs before the inliner, replaced a noalias ptr derivative with a global value, leaving the inliner unable to track it back to the noalias ptr argument.

However, it turns out that IPSCCP losing the scope information was largely ineffectual, because ScopedNoAliasAA can handle the asymmetric case, where one MemLoc is missing scope, and still return a NoAlias result. AMDGPU, however, checked for the existence of scope in SIInsertWaitcnts, conservatively treated an access without scope as aliasing everything, and inserted VMCNT(0) before DS_READs, forcing them to wait for all previous LDS DMA instructions. Since we know that ScopedNoAliasAA can handle the asymmetry, we should also let the AA query determine whether two MIs may alias.

Passed PSDB.

Previous attempt to address the issue in IPSCCP, likely stalled: llvm#154522
This solution may be preferable to that one, as the issue only affects AMDGPU.
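The asymmetric case described above can be illustrated with a small standalone model. This is a sketch, not LLVM's implementation: `ScopedInfo`, `mayAliasInScopes`, and `mayAlias` are hypothetical names, scope domains are ignored, and the scope name `lds.a` is made up. It assumes the query proves no-alias whenever every scope listed on one access appears in the other access's noalias list, regardless of whether the second access carries any scope of its own.

```cpp
#include <algorithm>
#include <set>
#include <string>

// Hypothetical stand-in for the scoped-noalias metadata on a memory access.
// Scope domains are ignored to keep the sketch short.
struct ScopedInfo {
  std::set<std::string> Scopes;   // models !alias.scope
  std::set<std::string> NoAlias;  // models !noalias
};

// Returns true if an access tagged with `Scopes` may alias an access whose
// noalias list is `NoAlias`. A missing list on either side proves nothing.
inline bool mayAliasInScopes(const std::set<std::string> &Scopes,
                             const std::set<std::string> &NoAlias) {
  if (Scopes.empty() || NoAlias.empty())
    return true;
  // No alias only when every scope of one access is excluded by the other.
  return !std::includes(NoAlias.begin(), NoAlias.end(),
                        Scopes.begin(), Scopes.end());
}

// The query is tried in both directions; one successful direction is enough,
// so an access without any scope can still be proven not to alias.
inline bool mayAlias(const ScopedInfo &A, const ScopedInfo &B) {
  return mayAliasInScopes(A.Scopes, B.NoAlias) &&
         mayAliasInScopes(B.Scopes, A.NoAlias);
}
```

In this model, an access with `Scopes = {"lds.a"}` and an access that has no scope but lists `lds.a` in `NoAlias` do not alias, while an access with no metadata at all is still treated conservatively.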
1 parent 796d821 commit 3fded30

2 files changed: +1 -8 lines changed


llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp

Lines changed: 1 addition & 7 deletions
@@ -1918,13 +1918,7 @@ bool SIInsertWaitcnts::generateWaitcntInstBefore(MachineInstr &MI,

         // LOAD_CNT is only relevant to vgpr or LDS.
         unsigned RegNo = FIRST_LDS_VGPR;
-        // Only objects with alias scope info were added to LDSDMAScopes array.
-        // In the absense of the scope info we will not be able to disambiguate
-        // aliasing here. There is no need to try searching for a corresponding
-        // store slot. This is conservatively correct because in that case we
-        // will produce a wait using the first (general) LDS DMA wait slot which
-        // will wait on all of them anyway.
-        if (Ptr && Memop->getAAInfo() && Memop->getAAInfo().Scope) {
+        if (Ptr && Memop->getAAInfo()) {
           const auto &LDSDMAStores = ScoreBrackets.getLDSDMAStores();
           for (unsigned I = 0, E = LDSDMAStores.size(); I != E; ++I) {
             if (MI.mayAlias(AA, *LDSDMAStores[I], true))
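The effect of the relaxed condition can be sketched with a reduced model of the guard. Everything here is hypothetical: `AAInfoModel`, `oldGuard`, and `newGuard` are illustrative names, and the sketch assumes that the boolean test on `getAAInfo()` is true when any alias metadata is attached, not only scope metadata.

```cpp
// Hypothetical, reduced model of the alias metadata on a memory operand.
struct AAInfoModel {
  bool HasTBAA = false;
  bool HasScope = false;    // !alias.scope present
  bool HasNoAlias = false;  // !noalias present
  // Assumption: mirrors the boolean test on getAAInfo(), i.e. true when
  // any kind of alias metadata is attached.
  explicit operator bool() const { return HasTBAA || HasScope || HasNoAlias; }
};

// Condition before this commit: scope metadata was required.
inline bool oldGuard(bool HasPtr, const AAInfoModel &Info) {
  return HasPtr && Info && Info.HasScope;
}

// Condition after this commit: any alias metadata lets the per-store
// alias query run.
inline bool newGuard(bool HasPtr, const AAInfoModel &Info) {
  return HasPtr && static_cast<bool>(Info);
}
```

Under this model, an operand that carries only noalias metadata fails the old guard but passes the new one, so it reaches the `mayAlias` loop over the LDS DMA stores. An operand with no metadata fails both.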

llvm/test/CodeGen/AMDGPU/waitcnt-unscoped.ll

Lines changed: 0 additions & 1 deletion
@@ -26,7 +26,6 @@ define amdgpu_kernel void @test_waitcnt(ptr addrspace(1) %global_buffer, ptr add
 ; CHECK-NEXT: ds_write_b32 v2, v1
 ; CHECK-NEXT: ds_write_b32 v3, v1
 ; CHECK-NEXT: ; sched_barrier mask(0x00000000)
-; CHECK-NEXT: s_waitcnt vmcnt(0)
 ; CHECK-NEXT: ds_read_b32 v2, v2
 ; CHECK-NEXT: s_waitcnt lgkmcnt(0)
 ; CHECK-NEXT: global_store_dword v0, v2, s[0:1] offset:16
