Commit cb8ce28

[AMDGPU][Waitcnts] Don't create a pending flat event for LDS DMA (llvm#170263)
Flat instructions need a waitcnt(0) on both VMEM and LDS accesses, but only when the instruction really is using flat addressing. The LDS DMA instructions (on GFX9) have the FLAT flag set, but they have very clear semantics. These instructions update only VM_CNT (on GFX9), and hence do not need to be treated like actual flat instructions.
1 parent d364c0e commit cb8ce28

2 files changed: 9 additions, 5 deletions

llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp

Lines changed: 7 additions & 4 deletions
@@ -2289,10 +2289,13 @@ void SIInsertWaitcnts::updateEventWaitcntAfter(MachineInstr &Inst,
       ScoreBrackets->updateByEvent(LDS_ACCESS, Inst);
     }
 
-    // This is a flat memory operation that access both VMEM and LDS, so note it
-    // - it will require that both the VM and LGKM be flushed to zero if it is
-    // pending when a VM or LGKM dependency occurs.
-    if (FlatASCount > 1)
+    // If this is a truly flat memory operation, then it accesss both VMEM and
+    // LDS, so note it - it will require that both the VM and LGKM be flushed to
+    // zero if it is pending when a VM or LGKM dependency occurs.
+    //
+    // For example, LDS DMA operations have FLAT set in their TSFlags for
+    // unspecified reasons, but they are not flat operations)
+    if (!SIInstrInfo::isLDSDMA(Inst) && FlatASCount > 1)
       ScoreBrackets->setPendingFlat();
   } else if (SIInstrInfo::isVMEM(Inst) &&
              !llvm::AMDGPU::getMUBUFIsBufferInv(Inst.getOpcode())) {
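
The effect of the changed condition can be sketched in isolation. The following is a minimal toy model, not code from the pass: `ToyInst`, its fields, and the two helper functions are hypothetical names introduced here, and it assumes that `FlatASCount` counts how many of the two memory kinds (VMEM and LDS) an instruction may touch, as the surrounding context lines suggest.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for the properties the pass queries.
// This is NOT LLVM's MachineInstr; the field names are illustrative only.
struct ToyInst {
  bool MayAccessVMEM; // assumed: the access may go to global/scratch memory
  bool MayAccessLDS;  // assumed: the access may go to LDS
  bool IsLDSDMA;      // models the result of SIInstrInfo::isLDSDMA(Inst)
};

// Assumption: FlatASCount is the number of memory kinds the access may hit.
constexpr int flatASCount(const ToyInst &I) {
  return (I.MayAccessVMEM ? 1 : 0) + (I.MayAccessLDS ? 1 : 0);
}

// Condition before the commit: any access that may hit both kinds is
// recorded as a pending flat operation.
constexpr bool setsPendingFlatOld(const ToyInst &I) {
  return flatASCount(I) > 1;
}

// Condition after the commit: LDS DMA is excluded even though it carries
// the FLAT flag and involves both memory kinds.
constexpr bool setsPendingFlatNew(const ToyInst &I) {
  return !I.IsLDSDMA && flatASCount(I) > 1;
}

// Three illustrative instruction shapes.
constexpr ToyInst TrulyFlat{true, true, false};
constexpr ToyInst LdsDma{true, true, true};
constexpr ToyInst VmemOnly{true, false, false};

// Exercises the three shapes; only the LDS DMA case changes behaviour.
inline void checkPendingFlatModel() {
  assert(setsPendingFlatOld(TrulyFlat) && setsPendingFlatNew(TrulyFlat));
  assert(setsPendingFlatOld(LdsDma) && !setsPendingFlatNew(LdsDma));
  assert(!setsPendingFlatOld(VmemOnly) && !setsPendingFlatNew(VmemOnly));
}
```

In this model only the LDS DMA case differs between the old and new conditions, which matches the intent stated in the commit message: truly flat accesses keep the conservative treatment, while LDS DMA is tracked through VM_CNT alone.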

llvm/test/CodeGen/AMDGPU/lds-dma-waits.ll

Lines changed: 2 additions & 1 deletion
@@ -107,9 +107,10 @@ define amdgpu_kernel void @global_load_lds_dword_2_arrays(ptr addrspace(1) nocap
 ; GFX9-NEXT: s_lshl_b32 s1, s3, 2
 ; GFX9-NEXT: v_mov_b32_e32 v0, s0
 ; GFX9-NEXT: v_mov_b32_e32 v1, s1
-; GFX9-NEXT: s_waitcnt vmcnt(0)
+; GFX9-NEXT: s_waitcnt vmcnt(2)
 ; GFX9-NEXT: ds_read_b32 v0, v0
 ; GFX9-NEXT: ; wave barrier
+; GFX9-NEXT: s_waitcnt vmcnt(0)
 ; GFX9-NEXT: ds_read_b32 v1, v1 offset:256
 ; GFX9-NEXT: s_waitcnt lgkmcnt(0)
 ; GFX9-NEXT: global_store_dwordx2 v2, v[0:1], s[6:7]
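
The test change replaces one full `vmcnt(0)` wait with a partial `vmcnt(2)` wait followed by a later `vmcnt(0)`. The sketch below models only the general counter semantics, namely that `s_waitcnt vmcnt(N)` blocks until at most N VMEM operations remain outstanding. `ToyVmCounter` is a hypothetical name, and the choice of three operations in flight is an assumption made for illustration; this hunk does not show how many operations the test actually issues.

```cpp
#include <cassert>

// Toy in-order counter. It models only that a wait for vmcnt(N) completes
// once at most N previously issued VMEM operations are still outstanding.
struct ToyVmCounter {
  unsigned Outstanding = 0;

  // A VMEM operation (for example an LDS DMA load) is issued.
  void issue() { ++Outstanding; }

  // Models s_waitcnt vmcnt(N). Returns how many of the oldest operations
  // had to complete before the wait was satisfied.
  unsigned waitUntilAtMost(unsigned N) {
    unsigned Retired = Outstanding > N ? Outstanding - N : 0;
    Outstanding -= Retired;
    return Retired;
  }
};

// Hypothetical scenario: three operations in flight.
inline void checkPartialWait() {
  ToyVmCounter C;
  C.issue();
  C.issue();
  C.issue();
  // vmcnt(2): only the oldest operation must finish before continuing.
  assert(C.waitUntilAtMost(2) == 1);
  // vmcnt(0): the remaining two must finish before the second read.
  assert(C.waitUntilAtMost(0) == 2);
  assert(C.Outstanding == 0);
}
```

Under these assumptions the first `ds_read_b32` only waits for the operation it depends on, and the full drain is deferred to just before the second read.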
