Commit 6a8c205

[AMDGPU] Avoid bundling a SCHED_BARRIER with memops (llvm#153533)
Avoid bundling a SCHED_BARRIER with memops. Memops can still be bundled, and a SCHED_BARRIER ends the bundle, e.g. [load, load, ...] SCHED_BARRIER, where the SCHED_BARRIER is excluded from the bundle. This honors SCHED_BARRIERs as fully as possible, without interference from bundling.

If a SCHED_BARRIER is placed in a bundle between memops, it is not used during IGroupLPMutation in the post-RA mi-sched phase. In addition, bundling memory ops with a SCHED_BARRIER in between can prevent that SCHED_BARRIER, or any neighboring SCHED_BARRIER, from being honored. Since the user has already placed the SCHED_BARRIER between memory ops, don't bundle it together with them. Bypassing any bundling in an MBB with a SCHED_BARRIER removes all of these problems.
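The rule described above can be illustrated with a small standalone model. This is only a sketch, not the code in SIPostRABundler: the `Kind` enum and `formBundles` are hypothetical names invented for the illustration, and the model ignores everything the real pass also checks (memory type, def/use hazards, clause length limits). It assumes a bundle needs at least two memops, that ordinary meta instructions may sit inside a bundle but never start or end one, and that a SCHED_BARRIER closes the current clause like any non-meta instruction.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical instruction kinds for this toy model (not LLVM types).
enum class Kind { MemOp, Meta, SchedBarrier, Other };

// Returns half-open [begin, end) index ranges of the bundles that would be
// formed. Assumptions of the sketch: a bundle holds at least two memops,
// ordinary meta instructions are allowed between bundled memops, and a
// SchedBarrier (or any Other instruction) ends the clause.
std::vector<std::pair<std::size_t, std::size_t>>
formBundles(const std::vector<Kind> &Insts) {
  std::vector<std::pair<std::size_t, std::size_t>> Bundles;
  const std::size_t N = Insts.size();
  std::size_t I = 0;
  while (I < N) {
    if (Insts[I] != Kind::MemOp) {
      ++I; // a bundle can only start on a memop
      continue;
    }
    const std::size_t First = I;
    std::size_t Last = I;  // index of the last memop seen in this clause
    std::size_t Count = 1; // number of memops in this clause
    for (std::size_t J = I + 1; J < N; ++J) {
      if (Insts[J] == Kind::MemOp) {
        Last = J;
        ++Count;
      } else if (Insts[J] != Kind::Meta) {
        break; // SchedBarrier or Other: the clause ends here
      }
      // Plain meta instructions are skipped; they never extend Last.
    }
    if (Count > 1)
      Bundles.emplace_back(First, Last + 1);
    I = Last + 1;
  }
  return Bundles;
}
```

In this model, `{MemOp, MemOp, SchedBarrier, MemOp, SchedBarrier, MemOp}` yields a single bundle covering the first two memops, while the two memops separated by barriers stay unbundled, which mirrors the shape of the new MIR test in this commit.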
1 parent 3b963f9 commit 6a8c205

2 files changed: +60 -3 lines changed

llvm/lib/Target/AMDGPU/SIPostRABundler.cpp

Lines changed: 5 additions & 3 deletions
@@ -184,9 +184,11 @@ bool SIPostRABundler::run(MachineFunction &MF) {
         if (I->getNumExplicitDefs() != 0)
           Defs.insert(I->defs().begin()->getReg());
         ++ClauseLength;
-      } else if (!I->isMetaInstruction()) {
-        // Allow meta instructions in between bundle candidates, but do not
-        // start or end a bundle on one.
+      } else if (!I->isMetaInstruction() ||
+                 I->getOpcode() == AMDGPU::SCHED_BARRIER) {
+        // SCHED_BARRIER is not bundled to be honored by scheduler later.
+        // Allow other meta instructions in between bundle candidates, but do
+        // not start or end a bundle on one.
         //
         // TODO: It may be better to move meta instructions like dbg_value
         // after the bundle. We're relying on the memory legalizer to unbundle

llvm/test/CodeGen/AMDGPU/postra-bundle-memops.mir

Lines changed: 55 additions & 0 deletions
@@ -351,3 +351,58 @@ body: |
     $vgpr1 = GLOBAL_LOAD_DWORD $vgpr5_vgpr6, 0, 0, implicit $exec
     KILL killed $vgpr3_vgpr4, killed $vgpr5_vgpr6
 ...
+
+# Avoid bundling if a MBB has SCHED_BARRIER
+---
+name: no_sched_barrier_within_bundle
+tracksRegLiveness: true
+body: |
+  bb.0:
+    ; GCN-LABLE: name: no_sched_barrier_within_bundle
+    ; GCN: renamable $sgpr0_sgpr1 = IMPLICIT_DEF
+    ; GCN-NEXT: renamable $vgpr0 = IMPLICIT_DEF
+    ; GCN-NEXT: BUNDLE implicit-def $vgpr1, implicit-def $vgpr1_lo16, implicit-def $vgpr1_hi16, implicit-def $vgpr2, implicit-def $vgpr2_lo16, implicit-def $vgpr2_hi16, implicit $sgpr0_sgpr1, implicit $vgpr0, implicit $exec {
+    ; GCN-NEXT:   renamable $vgpr1 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr0_sgpr1, renamable $vgpr0, 0, 0, implicit $exec, implicit-def $vgpr1, implicit-def $vgpr1_lo16, implicit-def $vgpr1_hi16, implicit-def $vgpr2
+    ; GCN-NEXT:   renamable $vgpr2 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr0_sgpr1, renamable $vgpr0, 512, 0, implicit $exec, implicit-def $vgpr2_lo16, implicit-def $vgpr2_hi16, implicit $sgpr0_sgpr1, implicit $vgpr0
+    ; GCN-NEXT: }
+    ; GCN-NEXT: renamable $sgpr2_sgpr3 = IMPLICIT_DEF
+    ; GCN-NEXT: renamable $vgpr10 = IMPLICIT_DEF
+    ; GCN-NEXT: renamable $vgpr1 = nsw V_MUL_LO_U32_e64 killed $vgpr1, $vgpr1, implicit $exec
+    ; GCN-NEXT: renamable $vgpr2 = nsw V_MUL_LO_U32_e64 killed $vgpr2, $vgpr2, implicit $exec
+    ; GCN-NEXT: SCHED_BARRIER 1924
+    ; GCN-NEXT: renamable $vgpr11 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr2_sgpr3, renamable $vgpr10, 0, 0, implicit $exec, implicit-def $vgpr11, implicit-def $vgpr11_lo16, implicit-def $vgpr11_hi16, implicit $sgpr2_sgpr3, implicit $vgpr10
+    ; GCN-NEXT: SCHED_BARRIER 1924
+    ; GCN-NEXT: renamable $vgpr12 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr2_sgpr3, renamable $vgpr10, 512, 0, implicit $exec, implicit-def $vgpr12, implicit-def $vgpr12_lo16, implicit-def $vgpr12_hi16, implicit $sgpr2_sgpr3, implicit $vgpr10
+    ; GCN-NEXT: renamable $sgpr4_sgpr5 = IMPLICIT_DEF
+    ; GCN-NEXT: renamable $vgpr0 = IMPLICIT_DEF
+    ; GCN-NEXT: renamable $vgpr11 = nsw V_MUL_LO_U32_e64 killed $vgpr11, $vgpr11, implicit $exec
+    ; GCN-NEXT: renamable $vgpr12 = nsw V_MUL_LO_U32_e64 killed $vgpr12, $vgpr12, implicit $exec
+    ; GCN-NEXT: BUNDLE implicit killed $vgpr10, implicit killed $vgpr11, implicit killed $sgpr2_sgpr3, implicit $exec, implicit killed $vgpr12, implicit killed $vgpr0, implicit killed $vgpr1, implicit killed $sgpr4_sgpr5, implicit killed $vgpr2 {
+    ; GCN-NEXT:   GLOBAL_STORE_DWORD_SADDR renamable $vgpr10, killed renamable $vgpr11, renamable $sgpr2_sgpr3, 0, 0, implicit $exec, implicit killed $vgpr11
+    ; GCN-NEXT:   GLOBAL_STORE_DWORD_SADDR killed renamable $vgpr10, killed renamable $vgpr12, killed renamable $sgpr2_sgpr3, 512, 0, implicit $exec
+    ; GCN-NEXT:   GLOBAL_STORE_DWORD_SADDR renamable $vgpr0, killed renamable $vgpr1, renamable $sgpr4_sgpr5, 0, 0, implicit $exec
+    ; GCN-NEXT:   GLOBAL_STORE_DWORD_SADDR killed renamable $vgpr0, killed renamable $vgpr2, killed renamable $sgpr4_sgpr5, 512, 0, implicit $exec
+    ; GCN-NEXT: }
+    ; GCN-NEXT: S_ENDPGM 0
+    renamable $sgpr0_sgpr1 = IMPLICIT_DEF
+    renamable $vgpr0 = IMPLICIT_DEF
+    renamable $vgpr1 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr0_sgpr1, renamable $vgpr0, 0, 0, implicit $exec, implicit-def $vgpr1, implicit-def $vgpr1_lo16, implicit-def $vgpr1_hi16, implicit-def $vgpr2
+    renamable $vgpr2 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr0_sgpr1, renamable $vgpr0, 512, 0, implicit $exec, implicit-def $vgpr2_lo16, implicit-def $vgpr2_hi16, implicit $sgpr0_sgpr1, implicit $vgpr0
+    renamable $sgpr2_sgpr3 = IMPLICIT_DEF
+    renamable $vgpr10 = IMPLICIT_DEF
+    renamable $vgpr1 = nsw V_MUL_LO_U32_e64 killed $vgpr1, $vgpr1, implicit $exec
+    renamable $vgpr2 = nsw V_MUL_LO_U32_e64 killed $vgpr2, $vgpr2, implicit $exec
+    SCHED_BARRIER 1924
+    renamable $vgpr11 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr2_sgpr3, renamable $vgpr10, 0, 0, implicit $exec, implicit-def $vgpr11, implicit-def $vgpr11_lo16, implicit-def $vgpr11_hi16, implicit $sgpr2_sgpr3, implicit $vgpr10
+    SCHED_BARRIER 1924
+    renamable $vgpr12 = GLOBAL_LOAD_DWORD_SADDR renamable $sgpr2_sgpr3, renamable $vgpr10, 512, 0, implicit $exec, implicit-def $vgpr12, implicit-def $vgpr12_lo16, implicit-def $vgpr12_hi16, implicit $sgpr2_sgpr3, implicit $vgpr10
+    renamable $sgpr4_sgpr5 = IMPLICIT_DEF
+    renamable $vgpr0 = IMPLICIT_DEF
+    renamable $vgpr11 = nsw V_MUL_LO_U32_e64 killed $vgpr11, $vgpr11, implicit $exec
+    renamable $vgpr12 = nsw V_MUL_LO_U32_e64 killed $vgpr12, $vgpr12, implicit $exec
+    GLOBAL_STORE_DWORD_SADDR renamable $vgpr10, killed renamable $vgpr11, renamable $sgpr2_sgpr3, 0, 0, implicit $exec, implicit killed $vgpr11
+    GLOBAL_STORE_DWORD_SADDR killed renamable $vgpr10, killed renamable $vgpr12, killed renamable $sgpr2_sgpr3, 512, 0, implicit $exec
+    GLOBAL_STORE_DWORD_SADDR renamable $vgpr0, killed renamable $vgpr1, renamable $sgpr4_sgpr5, 0, 0, implicit $exec
+    GLOBAL_STORE_DWORD_SADDR killed renamable $vgpr0, killed renamable $vgpr2, killed renamable $sgpr4_sgpr5, 512, 0, implicit $exec
+    S_ENDPGM 0
+...
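
The test uses `SCHED_BARRIER 1924` without spelling out what the mask means. The sketch below decodes such a mask, assuming the bit assignments documented for `llvm.amdgcn.sched.barrier` in the AMDGPU usage guide, where each set bit names an instruction class that may still be scheduled across the barrier. The table is reproduced from memory and should be checked against the LLVM documentation; `MaskBit`, `kMaskBits` and `decodeSchedBarrierMask` are hypothetical names, not LLVM APIs.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// One entry per mask bit. Assumed layout; verify against the LLVM AMDGPU
// usage guide before relying on it.
struct MaskBit {
  std::uint32_t Bit;
  const char *Name;
};

static const MaskBit kMaskBits[] = {
    {0x0001, "ALU (non-memory, non-side-effect)"},
    {0x0002, "VALU"},
    {0x0004, "SALU"},
    {0x0008, "MFMA/WMMA"},
    {0x0010, "VMEM"},
    {0x0020, "VMEM read"},
    {0x0040, "VMEM write"},
    {0x0080, "DS"},
    {0x0100, "DS read"},
    {0x0200, "DS write"},
    {0x0400, "TRANS"},
};

// Lists the instruction classes that the mask allows to cross the barrier.
// A mask of 0 allows nothing to cross.
std::vector<std::string> decodeSchedBarrierMask(std::uint32_t Mask) {
  std::vector<std::string> Allowed;
  for (const MaskBit &B : kMaskBits)
    if (Mask & B.Bit)
      Allowed.push_back(B.Name);
  return Allowed;
}
```

Under that assumed table, 1924 is 0x784, so `decodeSchedBarrierMask(1924)` returns SALU, DS, DS read, DS write and TRANS. The VMEM bits are clear, meaning the global loads in the test are not allowed to move across either barrier.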