Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
72 changes: 72 additions & 0 deletions llvm/test/CodeGen/AMDGPU/av-spill-expansion-with-machine-cp.mir
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=amdgcn -mcpu=gfx908 %s -o - -run-pass prologepilog,machine-cp -verify-machineinstrs | FileCheck -check-prefix=GFX908 %s

# When VGPRs are available for spilling, prologepilog marks the tuple implicit-def as well as implicit in the first spill instruction.
# As a consequence, machine-cp would NOT delete agpr2 copy here.

---
name: agpr-spill-to-vgpr-machine-cp
tracksRegLiveness: true
stack:
- { id: 0, name: '', type: spill-slot, offset: 0, size: 128, alignment: 4 }
machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
stackPtrOffsetReg: '$sgpr32'
hasSpilledVGPRs: true
body: |
bb.0:
successors:
liveins: $vgpr0, $vgpr1

; GFX908-LABEL: name: agpr-spill-to-vgpr-machine-cp
; GFX908: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, $vgpr16, $vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21, $vgpr22, $vgpr23, $vgpr24, $vgpr25, $vgpr26, $vgpr27, $vgpr28, $vgpr29, $vgpr30, $vgpr31, $vgpr32, $vgpr33
; GFX908-NEXT: {{ $}}
; GFX908-NEXT: renamable $agpr0 = COPY renamable $vgpr0, implicit $exec
; GFX908-NEXT: renamable $agpr2 = COPY renamable $vgpr1, implicit $exec
; GFX908-NEXT: $vgpr33 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2, implicit $agpr0_agpr1_agpr2
; GFX908-NEXT: $vgpr32 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec
; GFX908-NEXT: $vgpr31 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec, implicit $agpr0_agpr1_agpr2
; GFX908-NEXT: S_ENDPGM 0
renamable $agpr0 = COPY renamable $vgpr0, implicit $exec
renamable $agpr2 = COPY renamable $vgpr1, implicit $exec
SI_SPILL_AV96_SAVE $agpr0_agpr1_agpr2, %stack.0, $sgpr32, 0, implicit $exec :: (store (s96) into %stack.0, align 4, addrspace 5)
S_ENDPGM 0
...

# When VGPRs are NOT available for spilling (stack is used), prologepilog marks the tuple implicit-def only and NOT implicit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this representative of the real testcase where you encountered it? I thought we were reserving a VGPR on 908 for this case. We should never need the emergency stack slot spill to handle the copy, it should trivially use the reserved register

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this representative of the real testcase where you encountered it? I thought we were reserving a VGPR on 908 for this case. We should never need the emergency stack slot spill to handle the copy, it should trivially use the reserved register

The concern here is not the VGPR but the AGPR tuple and its liveness representation when we peel off its single-spill pseudo into individual spill stores for its subregs. Yes, we do have the vgprForAGPRCopy field introduced for reserving a VGPR for the same and its serialized version is available too.

# As a consequence, machine-cp would delete agpr2 copy here.

---
name: agpr-spill-to-vgpr-to-stack-machine-cp
tracksRegLiveness: true
stack:
- { id: 0, name: '', type: spill-slot, offset: 0, size: 128, alignment: 4 }
machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
stackPtrOffsetReg: '$sgpr32'
hasSpilledVGPRs: true
body: |
bb.0:
successors:
liveins: $vgpr0, $vgpr1
; GFX908-LABEL: name: agpr-spill-to-vgpr-to-stack-machine-cp
; GFX908: liveins: $vgpr0, $vgpr1, $vgpr18, $vgpr19, $vgpr20, $vgpr21, $vgpr22, $vgpr23, $vgpr24, $vgpr25, $vgpr26, $vgpr27, $vgpr28, $vgpr29, $vgpr30, $vgpr31, $vgpr32, $vgpr33, $vgpr34, $vgpr35, $vgpr36, $vgpr37, $vgpr38, $vgpr39, $vgpr48, $vgpr49, $vgpr50, $vgpr51, $vgpr52, $vgpr53, $vgpr54, $vgpr55
; GFX908-NEXT: {{ $}}
; GFX908-NEXT: renamable $agpr0 = COPY renamable $vgpr0, implicit $exec
; GFX908-NEXT: $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9 = IMPLICIT_DEF
; GFX908-NEXT: $vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 = IMPLICIT_DEF
; GFX908-NEXT: $vgpr40 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit $agpr0_agpr1_agpr2 :: (store (s32) into %stack.0, addrspace 5)
; GFX908-NEXT: $vgpr40 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (store (s32) into %stack.0 + 4, addrspace 5)
; GFX908-NEXT: $vgpr55 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec, implicit $agpr0_agpr1_agpr2
; GFX908-NEXT: S_ENDPGM 0
renamable $agpr0 = COPY renamable $vgpr0, implicit $exec
renamable $agpr2 = COPY renamable $vgpr1, implicit $exec

$vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9 = IMPLICIT_DEF
$vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 = IMPLICIT_DEF

SI_SPILL_AV96_SAVE $agpr0_agpr1_agpr2, %stack.0, $sgpr32, 0, implicit $exec :: (store (s96) into %stack.0, align 4, addrspace 5)
S_ENDPGM 0
...
73 changes: 73 additions & 0 deletions llvm/test/CodeGen/AMDGPU/av-spill-to-vgpr-and-stack.mir
Original file line number Diff line number Diff line change
@@ -0,0 +1,73 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=amdgcn -mcpu=gfx908 %s -o - -run-pass prologepilog -verify-machineinstrs | FileCheck -check-prefix=GFX908 %s

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment here with your observation. Preferably, one before each test.

# During spill expansion, when VGPRs are available for spilling (stack is unused), tuple is being marked as
# implicit-def as well as implicit in the first spill instrunction.

---
name: agpr-spill-to-vgpr
tracksRegLiveness: true
stack:
- { id: 0, name: '', type: spill-slot, offset: 0, size: 128, alignment: 4 }
machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
stackPtrOffsetReg: '$sgpr32'
hasSpilledVGPRs: true
body: |
bb.0:
successors:
liveins: $vgpr0, $vgpr1

; GFX908-LABEL: name: agpr-spill-to-vgpr
; GFX908: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, $vgpr16, $vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21, $vgpr22, $vgpr23, $vgpr24, $vgpr25, $vgpr26, $vgpr27, $vgpr28, $vgpr29, $vgpr30, $vgpr31, $vgpr32, $vgpr33
; GFX908-NEXT: {{ $}}
; GFX908-NEXT: renamable $agpr0 = COPY renamable $vgpr0, implicit $exec
; GFX908-NEXT: renamable $agpr2 = COPY renamable $vgpr1, implicit $exec
; GFX908-NEXT: $vgpr33 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2, implicit $agpr0_agpr1_agpr2
; GFX908-NEXT: $vgpr32 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec
; GFX908-NEXT: $vgpr31 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec, implicit $agpr0_agpr1_agpr2
; GFX908-NEXT: S_ENDPGM 0
renamable $agpr0 = COPY renamable $vgpr0, implicit $exec
renamable $agpr2 = COPY renamable $vgpr1, implicit $exec
SI_SPILL_AV96_SAVE $agpr0_agpr1_agpr2, %stack.0, $sgpr32, 0, implicit $exec :: (store (s96) into %stack.0, align 4, addrspace 5)
S_ENDPGM 0
...

# During spill expansion, when VGPRs are NOT available for spilling (stack is used), tuple is being marked as
# implicit-def ONLY and NOT implicit in the first spill instrunction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does need the implicit use. The expanded spill should behave exactly like copyPhysReg does for tuples, which inserts implicit-def+implicit use

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have the same logic

MIB.addReg(ValueReg, RegState::Implicit | State);

as the copy case:

Builder.addReg(SrcReg, getKillRegState(UseKill) | RegState::Implicit);

Not sure why the spill one is more complex or why it didn't trigger here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pravin posted #115285 to handle it. But I guess we should add the implicit operand for all individual spills for the tuple, not just for the first and the last slices?


---
name: agpr-spill-to-vgpr-to-stack
tracksRegLiveness: true
stack:
- { id: 0, name: '', type: spill-slot, offset: 0, size: 128, alignment: 4 }
machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
stackPtrOffsetReg: '$sgpr32'
hasSpilledVGPRs: true
body: |
bb.0:
successors:
liveins: $vgpr0, $vgpr1
; GFX908-LABEL: name: agpr-spill-to-vgpr-to-stack
; GFX908: liveins: $vgpr0, $vgpr1, $vgpr18, $vgpr19, $vgpr20, $vgpr21, $vgpr22, $vgpr23, $vgpr24, $vgpr25, $vgpr26, $vgpr27, $vgpr28, $vgpr29, $vgpr30, $vgpr31, $vgpr32, $vgpr33, $vgpr34, $vgpr35, $vgpr36, $vgpr37, $vgpr38, $vgpr39, $vgpr48, $vgpr49, $vgpr50, $vgpr51, $vgpr52, $vgpr53, $vgpr54, $vgpr55
; GFX908-NEXT: {{ $}}
; GFX908-NEXT: renamable $agpr0 = COPY renamable $vgpr0, implicit $exec
; GFX908-NEXT: renamable $agpr2 = COPY renamable $vgpr1, implicit $exec
; GFX908-NEXT: $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9 = IMPLICIT_DEF
; GFX908-NEXT: $vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 = IMPLICIT_DEF
; GFX908-NEXT: $vgpr40 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit $agpr0_agpr1_agpr2 :: (store (s32) into %stack.0, addrspace 5)
; GFX908-NEXT: $vgpr40 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec
; GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (store (s32) into %stack.0 + 4, addrspace 5)
; GFX908-NEXT: $vgpr55 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec, implicit $agpr0_agpr1_agpr2
; GFX908-NEXT: S_ENDPGM 0
renamable $agpr0 = COPY renamable $vgpr0, implicit $exec
renamable $agpr2 = COPY renamable $vgpr1, implicit $exec

$vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9 = IMPLICIT_DEF
$vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 = IMPLICIT_DEF

SI_SPILL_AV96_SAVE $agpr0_agpr1_agpr2, %stack.0, $sgpr32, 0, implicit $exec :: (store (s96) into %stack.0, align 4, addrspace 5)
S_ENDPGM 0
...
16 changes: 16 additions & 0 deletions test.mir
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
---
name: test_implicit_def
tracksRegLiveness: true
stack:
- { id: 0, name: '', type: spill-slot, offset: 0, size: 12, alignment: 4 }
machineFunctionInfo:
scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3
stackPtrOffsetReg: '$sgpr32'
hasSpilledVGPRs: true
body: |
bb.0:
successors:
liveins: $vgpr0, $vgpr1
SI_SPILL_AV96_SAVE $agpr0_agpr1_agpr2, %stack.0, $sgpr32, 0, implicit $exec :: (store (s96) into %stack.0, align 4, addrspace 5)
S_ENDPGM 0
...