-
Notifications
You must be signed in to change notification settings - Fork 15.3k
[AMDGPU] Marking super-reg as implicit-def in first spill instruction #114773
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,72 @@ | ||
| # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py | ||
| # RUN: llc -mtriple=amdgcn -mcpu=gfx908 %s -o - -run-pass prologepilog,machine-cp -verify-machineinstrs | FileCheck -check-prefix=GFX908 %s | ||
|
|
||
| # When VGPRs are available for spilling, prologepilog marks the tuple implicit-def as well as implicit in the first spill instruction. | ||
| # As a consequence, machine-cp would NOT delete agpr2 copy here. | ||
|
|
||
| --- | ||
| name: agpr-spill-to-vgpr-machine-cp | ||
| tracksRegLiveness: true | ||
| stack: | ||
| - { id: 0, name: '', type: spill-slot, offset: 0, size: 128, alignment: 4 } | ||
| machineFunctionInfo: | ||
| scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3 | ||
| stackPtrOffsetReg: '$sgpr32' | ||
| hasSpilledVGPRs: true | ||
| body: | | ||
| bb.0: | ||
| successors: | ||
| liveins: $vgpr0, $vgpr1 | ||
|
|
||
| ; GFX908-LABEL: name: agpr-spill-to-vgpr-machine-cp | ||
| ; GFX908: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, $vgpr16, $vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21, $vgpr22, $vgpr23, $vgpr24, $vgpr25, $vgpr26, $vgpr27, $vgpr28, $vgpr29, $vgpr30, $vgpr31, $vgpr32, $vgpr33 | ||
| ; GFX908-NEXT: {{ $}} | ||
| ; GFX908-NEXT: renamable $agpr0 = COPY renamable $vgpr0, implicit $exec | ||
| ; GFX908-NEXT: renamable $agpr2 = COPY renamable $vgpr1, implicit $exec | ||
| ; GFX908-NEXT: $vgpr33 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2, implicit $agpr0_agpr1_agpr2 | ||
| ; GFX908-NEXT: $vgpr32 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec | ||
| ; GFX908-NEXT: $vgpr31 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec, implicit $agpr0_agpr1_agpr2 | ||
| ; GFX908-NEXT: S_ENDPGM 0 | ||
| renamable $agpr0 = COPY renamable $vgpr0, implicit $exec | ||
| renamable $agpr2 = COPY renamable $vgpr1, implicit $exec | ||
| SI_SPILL_AV96_SAVE $agpr0_agpr1_agpr2, %stack.0, $sgpr32, 0, implicit $exec :: (store (s96) into %stack.0, align 4, addrspace 5) | ||
| S_ENDPGM 0 | ||
| ... | ||
|
|
||
| # When VGPRs are NOT available for spilling (stack is used), prologepilog marks the tuple implicit-def only and NOT implicit. | ||
| # As a consequence, machine-cp would delete agpr2 copy here. | ||
|
|
||
| --- | ||
| name: agpr-spill-to-vgpr-to-stack-machine-cp | ||
| tracksRegLiveness: true | ||
| stack: | ||
| - { id: 0, name: '', type: spill-slot, offset: 0, size: 128, alignment: 4 } | ||
| machineFunctionInfo: | ||
| scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3 | ||
| stackPtrOffsetReg: '$sgpr32' | ||
| hasSpilledVGPRs: true | ||
| body: | | ||
| bb.0: | ||
| successors: | ||
| liveins: $vgpr0, $vgpr1 | ||
| ; GFX908-LABEL: name: agpr-spill-to-vgpr-to-stack-machine-cp | ||
| ; GFX908: liveins: $vgpr0, $vgpr1, $vgpr18, $vgpr19, $vgpr20, $vgpr21, $vgpr22, $vgpr23, $vgpr24, $vgpr25, $vgpr26, $vgpr27, $vgpr28, $vgpr29, $vgpr30, $vgpr31, $vgpr32, $vgpr33, $vgpr34, $vgpr35, $vgpr36, $vgpr37, $vgpr38, $vgpr39, $vgpr48, $vgpr49, $vgpr50, $vgpr51, $vgpr52, $vgpr53, $vgpr54, $vgpr55 | ||
| ; GFX908-NEXT: {{ $}} | ||
| ; GFX908-NEXT: renamable $agpr0 = COPY renamable $vgpr0, implicit $exec | ||
| ; GFX908-NEXT: $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9 = IMPLICIT_DEF | ||
| ; GFX908-NEXT: $vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 = IMPLICIT_DEF | ||
| ; GFX908-NEXT: $vgpr40 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2 | ||
| ; GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit $agpr0_agpr1_agpr2 :: (store (s32) into %stack.0, addrspace 5) | ||
| ; GFX908-NEXT: $vgpr40 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec | ||
| ; GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (store (s32) into %stack.0 + 4, addrspace 5) | ||
| ; GFX908-NEXT: $vgpr55 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec, implicit $agpr0_agpr1_agpr2 | ||
| ; GFX908-NEXT: S_ENDPGM 0 | ||
| renamable $agpr0 = COPY renamable $vgpr0, implicit $exec | ||
| renamable $agpr2 = COPY renamable $vgpr1, implicit $exec | ||
|
|
||
| $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9 = IMPLICIT_DEF | ||
| $vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 = IMPLICIT_DEF | ||
|
|
||
| SI_SPILL_AV96_SAVE $agpr0_agpr1_agpr2, %stack.0, $sgpr32, 0, implicit $exec :: (store (s96) into %stack.0, align 4, addrspace 5) | ||
| S_ENDPGM 0 | ||
| ... | ||
| Original file line number | Diff line number | Diff line change | ||||
|---|---|---|---|---|---|---|
| @@ -0,0 +1,73 @@ | ||||||
| # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py | ||||||
| # RUN: llc -mtriple=amdgcn -mcpu=gfx908 %s -o - -run-pass prologepilog -verify-machineinstrs | FileCheck -check-prefix=GFX908 %s | ||||||
|
|
||||||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Add a comment here with your observation. Preferably, one before each test. |
||||||
| # During spill expansion, when VGPRs are available for spilling (stack is unused), tuple is being marked as | ||||||
| # implicit-def as well as implicit in the first spill instrunction. | ||||||
|
|
||||||
| --- | ||||||
| name: agpr-spill-to-vgpr | ||||||
| tracksRegLiveness: true | ||||||
| stack: | ||||||
| - { id: 0, name: '', type: spill-slot, offset: 0, size: 128, alignment: 4 } | ||||||
| machineFunctionInfo: | ||||||
| scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3 | ||||||
| stackPtrOffsetReg: '$sgpr32' | ||||||
| hasSpilledVGPRs: true | ||||||
| body: | | ||||||
| bb.0: | ||||||
| successors: | ||||||
| liveins: $vgpr0, $vgpr1 | ||||||
|
|
||||||
| ; GFX908-LABEL: name: agpr-spill-to-vgpr | ||||||
| ; GFX908: liveins: $vgpr0, $vgpr1, $vgpr2, $vgpr3, $vgpr4, $vgpr5, $vgpr6, $vgpr7, $vgpr8, $vgpr9, $vgpr10, $vgpr11, $vgpr12, $vgpr13, $vgpr14, $vgpr15, $vgpr16, $vgpr17, $vgpr18, $vgpr19, $vgpr20, $vgpr21, $vgpr22, $vgpr23, $vgpr24, $vgpr25, $vgpr26, $vgpr27, $vgpr28, $vgpr29, $vgpr30, $vgpr31, $vgpr32, $vgpr33 | ||||||
| ; GFX908-NEXT: {{ $}} | ||||||
| ; GFX908-NEXT: renamable $agpr0 = COPY renamable $vgpr0, implicit $exec | ||||||
| ; GFX908-NEXT: renamable $agpr2 = COPY renamable $vgpr1, implicit $exec | ||||||
| ; GFX908-NEXT: $vgpr33 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2, implicit $agpr0_agpr1_agpr2 | ||||||
| ; GFX908-NEXT: $vgpr32 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec | ||||||
| ; GFX908-NEXT: $vgpr31 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec, implicit $agpr0_agpr1_agpr2 | ||||||
| ; GFX908-NEXT: S_ENDPGM 0 | ||||||
| renamable $agpr0 = COPY renamable $vgpr0, implicit $exec | ||||||
| renamable $agpr2 = COPY renamable $vgpr1, implicit $exec | ||||||
| SI_SPILL_AV96_SAVE $agpr0_agpr1_agpr2, %stack.0, $sgpr32, 0, implicit $exec :: (store (s96) into %stack.0, align 4, addrspace 5) | ||||||
| S_ENDPGM 0 | ||||||
| ... | ||||||
|
|
||||||
| # During spill expansion, when VGPRs are NOT available for spilling (stack is used), tuple is being marked as | ||||||
| # implicit-def ONLY and NOT implicit in the first spill instrunction. | ||||||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It does need the implicit use. The expanded spill should behave exactly like copyPhysReg does for tuples, which inserts implicit-def+implicit use
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This should have the same logic
as the copy case:
Not sure why the spill one is more complex or why it didn't trigger here
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Pravin posted #115285 to handle it. But I guess we should add the implicit operand for all individual spills for the tuple, not just for the first and the last slices? |
||||||
|
|
||||||
| --- | ||||||
| name: agpr-spill-to-vgpr-to-stack | ||||||
| tracksRegLiveness: true | ||||||
| stack: | ||||||
| - { id: 0, name: '', type: spill-slot, offset: 0, size: 128, alignment: 4 } | ||||||
| machineFunctionInfo: | ||||||
| scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3 | ||||||
| stackPtrOffsetReg: '$sgpr32' | ||||||
| hasSpilledVGPRs: true | ||||||
| body: | | ||||||
| bb.0: | ||||||
| successors: | ||||||
| liveins: $vgpr0, $vgpr1 | ||||||
| ; GFX908-LABEL: name: agpr-spill-to-vgpr-to-stack | ||||||
| ; GFX908: liveins: $vgpr0, $vgpr1, $vgpr18, $vgpr19, $vgpr20, $vgpr21, $vgpr22, $vgpr23, $vgpr24, $vgpr25, $vgpr26, $vgpr27, $vgpr28, $vgpr29, $vgpr30, $vgpr31, $vgpr32, $vgpr33, $vgpr34, $vgpr35, $vgpr36, $vgpr37, $vgpr38, $vgpr39, $vgpr48, $vgpr49, $vgpr50, $vgpr51, $vgpr52, $vgpr53, $vgpr54, $vgpr55 | ||||||
| ; GFX908-NEXT: {{ $}} | ||||||
| ; GFX908-NEXT: renamable $agpr0 = COPY renamable $vgpr0, implicit $exec | ||||||
| ; GFX908-NEXT: renamable $agpr2 = COPY renamable $vgpr1, implicit $exec | ||||||
| ; GFX908-NEXT: $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9 = IMPLICIT_DEF | ||||||
| ; GFX908-NEXT: $vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 = IMPLICIT_DEF | ||||||
| ; GFX908-NEXT: $vgpr40 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2 | ||||||
| ; GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 0, 0, 0, implicit $exec, implicit $agpr0_agpr1_agpr2 :: (store (s32) into %stack.0, addrspace 5) | ||||||
| ; GFX908-NEXT: $vgpr40 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec | ||||||
| ; GFX908-NEXT: BUFFER_STORE_DWORD_OFFSET $vgpr40, $sgpr0_sgpr1_sgpr2_sgpr3, $sgpr32, 4, 0, 0, implicit $exec :: (store (s32) into %stack.0 + 4, addrspace 5) | ||||||
| ; GFX908-NEXT: $vgpr55 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec, implicit $agpr0_agpr1_agpr2 | ||||||
| ; GFX908-NEXT: S_ENDPGM 0 | ||||||
| renamable $agpr0 = COPY renamable $vgpr0, implicit $exec | ||||||
| renamable $agpr2 = COPY renamable $vgpr1, implicit $exec | ||||||
|
|
||||||
| $vgpr2_vgpr3_vgpr4_vgpr5_vgpr6_vgpr7_vgpr8_vgpr9 = IMPLICIT_DEF | ||||||
| $vgpr10_vgpr11_vgpr12_vgpr13_vgpr14_vgpr15_vgpr16_vgpr17 = IMPLICIT_DEF | ||||||
|
|
||||||
| SI_SPILL_AV96_SAVE $agpr0_agpr1_agpr2, %stack.0, $sgpr32, 0, implicit $exec :: (store (s96) into %stack.0, align 4, addrspace 5) | ||||||
| S_ENDPGM 0 | ||||||
| ... | ||||||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,16 @@ | ||
| --- | ||
| name: test_implicit_def | ||
| tracksRegLiveness: true | ||
| stack: | ||
| - { id: 0, name: '', type: spill-slot, offset: 0, size: 12, alignment: 4 } | ||
| machineFunctionInfo: | ||
| scratchRSrcReg: $sgpr0_sgpr1_sgpr2_sgpr3 | ||
| stackPtrOffsetReg: '$sgpr32' | ||
| hasSpilledVGPRs: true | ||
| body: | | ||
| bb.0: | ||
| successors: | ||
| liveins: $vgpr0, $vgpr1 | ||
| SI_SPILL_AV96_SAVE $agpr0_agpr1_agpr2, %stack.0, $sgpr32, 0, implicit $exec :: (store (s96) into %stack.0, align 4, addrspace 5) | ||
| S_ENDPGM 0 | ||
| ... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this representative of the real testcase where you encountered it? I thought we were reserving a VGPR on 908 for this case. We should never need the emergency stack slot spill to handle the copy, it should trivially use the reserved register
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The concern here is not the VGPR but the AGPR tuple and its liveness representation when we peel off its single-spill pseudo into individual spill stores for its subregs. Yes, we do have the vgprForAGPRCopy field introduced for reserving a VGPR for the same and its serialized version is available too.