[AMDGPU] Marking super-reg as implicit-def in first spill instruction #114773
During AGPR tuple spilling, the AMDGPU backend marks the tuple as implicit-def in the first spill instruction. This clobbers the register, so machine-cp treats its def (i.e. the copy) as deletable, which I think is incorrect. Seeking help to understand whether the following copy is deletable here, given that the compiler decided to mark the tuple implicit-def to preserve liveness: renamable $agpr1 = COPY renamable $agpr3, implicit $exec
That implicit def looks completely wrong to me. If anything, it should be an implicit def of |
|
Marking implicit-def only when sub-reg is undefined using RS while expanding SI_SPILL_AV1024_SAVE (in |
|
My guess is that it was done to mark the def of $agpr0_agpr1 as partially defined. But there was probably code to kill $agpr0 upon partial spill, which is now gone. As shown it is irrelevant. |
|
The right fix, in this case, would be to mark only the subregs whose defs are missing (more precisely, mark the implicit-def not in the first instance, but for the final spill instruction inserted for that subreg). |
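The first half of that suggestion (mark only the subregs whose defs are missing) can be sketched as a plain set difference. This is a toy illustration, not LLVM code: the function name subRegsNeedingImplicitDef and the string-based register names are hypothetical, and it does not model where in the expansion the implicit-def would be attached.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical helper for illustration only (not an LLVM API).
// Given the sub-registers of a tuple and the set of sub-registers that
// already have a def, return the ones that would still need an implicit-def.
std::vector<std::string>
subRegsNeedingImplicitDef(const std::vector<std::string> &tupleSubRegs,
                          const std::set<std::string> &alreadyDefined) {
  std::vector<std::string> missing;
  for (const std::string &reg : tupleSubRegs) {
    if (alreadyDefined.count(reg) == 0)
      missing.push_back(reg); // no def seen: liveness must be patched here
  }
  return missing;
}
```

Under this sketch, a fully defined tuple yields an empty list, so no implicit-def would be added at all.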
|
MachineCopyPropagation is not doing anything wrong; the implicit-def is wrong. So this issue needs a proper test case that starts before the implicit-def is added, so we can see why it was added and what we can do instead. |
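To make the effect on the copy concrete, here is a toy model of the reasoning, assuming a much simpler rule than the real MachineCopyPropagation pass: a copy looks dead if its destination is overwritten before anything reads it. ToyInstr, overlaps and copyLooksDead are hypothetical names, and register overlap is approximated by splitting tuple names such as agpr0_agpr1 on underscores.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for a machine instruction: only the register sets matter.
struct ToyInstr {
  std::set<std::string> defs; // registers written (explicit or implicit-def)
  std::set<std::string> uses; // registers read (explicit or implicit)
};

// Split a tuple name like "agpr0_agpr1" into its parts (sketch-only rule).
static std::set<std::string> regParts(const std::string &reg) {
  std::set<std::string> parts;
  std::size_t start = 0;
  while (true) {
    std::size_t pos = reg.find('_', start);
    parts.insert(reg.substr(start, pos - start));
    if (pos == std::string::npos)
      break;
    start = pos + 1;
  }
  return parts;
}

// Two registers overlap if they share at least one part.
bool overlaps(const std::string &a, const std::string &b) {
  std::set<std::string> partsA = regParts(a);
  for (const std::string &p : regParts(b))
    if (partsA.count(p))
      return true;
  return false;
}

// Assumes block[copyIdx] is a copy with exactly one def.
// Returns true if that def is clobbered before any later instruction reads it.
bool copyLooksDead(const std::vector<ToyInstr> &block, std::size_t copyIdx) {
  const std::string dst = *block[copyIdx].defs.begin();
  for (std::size_t i = copyIdx + 1; i < block.size(); ++i) {
    for (const std::string &u : block[i].uses)
      if (overlaps(u, dst))
        return false; // read first: the copy is live
    for (const std::string &d : block[i].defs)
      if (overlaps(d, dst))
        return true; // clobbered without being read: the copy looks dead
  }
  return false; // conservatively treat the value as live-out
}
```

In this model, a spill slice that carries only an implicit-def of $agpr0_agpr1 makes the preceding copy into $agpr1 look dead, while the same slice with an additional implicit use of the tuple keeps it alive.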
as implicit-def while expanding of SI_SPILL_AV96_SAVE.
implicit-def during spill expansion on machine-cp
|
If we look at test agpr-spill-to-vgpr, while expanding the tuple, we mark the tuple As a consequence, we can see that machine-cp will delete the copy in the second case and not in the first case. Please take a look at av-spill-expansion-with-machine-cp.mir |
| @@ -0,0 +1,67 @@ | |||
| # NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py | |||
| # RUN: llc -mtriple=amdgcn -mcpu=gfx908 %s -o - -run-pass prologepilog -verify-machineinstrs | FileCheck -check-prefix=GFX908 %s | |||
|
|
|||
Add a comment here with your observation. Preferably, one before each test.
| S_ENDPGM 0 | ||
| ... | ||
|
|
||
| # When VGPRs are NOT available for spilling (stack is used), prologepilog marks the tuple implicit-def only and NOT implicit. |
Is this representative of the real testcase where you encountered it? I thought we were reserving a VGPR on 908 for this case. We should never need the emergency stack slot spill to handle the copy, it should trivially use the reserved register
The concern here is not the VGPR but the AGPR tuple and its liveness representation when we peel off its single-spill pseudo into individual spill stores for its subregs. Yes, we do have the vgprForAGPRCopy field introduced for reserving a VGPR for the same and its serialized version is available too.
| ... | ||
|
|
||
| # During spill expansion, when VGPRs are NOT available for spilling (stack is used), tuple is being marked as | ||
| # implicit-def ONLY and NOT implicit in the first spill instruction. |
It does need the implicit use. The expanded spill should behave exactly like copyPhysReg does for tuples, which inserts implicit-def+implicit use
This should have the same logic
| MIB.addReg(ValueReg, RegState::Implicit | State); |
as the copy case:
| Builder.addReg(SrcReg, getKillRegState(UseKill) | RegState::Implicit); |
Not sure why the spill one is more complex or why it didn't trigger here
Pravin posted #115285 to handle it. But I guess we should add the implicit operand for all individual spills for the tuple, not just for the first and the last slices?
To reiterate what jayfoad said: besides the way implicit is used (the spill tuple expansion adds it on the first and last spill only, while the copyPhysReg expansion adds it on all subregs), there seems to be one major difference, namely in the usage of implicit-def. The spill version uses implicit-def with SrcReg, whereas copyPhysReg uses it for its DestReg only. In both scenarios, SrcReg is the one implicitly defined. Can anyone comment on this?
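The two placement policies discussed in this thread (implicit operand on the first and last slices only, versus on every slice) can be contrasted with a small sketch. This is a toy model and not the SIRegisterInfo code: ImplicitPolicy, SliceOps and expandTupleSpill are hypothetical names, and the boolean only records whether a slice would carry the implicit super-register operand.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical policy knob for this sketch (not an LLVM enum).
enum class ImplicitPolicy { FirstAndLast, EverySlice };

// One per-subreg spill produced by expanding a tuple spill pseudo.
struct SliceOps {
  std::string subReg;    // the sub-register this slice stores
  bool implicitSuperReg; // slice also carries the implicit tuple operand
};

// Decide, per slice, whether the implicit super-register operand is attached.
std::vector<SliceOps> expandTupleSpill(const std::vector<std::string> &subRegs,
                                       ImplicitPolicy policy) {
  std::vector<SliceOps> slices;
  for (std::size_t i = 0; i < subRegs.size(); ++i) {
    bool isEdge = (i == 0) || (i + 1 == subRegs.size());
    bool attach = (policy == ImplicitPolicy::EverySlice) || isEdge;
    slices.push_back({subRegs[i], attach});
  }
  return slices;
}
```

With four sub-registers, FirstAndLast leaves the two middle slices without the implicit operand, which is the gap the question about covering all individual spills is pointing at.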
During AGPR tuple spilling, the AMDGPU backend marks
the tuple (super-reg) as implicit-def in the first spill
instruction. This preserves the liveness of the super-reg and
satisfies the machine verifier.
Marking the super-reg as implicit-def can, as a consequence,
clobber the sub-registers and can confuse
machine-cp.
Seeking help to understand options to mitigate the effect
of implicit-def.
Above spill will be expanded to