Skip to content

Conversation

@pravinjagtap
Copy link

@pravinjagtap pravinjagtap commented Nov 4, 2024

During AGPR tuple SPILLing, AMDGPU backend marks
the tuple (super-reg) as implicit-def in the first spill
instruction. This preserves the liveness of super-reg and
satisfies the machine verifier.

Marking the super-regs as implicit-def can clobbers the
sub-registers (as a consequence) and can confuse the
machine-cp.

Seeking help to understand options to mitigate the effect
of implicit-def.

    renamable $agpr0 = COPY renamable $vgpr0, implicit $exec
    renamable $agpr2 = COPY renamable $vgpr1, implicit $exec
    SI_SPILL_AV96_SAVE $agpr0_agpr1_agpr2, %stack.0, $sgpr32, 0, implicit $exec :: (store (s96) into %stack.0, align 4, addrspace 5)
    S_ENDPGM 0

Above spill will be expanded to

    renamable $agpr0 = COPY renamable $vgpr0, implicit $exec
    renamable $agpr2 = COPY renamable $vgpr1, implicit $exec
    $vgpr4 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1_agpr2, implicit $agpr0_agpr1_agpr2
    $vgpr3 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec
    $vgpr2 = V_ACCVGPR_READ_B32_e64 $agpr2, implicit $exec, implicit $agpr0_agpr1_agpr2
    S_ENDPGM 0

During AGPR tuple SPILLing, AMDGPU backend marks
the tuple as implicit-def in the first spill
instruction. It clobbers the register and the
machine-cp thinks its def i.e. copy is deletable
which I think is incorrect. Seeking help
to understand whether following copy is
deletable here ? (given the fact that compiler
decided to mark it implicit-def for preserving the
liveness)

renamable $agpr1 = COPY renamable $agpr3, implicit $exec
@jayfoad
Copy link
Contributor

jayfoad commented Nov 4, 2024

$vgpr254 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1

That implicit def looks completely wrong to me. If anything, it should be an implicit def of $vgpr254_vgpr255, i.e. the entire tuple of which this instruction is only explicitly deffing one part.

@pravinjagtap
Copy link
Author

Marking implicit-def only when sub-reg is undefined using RS while expanding SI_SPILL_AV1024_SAVE (in SIRegisterInfo::buildSpillLoadStore) is not helping since $agpr1 is still undefined when you scavenge reg from backwards (agpr1
is not live-outs and $vgpr255 = V_ACCVGPR_READ_B32_e64 $agpr1, implicit $exec is not emitted yet).

@rampitec
Copy link
Collaborator

rampitec commented Nov 4, 2024

I can guess it was done to change the def of $agpr0_agpr1 as partially defined now. But there was probably code to kill $argp0 upon partial spill, which is now gone. As shown it is irrelevant.

@cdevadas
Copy link
Collaborator

cdevadas commented Nov 5, 2024

The right fix, in this case, would be to mark only the subregs whose defs are missing (more precisely, mark the implicit-def not in the first instance, but for the final spill instruction inserted for that subreg).
That requires a liveness query for each subregs at the time of inserting the spills for the tuple. The RS at that moment is useless to query and get the correct information because FI elimination currently happens in the backward direction by default (including AMDGPU) and the RegScavenger also marks the reg states backward. They are yet to visit the actual defs of these instructions.
Is there any other way possible to query if a real def of these subregs exist before this spill point?

@jayfoad
Copy link
Contributor

jayfoad commented Nov 5, 2024

MachineCopyPropagation is not doing anything wrong; the implicit-def is wrong. So this issue needs a proper test case that starts before the implicit-def is added, so we can see why it was added and what we can do instead.

as implicit-def while expanding of SI_SPILL_AV96_SAVE.
@pravinjagtap pravinjagtap changed the title [AMDGPU] Machine-CP is deleting incorrect copy instr. [AMDGPU] Marking super-reg as implicit-def in first spill instruction Nov 5, 2024
implicit-def during spill expansion on machine-cp
@pravinjagtap pravinjagtap force-pushed the prjagtap/incorrect-cpy-del branch from 18f274c to 7d02a1b Compare November 6, 2024 09:38
@pravinjagtap
Copy link
Author

If we look at test agpr-spill-to-vgpr, while expanding the tuple, we mark the tuple implicit-def and implicit (both) in the first spill instruction, but, we only mark it implicit-def (and not implicit) when we are spilling to stack through vgpr (please look at test agpr-spill-to-vgpr-to-stack i.e next test point).
Is that intentional or we missed marking it implicit for the the second case i.e agpr-spill-to-vgpr-to-stack ?

As a consequence, we can see that machine cp will delete the copy in second case and not in the first case. Please take a loot at av-spill-expansion-with-machine-cp.mir

@@ -0,0 +1,67 @@
# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py
# RUN: llc -mtriple=amdgcn -mcpu=gfx908 %s -o - -run-pass prologepilog -verify-machineinstrs | FileCheck -check-prefix=GFX908 %s

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment here with your observation. Preferably, one before each test.

S_ENDPGM 0
...

# When VGPRs are NOT available for spilling (stack is used), prologepilog marks the tuple implicit-def only and NOT implicit.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this representative of the real testcase where you encountered it? I thought we were reserving a VGPR on 908 for this case. We should never need the emergency stack slot spill to handle the copy, it should trivially use the reserved register

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this representative of the real testcase where you encountered it? I thought we were reserving a VGPR on 908 for this case. We should never need the emergency stack slot spill to handle the copy, it should trivially use the reserved register

The concern here is not the VGPR but the AGPR tuple and its liveness representation when we peel off its single-spill pseudo into individual spill stores for its subregs. Yes, we do have the vgprForAGPRCopy field introduced for reserving a VGPR for the same and its serialized version is available too.

...

# During spill expansion, when VGPRs are NOT available for spilling (stack is used), tuple is being marked as
# implicit-def ONLY and NOT implicit in the first spill instrunction.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does need the implicit use. The expanded spill should behave exactly like copyPhysReg does for tuples, which inserts implicit-def+implicit use

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have the same logic

MIB.addReg(ValueReg, RegState::Implicit | State);

as the copy case:

Builder.addReg(SrcReg, getKillRegState(UseKill) | RegState::Implicit);

Not sure why the spill one is more complex or why it didn't trigger here

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pravin posted #115285 to handle it. But I guess we should add the implicit operand for all individual spills for the tuple, not just for the first and the last slices?

@vg0204
Copy link
Contributor

vg0204 commented Dec 19, 2024

$vgpr254 = V_ACCVGPR_READ_B32_e64 $agpr0, implicit $exec, implicit-def $agpr0_agpr1

That implicit def looks completely wrong to me. If anything, it should be an implicit def of $vgpr254_vgpr255, i.e. the entire tuple of which this instruction is only explicitly deffing one part.

Just want to point this again what jayfoad said, there seems one major difference besides the way implicit is used(in spill tuple(first & last spill only) expansion and copyPhysReg(all subregs) expansion), i.e. in usage of implicit-def. The spill version uses implicit-def with SrcReg, whereas, copyPhysReg makes use of it for its DestReg only. In both scenarios, SrcReg is the one implicitly defined. Can anyone comment on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants