Skip to content

Conversation

@JosJuice
Copy link
Member

If the last write to a register comes before the last read of it, we can write the register to m_ppc_state after the last write instead of after the last read. This will hopefully help spread out m_ppc_state writes across a code block, improving pipelining. Also, if there's a conditional branch that's after the last write but before the last read, instead of needing to emit one m_ppc_state write on each side of the branch, we now only need to emit one m_ppc_state write.

A note about the changes made to stmw and mfcr: These instructions don't write to any GPRs or CRs respectively – they only read from them. With this commit, there are no longer any cases where registers get written back to m_ppc_state after an instruction that just reads from them, so we can get rid of all STP logic from these two instructions. lmw still needs its STP logic, since that one does write to registers.

I've split this out from the JitArm64-centric PR #14189 because this commit is also useful for Jit64.

gpr.Flush(~(js.op->gprWillBeRead | js.op->gprWillBeWritten) &
(js.op->regsIn | js.op->regsOut),
RegCache::FlushMode::Full);
gpr.Flush(~js.op->gprWillBeWritten & js.op->regsOut, RegCache::FlushMode::Undirty);
Copy link
Contributor

@SuperSamus SuperSamus Jan 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I meant on Discord is the following:

  if (CheckMergedBranch(0))
  {
    // If an operand to the cmp/rc op we're merging with the branch isn't used anymore, it'd be
    // better to flush it here so that we don't have to flush it on both sides of the branch.
    gpr.Flush(~js.op->gprWillBeWritten & (js.op->regsIn | js.op->regsOut), RegCache::FlushMode::Undirty);
    if (needs_test)
      TEST(32, arg, arg);

    arg.Unlock();

    // Unset from the register cache here. By emitting above, no instructions is emitted between
    // the potential TEST and the conditional branch, allowing macro-op fusion.
    gpr.Flush(~(js.op->gprWillBeRead | js.op->gprWillBeWritten) &
                  (js.op->regsIn | js.op->regsOut),
              RegCache::FlushMode::Full);

    DoMergedBranchCondition();
  }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. But do you agree that my commit makes doing that unnecessary?

Copy link
Contributor

@SuperSamus SuperSamus Jan 14, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see it.
For instance:

nand. r1, r2, r3

Assuming that none of the registers are used again: r2 and r3 would have been flushed (with Undirty) already, but r1 wouldn't.
(But mea culpa for the cmpXX comment, which I edited out.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, for instructions that both store to a regular register and to a CR, your change is useful even if this PR is merged.

I think your change would make sense as a separate PR.

@SuperSamus
Copy link
Contributor

Aside from that, it looks fine.
However, I have no familiarity with ARM64's JIT, so I don't understand why some choices were made (like the removal of pair-flushing in stmw).

@JosJuice
Copy link
Member Author

AArch64 has an instruction called STP that stores two registers at once to adjacent memory locations. The JitArm64 register cache makes use of this if you tell it to flush two adjacent dirty registers at once. But with the new PPCAnalyst logic, it's impossible for there to be any dirty registers at an stmw instruction unless the register is going to be written to later, so optimizing the flushing instructions in stmw doesn't matter.

The last write to the register can't be the stmw instruction itself, since stmw doesn't write to any registers. And if the last write to the register had been before the stmw instruction, it would have gotten flushed before the stmw instruction. And if the last write to the register is after the stmw instruction, then during the stmw instruction, it's not considered a register we should flush.

If the last write to a register comes before the last read of it,
we can write the register to ppcState after the last write instead of
after the last read. This will hopefully help spread out m_ppc_state
writes across a code block, improving pipelining. Also, if there's a
conditional branch that's after the last write but before the last read,
instead of needing to emit one m_ppc_state write on each side of the
branch, we now only need to emit one m_ppc_state write.

A note about the changes made to stmw and mfcr: These instructions don't
write to any GPRs or CRs respectively – they only read from them.
With this commit, there are no longer any cases where registers get
written back to m_ppc_state after an instruction that just reads from
them, so we can get rid of all STP logic from these two instructions.
lmw still needs its STP logic, since that one does write to registers.
@JosJuice JosJuice force-pushed the ppcanalyst-split-read-write branch from 2e7875c to 40326ee Compare February 5, 2026 21:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants