

elad335 (Contributor) commented Nov 22, 2025

In Hot Shots Golf, one SPU atomic operation stands out from the rest:
[screenshot]

This operation manages the acquisition of cellSpurs Jobchain tasks.
After some observation, I noticed that this atomic operation may be compatible with the PUTLLC16 optimization, which is described in #15429.

So why wasn't it compatible already? It turns out there were a few reasons:

  1. There is a function call (BRSL) in the middle of the operation. When I first inspected what the callee does, the call did not seem to make sense for this function. On a second look, I noticed that register 4 (R4) was used right after the call, so I concluded that this must be a [[noreturn]]-style function that never actually returns to the caller. Implementing detection of such calls eliminates optimization failure number 1 for the SPU Mega block size.
     [screenshot]
  2. LQR and STQR refer to two different addresses (0x4a20 and 0x4a90), but PUTLLC16 can only handle one (the one belonging to the atomic operation).
    After inspection, the second address cannot overlap the cache line data of the operation. However, because relative loads and stores are used, a runtime check was added to the LLVM code to verify this (otherwise falling back to the full, heavier PUTLLC).
  3. There is a fast path that conditionally skips PUTLLC, and it happens to contain a relative store that interferes with PUTLLC16 detection. Making failed paths signal failure only once they reach the pattern destination fixes this issue. This was a design flaw in PUTLLC16 detection.
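The idea behind the first point can be shown with a toy control-flow model. This is a minimal sketch and not RPCS3's SPU analyzer. When a BRSL target is treated as [[noreturn]], the analysis stops adding a fall-through edge after the call, so the instructions after it are no longer assumed to follow the call on this path. The `Kind`, `Insn`, and `successors` names are hypothetical.

```cpp
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified instruction kinds for the illustration.
enum class Kind { Normal, Call, Branch };

struct Insn {
    std::uint32_t pc;      // local store address of the instruction
    Kind kind;
    std::uint32_t target;  // call or unconditional branch target (unused for Normal)
};

// Successors of an instruction in a toy intraprocedural CFG.
// A call normally "returns" to pc + 4; a [[noreturn]] call has no fall-through.
std::vector<std::uint32_t> successors(const Insn& insn,
                                      const std::set<std::uint32_t>& noreturn_targets) {
    switch (insn.kind) {
    case Kind::Normal:
        return {insn.pc + 4};
    case Kind::Branch:
        return {insn.target};
    case Kind::Call:
        if (noreturn_targets.count(insn.target) != 0) {
            return {};  // control never comes back here: no fall-through edge
        }
        return {insn.pc + 4};
    }
    return {};
}
```

Keeping the noreturn decision in a set of call targets lets the same CFG code serve both cases. Only the detection heuristic, which in the PR is based on how R4 is used after the call, decides what goes into that set.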
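The runtime check from the second point can be sketched as follows. This sketch rests on two assumptions not taken from the RPCS3 code. First, the reservation data sits in a 128-byte-aligned local store buffer. Second, "cannot overlap" means the extra quadword must not fall inside that same 128-byte block. The address helper follows the SPU definition of LQR/STQR: a signed 16-bit word offset from the PC, wrapped to the 256 KiB local store and aligned to 16 bytes. `kLsMask`, `kLineMask`, `lqr_address`, and `can_use_putllc16` are hypothetical names.

```cpp
#include <cstdint>

// SPU local store is 256 KiB; quadword accesses ignore the low 4 bits.
constexpr std::uint32_t kLsMask = 0x3fff0;
// Clears the low 7 bits: start of the containing 128-byte block.
constexpr std::uint32_t kLineMask = ~std::uint32_t{127};

// Effective address of a PC-relative LQR/STQR. The signed 16-bit immediate
// counts words, so it is sign-extended and shifted left by 2 before the add.
constexpr std::uint32_t lqr_address(std::uint32_t pc, std::int16_t imm16) {
    return (pc + (static_cast<std::uint32_t>(static_cast<std::int32_t>(imm16)) << 2)) & kLsMask;
}

// True if the extra quadword lies outside the 128-byte block holding the
// reservation data, so the PUTLLC16 path is safe; otherwise fall back to PUTLLC.
constexpr bool can_use_putllc16(std::uint32_t line_buffer_addr, std::uint32_t other_addr) {
    return (line_buffer_addr & kLineMask) != (other_addr & kLineMask);
}
```

With the two addresses from this PR, 0x4a20 falls in the block starting at 0x4a00 and 0x4a90 falls in the block starting at 0x4a80, so the check passes. The comparison is symmetric, so it does not matter which of the two belongs to the atomic operation.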
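The change in the third point is about *when* a disqualifying instruction counts as a failure. Here is a minimal sketch using a toy model in which each path through the operation is a list of tagged steps; this is not RPCS3's detection code. The eager variant models the earlier flaw: any problematic store on any path rejects the pattern. The deferred variant reports failure only if a tainted path actually reaches the pattern destination (the PUTLLC), so a fast path that exits early no longer spoils detection. `Step`, `Path`, `compatible_eager`, and `compatible_deferred` are hypothetical.

```cpp
#include <vector>

// Hypothetical per-instruction classification for the illustration.
enum class Step { Plain, UnsafeStore, Putllc, Exit };

using Path = std::vector<Step>;

// Toy model of the earlier, eager behaviour: any unsafe store on any path fails.
bool compatible_eager(const std::vector<Path>& paths) {
    for (const Path& path : paths)
        for (Step s : path)
            if (s == Step::UnsafeStore) return false;
    return true;
}

// Deferred behaviour: a path signals failure only when it reaches the PUTLLC
// (the pattern destination) after touching something unsafe.
bool compatible_deferred(const std::vector<Path>& paths) {
    for (const Path& path : paths) {
        bool tainted = false;
        for (Step s : path) {
            if (s == Step::UnsafeStore) tainted = true;
            if (s == Step::Putllc) {
                if (tainted) return false;  // tainted path reached the destination
                break;                      // reached the destination cleanly
            }
            if (s == Step::Exit) break;     // fast path left the pattern: not a failure
        }
    }
    return true;
}
```

A fast path such as `{UnsafeStore, Exit}` is rejected by the eager variant but accepted by the deferred one. A path such as `{UnsafeStore, Putllc}` is still rejected by both.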

Fixes #14724

Note: This pull request affects both the SPU Safe and SPU Mega block sizes.

Before:
[screenshot]

After:
[screenshot]

elad335 added the CPU, Optimization, and LLVM labels Nov 22, 2025

woj1993 commented Nov 22, 2025

Hi, do you need any particular kind of testing? Also, can you merge master into this branch? Master has new optimizations too, so it will be hard to compare speed without them here.

elad335 (Contributor, Author) commented Nov 22, 2025

Yes, you can test performance differences.

RPCS3 deleted a comment from digant73 Nov 22, 2025

woj1993 commented Nov 22, 2025

In God of War 3 the difference is too small, so it is probably just gameplay variation. In Uncharted 1 I saw a difference between 76.5 max on master and 68.5 on the branch, but that could also be due to gameplay or something else. All other stats are similar:
God of War 3
Master:
[Screenshot (609)]
Branch:
[Screenshot (611)]
Uncharted 1
Master:
[Screenshot (610)]
Branch:
[Screenshot (612)]

kd-11 (Contributor) commented Nov 22, 2025

The branch crashes Resistance 3. The crash can also be reproduced in the "Haven" demo.


Successfully merging this pull request may close these issues.

[Regression] Performance Regressions in Hot Shots Golf: Out of Bounds (#11904, #12523)

3 participants