video_core: Readback optimizations #3404
Conversation
Avoids flushing GPU data on Rewind and enables driveclub to work properly without readbacks.
The scheduler wait can happen inside the stream buffer, which is used by DMA inside a deferred action.
Quickly checked Bloodborne; for me there is a 10-15 FPS improvement in this PR compared to master.
For your build error, you need to use std::bit_cast.
@raphaelthegreat, tell me how to test this PR correctly and which options should be enabled, because I don't quite understand how to do this and provide you the necessary information.
I'm not providing logs yet because I might be doing something wrong, but I will report bugs. I noticed some unusual behavior in 4 of my games:
The problem that exists on Intel processors has worsened (without readbacks enabled). If you enable the two options that I mentioned above, the picture looks like this (on main the game also looks like this for me in this place).
Without readbacks enabled I also get a crash when I try to pull the second spear out of Trico (the very beginning of the game); this might be worth testing further.
An error has returned (without readbacks enabled):
With the readbacks options I can get into the game, but I get this picture:
Regardless of whether readbacks are enabled or not, the game returns this error:
Force-pushed from a5a0c82 to 5ab3d8e
@GHU7924 Testing of this PR should be done with readbacks enabled (a lot of code currently assumes readbacks are on, so with readbacks off some things might break a little until I address it). The linear image readback option is not affected here and should only be enabled if it's needed. Make sure to compare only against a main build; don't just report every bug you see, that is not helpful. You need to report bugs that this PR causes on its own.
No regressions in my titles, and a small performance improvement across the board in the titles I tested. CUSA02320 PR.log
By how much?
@raphaelthegreat From the screenshots he posted it looks like 30 FPS on main and 42 FPS with your PR in driveclub. Not sure if that's valid because one is during the day and the other during the night. Anyway, here's my experience. At least on my PC, readbacks still seem CPU bottlenecked, even with a 5800X3D. RAM is at 3200 MHz. I see CPU usage ranging from 80 to 98% on cores, jumping between cores. (Ignore the dirty tag, I just multiply memory by 2 to run 4K.) Performance seems very slightly improved (I get 14-16 FPS in this area on main, 15-18 with this PR). If you have time, I am available to run Tracy on this with your guidance :-)
Running the game at 4K significantly increases the cost of readbacks, so it's not too surprising. Do you get a larger boost at 1080p?
Uncharted 3: Screen_Recording_20250809_144946_Moonlight.mp4 (main), Screen_Recording_20250809_145200_Moonlight.mp4 (PR). Gray character models in the last PR build; before 34892a2 it was the same as main.
I don't see the gray character models.
With Dma=true, readbacks=false, how does it look? Naughty Dog games technically don't abide by the fence promise because the flush is for the SRT buffer, which should work with DMA.
With dma=true I can't get into the menu.
I suspect this game might be incompatible with fence detection because it lies about the sizes of storage buffers (which causes the clamp-size spam), causing large areas to be marked as GPU modified and the emulator thinking it can't write yet. I might be able to think of something, but strict readbacks are the only sure way to ensure proper sync.
Hi |
@coolllman If you add
PS: I'm open to suggestions about what the setting should be called; that name might be confusing to users but I couldn't think of anything else. Basically it's a setting for how strict readbacks are (0 is most strict, like on main; 1 uses the fence detection, so it's less strict).
With readback accuracy set to 2 it behaves just like on main; there doesn't seem to be noticeable stutter, but the game freezes during loading, maybe because performance is too low. With readback accuracy set to 1 there is stutter too.
@rafael-57 (or anyone else familiar with Tracy) can you profile the stutters that occur with the newer accuracy?
@coolllman Please enable validation in the config and send a new log.
I played Bloodborne a bit more with readback accuracy at 0 and unfortunately the vertex explosion is still there, even if it's much less frequent.
Does this happen only with low accuracy or also with high?
I will try with high and get back to you.
People were reporting a similar color bug with LNDF's #3396 as well; maybe this is actually a regression in main?
I tried to replicate it on main with readbacks enabled, but I nearly always hang on the loading screen when going back and forth between the dream and cathedral wards (I also got the softlock a few times with high accuracy on this PR; I didn't get any with low). Also, for LNDF's PR it seems like things go black, while here the colors are just wrong, so I'd say this is a bit different. I also got vertex explosions (not sure if you can call them that, since they didn't happen on NPCs this time) with high accuracy, so not sure what's wrong there. Also, on the next warp after that vertex explosion (the one not created by an NPC), colors were bugged out, so perhaps it really is a regression from the GC PR; I can't tell.
The GC PR didn't use the Dirty flag in SafeToDownload, so small images (32x32 BC7 or less) that are marked MaybeCpuDirty could be downloaded incorrectly and cause corruption, but I'm not sure.
I've been testing main for the past 15 minutes without readbacks and the colors never changed, so that leaves us with a few possibilities:
- Regression from this PR.
- Regression from the GC PR, but only when readbacks are enabled.
The conclusion seems to be that it only happens with readbacks enabled. It would be nice to have more people testing this (and ideally without changing the emulator's code or using game patches). Useless log:
Probably best to try a build from right before GC got merged, or build locally and comment out the RunGarbageCollector calls here.
Tried a build prior to the GC merge and didn't have the color issue. I don't have VS set up, so if someone else could test commenting out what Turtle said above to see if GC is indeed the issue, it would help me out.
Infamous update: the last PR fixes it, thank you.
Yes! But do I need to log something in particular, or do I just run Tracy with everything set to default? Also, do I pull "fix readbacks" off or nah?
Can you keep reverting commits and test a bit to see if something broke it?
My main takeaways after losing my sanity testing for 2 hours:
Readbacks 2 is the same as on main. The garbage textures seem like something that could happen, because the texture cache relies on a hashing workaround for that, but readbacks on main having vertex explosions sounds weird to me; does that actually happen?
Ok, I was testing with readback accuracy 1 instead of 2. So that could be the cause of vertex explosions on objects (not faces, unlike main with readbacks OFF). I do remember making this comment in the original readbacks PR though: after fiddling a lot I just got this even on main, whatever it is. Anyway, I wouldn't go too far out of scope with this PR. Existing readbacks are already in main with all their pros and cons, and this improves performance a bit and adds more granular settings. It would be nice if readbacks were more stable in general though. Would it help if I collect a stack trace with Visual Studio when crashing? What about Tracy? Do I just run it with stock code and readback accuracy set to 0?
Yes
Yes, I suspect the reason for the stutter is the game overwriting some large GPU-modified region from the CPU, which would cause a page-by-page flush. I don't like that this PR has these bugs though, so I was thinking of shelving it until it's fixed.
Tracy capture, started right during the warp:
So, testing more with Bloodborne: readback accuracy 0 is the only one that doesn't freeze the game, but it has big stutters. Best performance by far. Readback accuracy 1 has better performance than the main build by 5-10 FPS, but it freezes the game. Readback accuracy 2 performance is the same as the main build with readbacks enabled, and it also freezes the game. The game freezes happen with readbacks enabled on main too.
Note there is still some cleanup to do; this is not final code. It can also cause freezes/bugs (I hope not, though).
General idea
If you were to write a Vulkan program that generates some data on the GPU and later wanted to access that data on the host, you would need a sync operation to ensure the GPU has finished its work. Said sync operation is called a fence, because it makes the CPU wait for the GPU.
In a similar fashion, the guest also has to use fences before reading GPU data on the host. Because of its unified memory, it doesn't have to copy said data to host-visible memory, but it still must sync with a fence operation before accessing it. The emulator can rely on that promise: the guest will not overwrite nor read any GPU-generated data before a fence operation has given it the opportunity to sync with the GPU.
So the main idea of the PR is to attempt to detect these fence operations in the PM4 command stream and defer read-protecting GPU-modified pages until right before them. If a page read/write then happens before a fence, it will pass through without a flush, because the emulator can be sure the guest cannot legitimately access the data yet.
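The deferral can be pictured with a minimal sketch. This is a hypothetical illustration, not the emulator's real API: class and method names (`DeferredProtector`, `MarkGpuModified`, `OnFence`, `NeedsFlush`) are invented for clarity. GPU-modified ranges are queued rather than protected immediately, and only become flush-triggering once a detected fence commits them:

```cpp
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

// Hypothetical sketch: queue GPU-modified ranges and commit the read
// protection only when the prepass reaches a packet classified as a fence.
class DeferredProtector {
public:
    // A draw/dispatch wrote this guest memory range.
    void MarkGpuModified(uint64_t addr, uint64_t size) {
        pending_.emplace_back(addr, size);
    }

    // A fence packet was reached: commit all pending ranges, so CPU
    // accesses after this point will trigger flushes as usual.
    void OnFence() {
        for (const auto& range : pending_)
            protected_.insert(range);
        pending_.clear();
    }

    // A CPU access forces a flush only if the range was committed at a
    // fence; accesses before the fence pass through without flushing.
    bool NeedsFlush(uint64_t addr, uint64_t size) const {
        for (const auto& [start, len] : protected_)
            if (addr < start + len && start < addr + size)
                return true;
        return false;
    }

private:
    std::vector<std::pair<uint64_t, uint64_t>> pending_;
    std::set<std::pair<uint64_t, uint64_t>> protected_;
};
```

The key property is visible in the order of operations: an access between `MarkGpuModified` and `OnFence` skips the flush entirely.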
The aforementioned detection is not trivial though, because there is little indication to the emulator about what sync operations are used for. AMD GCN uses labels, 4- or 8-byte memory addresses that "signal" packets write to and that "wait" packets can wait on, or that the host can poll in the case of a fence. All that means it's close to impossible to detect the actual wait of a fence. Instead, this PR implements a prepass which scans input command lists and tries to "guess" which packets act as fences and which do not.
Possible packets that can write labels are EventWriteEos, EventWriteEop, WriteData (GFX) and ReleaseMem (ACB). There is a simple heuristic: if the label of a signal packet is waited on by the GPU with a WaitRegMem packet, it is considered a GPU->GPU sync (something akin to a pipeline barrier). It is in fact possible for a label to act both as a fence and a pipeline barrier, so the heuristic can fail, but it's very unlikely.
Deferring read protections allows for some powerful optimizations, two of which are implemented here.
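The heuristic can be sketched roughly as follows. This is a deliberately simplified packet model, not the real PM4 parser: the `Packet` struct, `PacketType` enum, and `GuessFences` function are assumptions for illustration only. Signal packets whose label address is also the target of a WaitRegMem in the same stream are classified as GPU->GPU barriers; unmatched signals are guessed to be host-visible fences:

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical simplified packet model for the prepass sketch.
enum class PacketType { SignalEop, WaitRegMem, Other };

struct Packet {
    PacketType type;
    uint64_t label_addr; // only meaningful for SignalEop / WaitRegMem
};

// Return indices of signal packets guessed to act as CPU fences.
std::vector<size_t> GuessFences(const std::vector<Packet>& stream) {
    // First pass: collect every label address the GPU itself waits on.
    std::unordered_set<uint64_t> waited;
    for (const Packet& p : stream)
        if (p.type == PacketType::WaitRegMem)
            waited.insert(p.label_addr);

    // Second pass: a signal nobody waits on is likely polled by the host,
    // i.e. a fence; a waited-on signal is a GPU->GPU barrier.
    std::vector<size_t> fences;
    for (size_t i = 0; i < stream.size(); ++i)
        if (stream[i].type == PacketType::SignalEop &&
            !waited.count(stream[i].label_addr))
            fences.push_back(i);
    return fences;
}
```

As the text notes, a label that serves as both fence and barrier would be misclassified by this scheme, which is the accepted (unlikely) failure mode.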
Rewind indirect patch
The rewind packet has a misleading name, as it implies execution going back somewhere, but what it actually does is tell the CP to drop all prefetched packets and reload them from memory. It is almost exclusively used for command list self-modification (from a compute shader, for example), which driveclub does for a dozen dispatches at the start of the frame. It uses a compute shader to patch the dimensions of the DispatchDirect PM4 packet before executing it. Why it didn't use an indirect dispatch, I'm not sure.
Before readbacks this led to launching a dispatch with garbage (often huge) dimensions, freezing the GPU. Readbacks, on the other hand, fixed it by read-protecting the memory and flushing the modified data. That works but is very expensive: around a dozen flushes, one per patched dispatch.
Deferring read protections allows the emulator to reach the rewind packet before a flush. Then the emulator can scan the pending GPU ranges inside the current command list, check that they are dispatch dimension patches, and convert the direct dispatch into an indirect dispatch. The latter reads dimensions from GPU buffers, avoiding the need to flush memory.
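The conversion step can be sketched like this. The `DispatchCmd` struct and `TryConvertToIndirect` helper are hypothetical names, and the layout assumption (three consecutive 32-bit dimension words) is an illustration, not the actual packet encoding used by the emulator:

```cpp
#include <cstdint>

// A DispatchDirect packet carries three 32-bit dimension words; if a pending
// GPU write covers exactly those 12 bytes, the write is a dimension patch.
constexpr uint64_t kDimWordsSize = 3 * sizeof(uint32_t);

// Hypothetical simplified dispatch representation.
struct DispatchCmd {
    bool indirect;
    uint32_t dim_x, dim_y, dim_z; // used when direct
    uint64_t args_addr;           // GPU address of the dims when indirect
};

// If the pending GPU write is exactly the dispatch's dimension words,
// rewrite the direct dispatch as an indirect one sourcing its dimensions
// from the GPU buffer, so no flush to guest memory is needed.
bool TryConvertToIndirect(DispatchCmd& cmd, uint64_t write_addr,
                          uint64_t write_size, uint64_t dim_addr) {
    if (write_addr != dim_addr || write_size != kDimWordsSize)
        return false;
    cmd.indirect = true;
    cmd.args_addr = dim_addr; // GPU reads x/y/z at execution time
    return true;
}
```

On the Vulkan side this corresponds to issuing `vkCmdDispatchIndirect` from that buffer instead of `vkCmdDispatch` with CPU-known dimensions.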
Preemptive buffer downloads
This optimization is a lot more general than the one above and should affect all games that rely on readbacks. It is possible to implement without CPU fence detection, but fence detection makes the implementation more efficient because the emulator can batch preemptive download copies upon reaching the fence.
The idea is to track how many times a page has been flushed, and if that number exceeds a threshold, any future GPU data inside it will be copied to the host asynchronously. If a flush is triggered, the GPU thread simply has to wait for the GPU to finish and copy the data to guest memory. The advantage here is the reduction or (in certain cases) elimination of the wait time, as the GPU has likely had time to catch up to the host. In addition, once the wait has been done, the rest of the preemptive downloads become "free" and don't need further stalls.
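The flush-count heuristic reduces to a small amount of bookkeeping. This sketch uses invented names (`PreemptiveDownloader`, `OnFlush`, `ShouldPreemptivelyDownload`) and an assumed threshold and page size; the real values and integration points in the emulator may differ:

```cpp
#include <cstdint>
#include <unordered_map>

// Assumed tuning values for the sketch, not the emulator's actual ones.
constexpr uint32_t kFlushThreshold = 3;
constexpr uint64_t kPageBits = 12; // 4 KiB pages

// Pages flushed more than kFlushThreshold times are considered "hot":
// new GPU data landing in them is downloaded asynchronously at the next
// fence, so a later CPU access finds the data (mostly) already copied.
class PreemptiveDownloader {
public:
    // Record a synchronous flush hitting this address.
    void OnFlush(uint64_t addr) {
        ++flush_count_[addr >> kPageBits];
    }

    // When new GPU data lands in this page, should an async download be
    // scheduled instead of read-protecting and flushing on demand?
    bool ShouldPreemptivelyDownload(uint64_t addr) const {
        auto it = flush_count_.find(addr >> kPageBits);
        return it != flush_count_.end() && it->second > kFlushThreshold;
    }

private:
    std::unordered_map<uint64_t, uint32_t> flush_count_;
};
```

The win described in the text follows from this: by the time the CPU actually reads the page, the async copy has usually completed, and once one wait has been paid, the remaining queued downloads for that fence cost nothing extra.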