vulkan: Hybrid waitForFences/getFenceStatus to reduce fence latency #12630
There seems to be a bubble when waking up from waitForFences, which costs a few percent of performance and also increases the variance in performance. This change inserts an "almost_ready" fence when the graph is about 80% complete. We call waitForFences on the almost_ready fence, then spin (with _mm_pause) on getFenceStatus until the final fence is signaled.
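For readers who want the shape of the idea without the Vulkan plumbing, here is a minimal sketch of the hybrid wait described above. It is not the code from this PR: `MockFence`, `cpu_relax`, `hybrid_wait`, and `run_demo` are hypothetical names, and the fence is a CPU-side stand-in built from a mutex and condition variable, where `wait()` plays the role of waitForFences and `is_signaled()` plays the role of getFenceStatus. The 8 ms / 2 ms split only imitates the "about 80% complete" point.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

#if defined(__x86_64__) || defined(_M_X64) || defined(__i386__) || defined(_M_IX86)
#include <immintrin.h>
// On x86, hint to the CPU that we are in a spin loop.
static inline void cpu_relax() { _mm_pause(); }
#else
// Assumption: on other architectures, fall back to yielding the thread.
static inline void cpu_relax() { std::this_thread::yield(); }
#endif

// Hypothetical stand-in for a VkFence: signaled once, then polled or waited on.
struct MockFence {
    std::mutex m;
    std::condition_variable cv;
    std::atomic<bool> signaled{false};

    void signal() {
        {
            std::lock_guard<std::mutex> lock(m);
            signaled.store(true, std::memory_order_release);
        }
        cv.notify_all();
    }

    // Analogous to getFenceStatus: non-blocking poll.
    bool is_signaled() const { return signaled.load(std::memory_order_acquire); }

    // Analogous to waitForFences: the thread sleeps until the fence is signaled.
    void wait() {
        std::unique_lock<std::mutex> lock(m);
        cv.wait(lock, [this] { return signaled.load(std::memory_order_acquire); });
    }
};

// Sleep on the early fence, then spin on the final one.
// Returns the number of spin iterations, which is timing dependent.
unsigned long long hybrid_wait(MockFence &almost_ready, MockFence &final_fence) {
    almost_ready.wait();                  // cheap: the thread is asleep for most of the work
    assert(almost_ready.is_signaled());   // postcondition of the blocking wait
    unsigned long long spins = 0;
    while (!final_fence.is_signaled()) {  // short busy-wait avoids the wake-up latency
        cpu_relax();
        ++spins;
    }
    return spins;
}

// Simulates a "GPU" thread that signals almost_ready at roughly 80% of the work.
bool run_demo() {
    MockFence almost_ready, final_fence;
    std::thread gpu([&] {
        std::this_thread::sleep_for(std::chrono::milliseconds(8));
        almost_ready.signal();
        std::this_thread::sleep_for(std::chrono::milliseconds(2));
        final_fence.signal();
    });
    hybrid_wait(almost_ready, final_fence);
    gpu.join();
    return almost_ready.is_signaled() && final_fence.is_signaled();
}
```

The trade-off in this sketch is the same one the description mentions: the blocking wait keeps the CPU idle for most of the submission, and only the last stretch is spent spinning, so the cost of waking from a sleep is paid before the final fence signals rather than after.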
I've seen up to 5% performance improvement from this on NVIDIA/Windows. I'm curious whether it helps on other IHV/OS combinations as well. Yesterday I did some power testing using the NZXT CAM software to measure CPU power, and this hybrid spin/wait seemed pretty good. Today my CPU power is higher even at idle and tokens/s is lower (though still an improvement over master), regardless of the spin/wait behavior. 🤷 Computers are weird.