
Conversation

@jeffbolznv
Collaborator

There seems to be a bubble waking up from waitForFences, which costs a few percent performance and also increases variance in performance. This change inserts an "almost_ready" fence when the graph is about 80% complete and we waitForFences for the almost_ready fence and then spin (with _mm_pauses) waiting for the final fence to be signaled.

I've seen up to 5% performance improvement from this on NVIDIA/Windows. I'm curious whether it helps on other IHV/OS combinations as well. Yesterday I did some power testing using the NZXT CAM software to measure the CPU power, and this hybrid spin/wait seemed pretty good. Today, my CPU power is higher even at idle and tokens/s is lower (though still an improvement over master), regardless of the spin/wait behavior, 🤷 computers are weird.

@jeffbolznv jeffbolznv requested a review from 0cc4m March 28, 2025 16:10
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Mar 28, 2025
```cpp
        exit(1);
    }
    for (uint32_t i = 0; i < 100; ++i) {
        _mm_pause();
```
Collaborator Author
I didn't expect this to pass CI, but I guess we only build Vulkan for x64 in CI. I'll generalize this, but it doesn't block perf testing.
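One possible way to generalize the bare `_mm_pause()` call is a small per-architecture wrapper; this is a sketch of the usual pattern, not the code that eventually landed, and the `cpu_relax` name is made up here:

```cpp
#include <thread>

#if defined(_M_X64) || defined(_M_IX86)
#include <intrin.h>  // MSVC declares _mm_pause here
#endif

// Hypothetical portable spin-wait hint. Falls back to a scheduler
// yield on architectures with no known cheap pause instruction.
static inline void cpu_relax() {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();          // GCC/Clang equivalent of _mm_pause()
#elif defined(_M_X64) || defined(_M_IX86)
    _mm_pause();                     // MSVC intrinsic
#elif defined(__aarch64__)
    __asm__ __volatile__("yield");   // AArch64 spin-wait hint
#else
    std::this_thread::yield();       // generic fallback
#endif
}
```

With something like this, the spin loop in the patch would call `cpu_relax()` instead of `_mm_pause()` directly, so non-x64 CI builds would compile too.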

There seems to be a bubble waking up from waitForFences, which costs a few
percent performance and also increased variance in performance. This change
inserts an "almost_ready" fence when the graph is about 80% complete and we
waitForFences for the almost_ready fence and then spin (with _mm_pauses) waiting
for the final fence to be signaled.
@netrunnereve
Collaborator

I ran some quick tests on this using my usual setup and performance is pretty much the same as master. Maybe there's a <1% difference but I can't tell.

Collaborator

@0cc4m 0cc4m left a comment

I see a small positive effect on all vendors, LGTM

@0cc4m 0cc4m merged commit 74d4f5b into ggml-org:master Apr 4, 2025
48 checks passed
timwu pushed a commit to timwu/llama.cpp that referenced this pull request May 5, 2025
…gml-org#12630)


Labels

ggml — changes relating to the ggml tensor library for machine learning
Vulkan — Issues specific to the Vulkan backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants