
Conversation

@jeffbolznv
Collaborator

There seems to be a bubble waking up from waitForFences, which costs a few percent performance and also increases variance in performance. This change inserts an "almost_ready" fence when the graph is about 80% complete and we waitForFences for the almost_ready fence and then spin (with _mm_pauses) waiting for the final fence to be signaled.

I've seen up to 5% performance improvement from this on NVIDIA/Windows. I'm curious whether it helps on other IHV/OS combinations as well. Yesterday I did some power testing using the NZXT CAM software to measure the CPU power, and this hybrid spin/wait seemed pretty good. Today, my CPU power is higher even at idle and tokens/s is lower (though still an improvement over master), regardless of the spin/wait behavior, 🤷 computers are weird.

@jeffbolznv jeffbolznv requested a review from 0cc4m March 28, 2025 16:10
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Mar 28, 2025
```cpp
        exit(1);
    }
    for (uint32_t i = 0; i < 100; ++i) {
        _mm_pause();
```
Collaborator Author
I didn't expect this to pass CI, but I guess we only build Vulkan for x64 in CI. I'll generalize this, but it doesn't block perf testing.
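One possible way to generalize the bare `_mm_pause()` call is a small per-architecture wrapper; this is a sketch of the usual pattern, not the code that eventually landed, and the `cpu_relax` name is made up here:

```cpp
#include <thread>

#if defined(_M_X64) || defined(_M_IX86)
#include <intrin.h>  // MSVC declares _mm_pause here
#endif

// Hypothetical portable spin-wait hint. Falls back to a scheduler
// yield on architectures with no known cheap pause instruction.
static inline void cpu_relax() {
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();          // GCC/Clang equivalent of _mm_pause()
#elif defined(_M_X64) || defined(_M_IX86)
    _mm_pause();                     // MSVC intrinsic
#elif defined(__aarch64__)
    __asm__ __volatile__("yield");   // AArch64 spin-wait hint
#else
    std::this_thread::yield();       // generic fallback
#endif
}
```

With something like this, the spin loop in the patch would call `cpu_relax()` instead of `_mm_pause()` directly, so non-x64 CI builds would compile too.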

There seems to be a bubble waking up from waitForFences, which costs a few
percent performance and also increased variance in performance. This change
inserts an "almost_ready" fence when the graph is about 80% complete and we
waitForFences for the almost_ready fence and then spin (with _mm_pauses) waiting
for the final fence to be signaled.
@netrunnereve
Collaborator

I ran some quick tests on this using my usual setup and performance is pretty much the same as master. Maybe there's a <1% difference but I can't tell.

Collaborator

@0cc4m 0cc4m left a comment

I see a small positive effect on all vendors, LGTM

@0cc4m 0cc4m merged commit 74d4f5b into ggml-org:master Apr 4, 2025
48 checks passed
timwu pushed a commit to timwu/llama.cpp that referenced this pull request May 5, 2025
…gml-org#12630)


Labels

ggml — changes relating to the ggml tensor library for machine learning
Vulkan — Issues specific to the Vulkan backend

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants