ggml-cpu: optimize the ggml NORM operation #15953
Conversation
Force-pushed from c38f290 to df16d10
Force-pushed from af54a93 to 2853109
Force-pushed from 654f1e6 to fc759be
Thank you @ggerganov for your review. I applied your suggestions and rebased.

This looks ready for merge, forgotten?

Was actually waiting to see if @slaren had any comments on this since he is the codeowner. But yeah, if no further comments I'll merge it tomorrow morning.
Force-pushed from 04d56f9 to 31eb135
Minor whitespace cleanup
Force-pushed from 7dae677 to 7e986ec
rename function
add endif macro comment

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Aaron Teo <[email protected]>
@duduta Please re-apply the whitespace cleanup suggestions I just unresolved, then we are good to merge I think.
* master: (113 commits)
  webui: updated the chat service to only include max_tokens in the req… (ggml-org#16489)
  cpu : optimize the ggml NORM operation (ggml-org#15953)
  server : host-memory prompt caching (ggml-org#16391)
  No markdown in cot (ggml-org#16483)
  model-conversion : add support for SentenceTransformers (ggml-org#16387)
  ci: add ARM64 Kleidiai build and test support (ggml-org#16462)
  CANN: Improve ACL graph matching (ggml-org#16166)
  kleidiai: kernel interface refactoring (ggml-org#16460)
  [SYCL] refactor soft_max, add soft_max_back (ggml-org#16472)
  model: EmbeddingGemma Adding Support for SentenceTransformers Dense Modules (ggml-org#16367)
  refactor: centralize CoT parsing in backend for streaming mode (ggml-org#16394)
  Disable CUDA host buffers on integrated GPUs (ggml-org#16308)
  server : fix cancel pending task (ggml-org#16467)
  metal : mark FA blocks (ggml-org#16372)
  server : improve context checkpoint logic (ggml-org#16440)
  ggml webgpu: profiling, CI updates, reworking of command submission (ggml-org#16452)
  llama : support LiquidAI LFM2-MoE hybrid model (ggml-org#16464)
  server : add `/v1/health` endpoint (ggml-org#16461)
  webui : added download action (ggml-org#13552) (ggml-org#16282)
  presets : fix pooling param for embedding models (ggml-org#16455)
  ...
Hello, it seems like this PR causes degradation when used with TTS.cpp running Kokoro. Reverting all changes done to … resolves it.
Using an Intel i9 13980HX CPU (AVX2 enabled, no AVX512).
Audio before: before.mp4
Audio after: after.mp4
@LostRuins Should be fixed in #16558 |
This reverts commit 20678dd.
Thanks, seems to be working from a quick test.

Sorry @LostRuins, and thanks @ggerganov for fixing this.

All good, I'll let you know if any other issues come up.
* ggml-cpu: optimize norm operation to use intrinsics or Accelerate
  rename function
  add endif macro comment
  Co-authored-by: Georgi Gerganov <[email protected]>
  Co-authored-by: Aaron Teo <[email protected]>
* implement s390x SIMD suggested by @taronaeo
* add TODO comment
* tidy up spaces

---------

Co-authored-by: Georgi Gerganov <[email protected]>
Co-authored-by: Aaron Teo <[email protected]>
This PR optimizes the ggml norm operation.
The implementation of ggml_vec_centered_variance_f32 mirrors
ggml_vec_soft_max_f32 for consistency.
I tested on an AVX2-capable machine.
Device description: Intel(R) Core(TM) i7-4750HQ CPU @ 2.00GHz
Device memory: 16384 MB (16384 MB free)
Results from `test-backend-ops perf -b CPU -o NORM`:
BEFORE:
AFTER: