ggml-cpu: implement MXFP4 SIMD for s390x #16193
ggml/src/ggml-cpu/arch/s390/quants.c (outdated)
#pragma GCC unroll 8
for (; ib < nb; ++ib) {
This unroll seems unnecessary, since this loop should only have zero or one iterations.
Good catch! Fixed in the latest commit.
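The reviewer's point can be illustrated with a minimal sketch. It assumes the main loop consumes two blocks per iteration (as the "expand to 2 blocks per loop" commit suggests) and uses a hypothetical function name. Under that assumption, the tail loop only ever sees `nb % 2` blocks, so `#pragma GCC unroll 8` has nothing to unroll.

```c
#include <assert.h>

/* Hypothetical sketch: returns how many times the scalar tail loop runs
   when a main loop first consumes blocks two at a time. */
static int tail_loop_iterations(int nb) {
    assert(nb >= 0);
    int ib = 0;
    int tail = 0;

    /* Main loop: blocks ib and ib + 1 are processed together. */
    for (; ib + 1 < nb; ib += 2) {
        /* ... vectorized work on two blocks would go here ... */
    }

    /* Tail loop: at most one block is left over, so unrolling gains nothing. */
    for (; ib < nb; ++ib) {
        ++tail;
    }
    return tail;
}
```

For example, `nb = 7` leaves exactly one block for the tail loop, and any even `nb` leaves none.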
AMX CI has been failing with the same failure, so I am ignoring it. Pushing to …
* ggml-cpu: impl mxfp4 s390x
* ggml-cpu: missing s = sumf
* ggml-cpu: fix incorrect kval_mxfp4 type
* ggml-cpu: rework mxfp4
* ggml-cpu: missing delta calc
* ggml-cpu: fix typo
* ggml-cpu: fix typo for vec_splats
* ggml-cpu: expand to 2 blocks per loop
* ggml-cpu: add unroll to boost perf
* ggml-cpu: back to 1 block per loop to test perf
* Revert "ggml-cpu: back to 1 block per loop to test perf" (this reverts commit 1fe5572)
* ggml-cpu: rm unroll from single block
---------
Signed-off-by: Aaron Teo <[email protected]>
This pull request implements a SIMD (vector) path for MXFP4 on the s390x platform. We observed a 159.52% performance improvement for prompt processing and 136.90% for token generation.
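For context, below is a minimal scalar sketch of the MXFP4 × Q8 dot product that a SIMD path like this accelerates. It is a readable reference, not the PR's s390x code. It assumes ggml's MXFP4 layout: 32 E2M1 values per block, stored as 4-bit codes with low nibbles holding elements 0..15 and high nibbles 16..31, sharing one E8M0 scale. It also assumes a lookup table of doubled E2M1 values, which is compensated by halving the scale. All type and function names are hypothetical, and the Q8 scale is a plain `float` rather than fp16 to keep the block self-contained.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define QK_MXFP4_SKETCH 32  /* hypothetical: elements per block */

typedef struct {
    uint8_t e;                        /* shared E8M0 scale (exponent, bias 127) */
    uint8_t qs[QK_MXFP4_SKETCH / 2];  /* 32 E2M1 codes, two per byte */
} block_mxfp4_sketch;

typedef struct {
    float  d;                         /* per-block scale (fp16 in ggml; float here) */
    int8_t qs[QK_MXFP4_SKETCH];       /* 32 signed 8-bit values */
} block_q8_sketch;

/* E2M1 code -> value, doubled so it fits in int8 (0.5 becomes 1). */
static const int8_t kvalues_mxfp4_sketch[16] = {
    0, 1, 2, 3, 4, 6, 8, 12, 0, -1, -2, -3, -4, -6, -8, -12,
};

/* E8M0 -> float: 2^(e - 127) for e in 1..254; e == 0 maps to subnormal 2^-127. */
static float e8m0_to_fp32(uint8_t e) {
    uint32_t bits = (e == 0) ? 0x00400000u : ((uint32_t)e << 23);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Scalar dot product of n MXFP4 values with n Q8 values (n a multiple of 32). */
static float vec_dot_mxfp4_q8_sketch(int n, const block_mxfp4_sketch *x,
                                     const block_q8_sketch *y) {
    assert(n % QK_MXFP4_SKETCH == 0);
    const int nb = n / QK_MXFP4_SKETCH;
    float sumf = 0.0f;

    for (int ib = 0; ib < nb; ++ib) {
        /* Combined scale; the 0.5 undoes the doubling in the lookup table. */
        const float d = y[ib].d * 0.5f * e8m0_to_fp32(x[ib].e);
        int sumi1 = 0, sumi2 = 0;
        for (int j = 0; j < QK_MXFP4_SKETCH / 2; ++j) {
            sumi1 += y[ib].qs[j] * kvalues_mxfp4_sketch[x[ib].qs[j] & 0x0F];
            sumi2 += y[ib].qs[j + QK_MXFP4_SKETCH / 2] * kvalues_mxfp4_sketch[x[ib].qs[j] >> 4];
        }
        sumf += d * (float)(sumi1 + sumi2);
    }
    return sumf;
}
```

Accumulating the integer products per block and applying the combined scale once per block keeps the inner loop purely integer. That structure is what makes the per-block work a natural fit for vector instructions.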
Before SIMD Benchmark
After SIMD Benchmark
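Reading the summary percentages as relative throughput gains, assuming improvement = (after − before) / before × 100, they correspond to these speedup factors:

$$
\text{speedup}_{\text{PP}} = 1 + \frac{159.52}{100} = 2.5952, \qquad
\text{speedup}_{\text{TG}} = 1 + \frac{136.90}{100} = 2.369
$$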
Verification
To ensure that this implementation did not break anything, the SIMD implementation has been tested on the following models:
test-quantize-fns
It is noted that q8_0 is currently failing; this is fixed in #15925. MXFP4 itself is passing.
Note: Tests were conducted on an IBM z17 mainframe with 40 IFLs (cores) and 128 GB of memory on a shared R&D LPAR.
Please review this pull request and consider merging into the main repository. Thank you!