
@Alcpz
Contributor

@Alcpz Alcpz commented Jul 22, 2025

The way some mul_mat paths currently perform 8-bit quantization is not very flexible. While exploring a different gemv kernel, I ran into the need for a q8_1 tensor in a slightly different format, which the current convert_src1_to_q8_1 bool does not support.

This PR moves the quantization kernels into a separate header and:

  • Unifies kernel submission to use sycl::nd_item<1>
  • Rewrites quantize_q8_1 to have the same structure as the reorder q8_1 kernel
  • Adds the exception handling that was missing from the original code introduced with SYCLomatic

Performance is unaffected.
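For readers unfamiliar with the format, here is a minimal host-side sketch of what q8_1 quantization computes and of how the same data could be laid out in two ways. It is plain C++, not the SYCL kernels from this PR. Assumptions: the block size of 32 and the d / s fields follow ggml's block_q8_1, but float is used instead of half for brevity; the names block_q8_1_sketch, q8_1_reordered_sketch, quantize_block and quantize_row_reordered are hypothetical, and the struct-of-arrays layout only illustrates "a slightly different format", not necessarily the exact layout the reorder kernel writes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr int QK8_1 = 32;  // values per q8_1 block (matches ggml)

// Simplified block: ggml stores d and s in half precision; float is an
// assumption made here to keep the sketch dependency-free.
struct block_q8_1_sketch {
    float  d;          // scale: max|x| / 127
    float  s;          // d * sum(qs)
    int8_t qs[QK8_1];  // quantized values
};

// Quantize one block of QK8_1 floats.
inline block_q8_1_sketch quantize_block(const float * x) {
    float amax = 0.0f;
    for (int i = 0; i < QK8_1; ++i) {
        amax = std::max(amax, std::fabs(x[i]));
    }

    block_q8_1_sketch b{};
    b.d = amax / 127.0f;

    int sum = 0;
    for (int i = 0; i < QK8_1; ++i) {
        // An all-zero block has d == 0, so avoid dividing by it.
        const int q = (amax == 0.0f) ? 0 : static_cast<int>(std::lround(x[i] / b.d));
        b.qs[i] = static_cast<int8_t>(q);
        sum += q;
    }
    b.s = b.d * static_cast<float>(sum);
    return b;
}

// Hypothetical struct-of-arrays variant: all quants first, then the
// per-block scales and sums in separate arrays.
struct q8_1_reordered_sketch {
    std::vector<int8_t> qs;  // quants of block 0, block 1, ...
    std::vector<float>  d;   // one scale per block
    std::vector<float>  s;   // one d * sum(qs) per block
};

inline q8_1_reordered_sketch quantize_row_reordered(const std::vector<float> & x) {
    assert(x.size() % QK8_1 == 0);  // row must be a whole number of blocks
    q8_1_reordered_sketch out;
    for (std::size_t ib = 0; ib < x.size() / QK8_1; ++ib) {
        const block_q8_1_sketch b = quantize_block(x.data() + ib * QK8_1);
        out.qs.insert(out.qs.end(), b.qs, b.qs + QK8_1);
        out.d.push_back(b.d);
        out.s.push_back(b.s);
    }
    return out;
}
```

Both layouts hold the same numbers; only the memory arrangement differs, which is why a single boolean switch is a poor fit once more than one target format is needed.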

Pinging @AD2605 as author of the reorder q8_1 kernel.

@Alcpz Alcpz requested review from Rbiessy and s-Nick July 22, 2025 12:25
@github-actions github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and SYCL (https://en.wikipedia.org/wiki/SYCL - GPU programming language) labels Jul 22, 2025
@Alcpz Alcpz merged commit afc0e89 into ggml-org:master Jul 28, 2025
46 of 47 checks passed
