Commit 0e0a92e

Fix compile errors (#13)

* Refactor mul_mat for SmarterQuant permuted inference

  Modify matrix multiplication (`ggml_compute_forward_mul_mat_one_chunk`) to operate on SmarterQuant src0 tensors that have permuted columns and per-segment quantization. This involves:
  - Iterating through src0 segments.
  - Determining segment-specific quantization types.
  - On-the-fly quantization of corresponding src1 (activation) segments if src1 is F32.
  - Performing dot products using the permuted, quantized src0 segments.

  The resulting dst tensor from this operation is computed in a permuted order (reflecting src0's column permutations' influence on how dst's elements are effectively indexed or should be interpreted).

  Add a new function `ggml_unpermute_f32_inplace` to unpermute the first dimension of an F32 tensor.

  Update `ggml_compute_forward_mul_mat` to:
  - Correctly manage src1 data preparation, ensuring that the SmarterQuant path in `ggml_compute_forward_mul_mat_one_chunk` receives F32 src1 data for its internal per-segment quantization.
  - Call `ggml_unpermute_f32_inplace` on the dst tensor after the matrix multiplication if src0 was SmarterQuant processed, to unpermute the result vector as per the requirements.

* Fix compilation errors in ggml-cpu and test-smarterquant
  - Defined `GGML_MAX_BLOCK_SIZE` in ggml-cpu.c and used it instead of the undeclared `GGML_MAX_TYPE_SIZE`.
  - Corrected a typo in `ggml_compute_forward_mul_mat`, changing `wdata_src1_quantized` to `wdata`.
  - Fixed an incorrect function call to `quantize_src1_segment` by removing an extra NULL argument.
  - Added a forward declaration for `ggml_unpermute_f32_inplace` in ggml-cpu.c.
  - Included `<cinttypes>` in tests/test-smarterquant.cpp to resolve `PRId64` undeclared identifier errors.

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
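The unpermute step described in the message can be illustrated with a small standalone sketch. This is not the `ggml_unpermute_f32_inplace` from the commit: the function name `unpermute_f32_sketch`, its signature, and the index convention are all assumptions made for illustration. It assumes `perm[i]` holds the original index of the element currently stored at position `i`, and it works on a plain float array rather than a ggml tensor.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper for illustration only (not the function added by the commit).
// Assumed convention: perm[i] is the ORIGINAL index of the element currently
// stored at position i, so restoring the order means out[perm[i]] = data[i].
// perm is assumed to be a valid permutation of 0..n-1 (no repeated indices).
// Returns 0 on success, -1 on allocation failure or an out-of-range index.
static int unpermute_f32_sketch(float * data, const int32_t * perm, size_t n) {
    if (n == 0) {
        return 0;
    }

    // A scratch buffer keeps the logic simple; the caller still sees the
    // result written back into the same array.
    float * tmp = malloc(n * sizeof(float));
    if (tmp == NULL) {
        return -1;
    }

    for (size_t i = 0; i < n; ++i) {
        if (perm[i] < 0 || (size_t) perm[i] >= n) {
            free(tmp);
            return -1; // index outside the array: leave data untouched
        }
        tmp[perm[i]] = data[i];
    }

    memcpy(data, tmp, n * sizeof(float));
    free(tmp);
    return 0;
}
```

For example, with `perm = {2, 0, 3, 1}` and `data = {30, 10, 40, 20}`, the call rewrites `data` to `{10, 20, 30, 40}`. A cycle-following variant would avoid the scratch allocation at the cost of extra bookkeeping; the commit message does not say which approach the real function takes.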

1 parent c499d1b commit 0e0a92e

2 files changed, +232 -77 lines changed

ggml/src/ggml-cpu
- ggml-cpu.c
tests
- test-smarterquant.cpp
