Commit 0e0a92e

Fix compile errors (#13)
* Refactor mul_mat for SmarterQuant permuted inference

  Modify matrix multiplication (`ggml_compute_forward_mul_mat_one_chunk`) to operate on SmarterQuant src0 tensors that have permuted columns and per-segment quantization. This involves:
  - Iterating through src0 segments.
  - Determining segment-specific quantization types.
  - On-the-fly quantization of the corresponding src1 (activation) segments if src1 is F32.
  - Performing dot products using the permuted, quantized src0 segments.

  The resulting dst tensor from this operation is computed in a permuted order (reflecting the influence of src0's column permutation on how dst's elements are effectively indexed or should be interpreted).

  Add a new function `ggml_unpermute_f32_inplace` to unpermute the first dimension of an F32 tensor.

  Update `ggml_compute_forward_mul_mat` to:
  - Correctly manage src1 data preparation, ensuring that the SmarterQuant path in `ggml_compute_forward_mul_mat_one_chunk` receives F32 src1 data for its internal per-segment quantization.
  - Call `ggml_unpermute_f32_inplace` on the dst tensor after the matrix multiplication if src0 was SmarterQuant processed, to unpermute the result vector as per the requirements.

* Fix compilation errors in ggml-cpu and test-smarterquant
  - Defined `GGML_MAX_BLOCK_SIZE` in ggml-cpu.c and used it instead of the undeclared `GGML_MAX_TYPE_SIZE`.
  - Corrected a typo in `ggml_compute_forward_mul_mat`, changing `wdata_src1_quantized` to `wdata`.
  - Fixed an incorrect function call to `quantize_src1_segment` by removing an extra NULL argument.
  - Added a forward declaration for `ggml_unpermute_f32_inplace` in ggml-cpu.c.
  - Included `<cinttypes>` in tests/test-smarterquant.cpp to resolve `PRId64` undeclared identifier errors.

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
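The unpermute step described in the message can be illustrated with a small standalone sketch. This is not the commit's `ggml_unpermute_f32_inplace`: the helper name `unpermute_f32_rows`, its signature, and the permutation convention are all assumptions made for illustration. Here `perm[i]` is taken to be the original index of the element currently stored at position `i`, and the helper operates on a plain row-major `float` buffer rather than on a `ggml_tensor`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (not part of ggml): restores the original element order
// along the first (fastest-varying) dimension of a row-major F32 buffer.
//
// Assumed convention: perm[i] is the ORIGINAL index of the element that is
// currently stored at position i, so restoring means out[perm[i]] = in[i].
//
// data   : n_rows rows of n contiguous floats, modified in place
// perm   : n entries, each in [0, n), forming a permutation
// returns: 0 on success, -1 if the scratch row cannot be allocated
static int unpermute_f32_rows(float * data, const int32_t * perm,
                              size_t n, size_t n_rows) {
    if (n == 0 || n_rows == 0) {
        return 0; // nothing to do
    }

    // One scratch row is enough because rows are processed independently.
    float * tmp = malloc(n * sizeof(float));
    if (tmp == NULL) {
        return -1;
    }

    for (size_t r = 0; r < n_rows; ++r) {
        float * row = data + r * n;
        for (size_t i = 0; i < n; ++i) {
            assert(perm[i] >= 0 && (size_t) perm[i] < n);
            tmp[perm[i]] = row[i]; // scatter back to the original slot
        }
        memcpy(row, tmp, n * sizeof(float));
    }

    free(tmp);
    return 0;
}
```

Under this assumed convention, a row `{10, 20, 30}` with `perm = {2, 0, 1}` becomes `{20, 30, 10}`. A scratch row keeps the sketch simple; a cycle-following variant would avoid the allocation at the cost of extra bookkeeping.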
1 parent c499d1b commit 0e0a92e

2 files changed

+232
-77
lines changed

0 commit comments
