Commit 0e0a92e
Fix compile errors (#13)
* Refactor mul_mat for SmarterQuant permuted inference
Modify matrix multiplication (ggml_compute_forward_mul_mat_one_chunk)
to operate on SmarterQuant src0 tensors that have permuted columns
and per-segment quantization. This involves:
- Iterating through src0 segments.
- Determining segment-specific quantization types.
- On-the-fly quantization of corresponding src1 (activation)
segments if src1 is F32.
- Performing dot products using the permuted, quantized src0 segments.
The dst tensor produced by this operation comes out in permuted
order: src0's column permutation determines how dst's elements are
indexed and must be interpreted.
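The per-segment loop described above can be sketched roughly as follows. This is a simplified illustration, not the code from this commit: `sq_segment`, `sq_type`, `sq_quantize_i8`, `sq_row_dot` and `SQ_MAX_SEG` are hypothetical names, the weights stay in F32 for brevity, the int8 scheme is a toy stand-in for the real per-segment quantization types, and the activation row is assumed to be already aligned with src0's stored column order.

```c
#include <assert.h>
#include <stdint.h>

#define SQ_MAX_SEG 256  // hypothetical upper bound on segment length

// Hypothetical per-segment type tag (stands in for a real quantization type).
enum sq_type { SQ_TYPE_F32, SQ_TYPE_I8 };

struct sq_segment {
    int64_t      start; // first column of the segment in stored order
    int64_t      len;   // number of columns in the segment
    enum sq_type type;  // how the activation segment is prepared
};

// Toy symmetric int8 quantization of n floats; returns the scale.
static float sq_quantize_i8(const float * x, int8_t * q, int64_t n) {
    float amax = 0.0f;
    for (int64_t i = 0; i < n; i++) {
        const float a = x[i] < 0.0f ? -x[i] : x[i];
        if (a > amax) amax = a;
    }
    const float scale = amax > 0.0f ? amax / 127.0f : 1.0f;
    for (int64_t i = 0; i < n; i++) {
        const float v = x[i] / scale;
        q[i] = (int8_t) (v >= 0.0f ? v + 0.5f : v - 0.5f); // round to nearest
    }
    return scale;
}

// Dot product of one weight row with one activation row, segment by segment.
static float sq_row_dot(const float * w, const float * act,
                        const struct sq_segment * segs, int n_segs) {
    float sum = 0.0f;
    for (int s = 0; s < n_segs; s++) {
        const int64_t start = segs[s].start;
        const int64_t len   = segs[s].len;
        assert(len >= 0 && len <= SQ_MAX_SEG);
        if (segs[s].type == SQ_TYPE_I8) {
            // Quantize only this activation segment, on the fly.
            int8_t q[SQ_MAX_SEG];
            const float scale = sq_quantize_i8(act + start, q, len);
            float seg_sum = 0.0f;
            for (int64_t j = 0; j < len; j++) {
                seg_sum += w[start + j] * (float) q[j];
            }
            sum += seg_sum * scale;
        } else {
            // F32 segment: no quantization step needed.
            for (int64_t j = 0; j < len; j++) {
                sum += w[start + j] * act[start + j];
            }
        }
    }
    return sum;
}
```

Each segment carries its own type, so the activation is prepared only for the columns that segment covers, and the partial sums are accumulated into one result per row.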
Add a new function `ggml_unpermute_f32_inplace` to unpermute the
first dimension of an F32 tensor.
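As an illustration of what such an unpermute step might look like for a single contiguous F32 row, here is a minimal sketch. It is not the body of `ggml_unpermute_f32_inplace`: the name `unpermute_f32_row`, its signature, and the permutation convention (element `i` of the permuted data belongs at index `perm[i]`) are assumptions made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for an in-place unpermute of one contiguous F32 row.
// Assumed convention: data[i] currently holds the value that belongs at perm[i].
// Returns 0 on success, -1 if the scratch buffer cannot be allocated.
static int unpermute_f32_row(float * data, const int32_t * perm, int64_t n) {
    if (n <= 0) {
        return 0; // nothing to do
    }
    float * tmp = malloc((size_t) n * sizeof(float));
    if (tmp == NULL) {
        return -1;
    }
    for (int64_t i = 0; i < n; i++) {
        assert(perm[i] >= 0 && perm[i] < n);
        tmp[perm[i]] = data[i]; // scatter each value back to its original slot
    }
    memcpy(data, tmp, (size_t) n * sizeof(float));
    free(tmp);
    return 0;
}
```

The scratch buffer keeps the sketch short; a cycle-following variant would avoid the allocation at the cost of extra bookkeeping.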
Update `ggml_compute_forward_mul_mat` to:
- Correctly manage src1 data preparation, ensuring that the
SmarterQuant path in `ggml_compute_forward_mul_mat_one_chunk`
receives F32 src1 data for its internal per-segment quantization.
- Call `ggml_unpermute_f32_inplace` on the dst tensor after the
matrix multiplication if src0 was SmarterQuant processed, to
unpermute the result vector.
* Fix compilation errors in ggml-cpu and test-smarterquant
- Defined GGML_MAX_BLOCK_SIZE in ggml-cpu.c and used it instead of the undeclared GGML_MAX_TYPE_SIZE.
- Corrected a typo in ggml_compute_forward_mul_mat, changing wdata_src1_quantized to wdata.
- Fixed an incorrect function call to quantize_src1_segment by removing an extra NULL argument.
- Added a forward declaration for ggml_unpermute_f32_inplace in ggml-cpu.c.
- Included <cinttypes> in tests/test-smarterquant.cpp to resolve PRId64 undeclared identifier errors.
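For context on the last item: `PRId64` is a format macro supplied by `<inttypes.h>` in C (`<cinttypes>` is the C++ counterpart), so using it without that header leaves the identifier undeclared. A minimal C sketch, where the helper name `format_i64` and the message text are made up for illustration:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

// Formats a 64-bit signed value portably. PRId64 expands to the correct
// printf length modifier for int64_t on the current platform.
// Returns the number of characters snprintf would write (excluding the NUL).
static int format_i64(char * buf, size_t size, int64_t v) {
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "ne = %" PRId64, v);
}
```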
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
1 parent c499d1b, commit 0e0a92e
2 files changed, +232 -77 lines changed