Conversation
708-145 (Contributor) commented Jun 26, 2025

Make sure to read the contributing guidelines before submitting a PR

708-145 and others added 10 commits April 14, 2025 14:53
* feat: Add infrastructure for SmarterQuant custom quantization

Implements the following:
- Parsing of `default.smarterquant.json` in llama-quantize and during model loading to define custom quantization strategies for tensors.
- Defines `SmarterQuantTensorInfo` and `SmarterQuantConfigMap` to hold this configuration.
- Modifies `llama-quantize` to:
    - Apply column permutation based on the configuration (data manipulation placeholder).
    - Select block-specific compression types for the first four 256-column blocks (quantization placeholder).
    - Write SmarterQuant configuration (permutation, block types, enabled flag) as GGUF metadata.
- Modifies model loading to read this GGUF metadata and populate the `llama_model` with the SmarterQuant configuration.
- Adds documentation for the `default.smarterquant.json` format.

Note: The core logic for packing custom-quantized blocks in `llama-quantize` and for dequantizing/unpermuting these blocks in `ggml.c` (inference) is not yet implemented. Models quantized with these features will not run correctly until that is complete.

* docs: Add todo.txt outlining remaining SmarterQuant work

This file details the pending implementation tasks for the core
quantization and dequantization logic required to make the
SmarterQuant feature fully functional.

* Update todo.txt

* Implement SmarterQuant custom block quantization packing

- Add llama_tensor_quantize_smarter_blocks to handle per-segment quantization based on SmarterQuantTensorInfo.
- Integrate this into llama_model_quantize_impl.
- Ensure imatrix is permuted along with f32_data before quantization.
- Correct GGUF metadata handling for SmarterQuant tensors (base type set to compression_types[3]).
- Numerous compilation fixes related to includes, type definitions, and removal of old C-style SmarterQuant parsing code.
- Initial verification confirms correct packed data size calculation for the new SmarterQuant path.

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* Checkpoint: Refactor SmarterQuantTensorInfo and add headers

- Created C-compatible SmarterQuantTensorInfo in ggml-smarterquant-types.h
- Updated ggml.h, ggml-cpu.c, llama-quant.h, llama-quant.cpp,
  llama-model-loader.cpp, and llama-model.cpp to use the new struct.
- Added missing C++ headers and forward declarations to llama-quant.cpp
  in an attempt to resolve compilation errors.

Note: Codebase is not currently compiling due to issues in
llama-quant.cpp and an incorrect CMake build path used in the last
attempt. User will address compilation issues next.

* Fix compilation issues and implement SmarterQuant stubs

- Resolved various compilation errors in llama-quant.cpp related to includes, function definitions, and SmarterQuant logic.
- Implemented parsing for SmarterQuant JSON configuration in `load_smarter_quant_config`.
- Added a basic serial implementation for `llama_tensor_quantize_smarter_blocks`.
- Provided functional stubs for quantization helper functions within `llama-quant.cpp`.
- Ensured the public `llama_model_quantize` API correctly calls the implementation in `llama-quant.cpp`.
- Fixed a memory leak by adding a destructor to `llama_model` to free SmarterQuant permutation data.
- Verified that `ggml-cpu.c` and `llama-model.cpp` changes for SmarterQuant dequantization compile.
- The main library and all example tools now compile and link successfully.

* feat: Implement SmarterQuant numerical correctness tests and update todo

This commit introduces a new test suite for the SmarterQuant functionality
to verify the numerical correctness of the custom block quantization and
dequantization logic. It also updates todo.txt to reflect this progress.

Key changes:
- Added `tests/test-smarterquant.cpp` with a test case that:
  - Uses a sample F32 tensor with mixed quantization types (Q4_0, Q5_1, Q8_0, Q2_K).
  - Applies column permutation.
  - Quantizes using `llama_tensor_quantize_smarter_blocks`.
  - Dequantizes using `ggml_get_rows_smarterquant`.
  - Verifies the numerical output against the original F32 data.
- Updated `tests/CMakeLists.txt` to build the new test.
- Made `llama_tensor_quantize_smarter_blocks` in `src/llama-quant.cpp` non-static and added its declaration to `src/llama-quant.h`.
- Made `ggml_get_rows_smarterquant` in `ggml/src/ggml-cpu/ggml-cpu.c` non-static to allow direct testing by the new test suite.
- The implemented test passes, confirming the core CPU implementation of SmarterQuant (Tasks 1 and 2 from todo.txt) is working as expected for the tested scenario.
- Updated `todo.txt` to mark the CPU numerical correctness testing as DONE and outline further potential test enhancements.

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* Implement remaining SmarterQuant JSON and GGUF tests

- Enhanced `tests/test-smarterquant.cpp` with edge cases for tensor column counts (128, 300, 512, 768) and different permutation patterns (identity, few swaps).
- Added `tests/test-smarterquant-gguf.cpp` for end-to-end GGUF testing, including metadata writing/reading and numerical verification through the quantization and model loading pipeline.
- Updated `todo.txt` to reflect test completion.

* docs: Analyze memory impact of SmarterQuant unpermutation buffer

Documents the memory usage of the temporary F32 buffer used during
the unpermutation step in `ggml_get_rows_smarterquant`.

The buffer is stack-allocated (`alloca`) with size `n_cols * sizeof(float)`.
For typical model dimensions, this is a minor memory footprint (e.g., 16-32KB)
and is short-lived. A potential concern for extremely large column counts
is noted, though not typical for current LLM weights.

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
* Refactor mul_mat for SmarterQuant permuted inference

Modify matrix multiplication (ggml_compute_forward_mul_mat_one_chunk)
to operate on SmarterQuant src0 tensors that have permuted columns
and per-segment quantization. This involves:
- Iterating through src0 segments.
- Determining segment-specific quantization types.
- On-the-fly quantization of corresponding src1 (activation)
  segments if src1 is F32.
- Performing dot products using the permuted, quantized src0 segments.

The resulting dst tensor from this operation is computed in a permuted
order, reflecting how src0's column permutation changes the effective
indexing of dst's elements.

Add a new function `ggml_unpermute_f32_inplace` to unpermute the
first dimension of an F32 tensor.

Update `ggml_compute_forward_mul_mat` to:
- Correctly manage src1 data preparation, ensuring that the
  SmarterQuant path in `ggml_compute_forward_mul_mat_one_chunk`
  receives F32 src1 data for its internal per-segment quantization.
- Call `ggml_unpermute_f32_inplace` on the dst tensor after the
  matrix multiplication if src0 was SmarterQuant processed, to
  unpermute the result vector as per the requirements.

* Fix compilation errors in ggml-cpu and test-smarterquant

- Defined GGML_MAX_BLOCK_SIZE in ggml-cpu.c and used it instead of the undeclared GGML_MAX_TYPE_SIZE.
- Corrected a typo in ggml_compute_forward_mul_mat, changing wdata_src1_quantized to wdata.
- Fixed an incorrect function call to quantize_src1_segment by removing an extra NULL argument.
- Added a forward declaration for ggml_unpermute_f32_inplace in ggml-cpu.c.
- Included <cinttypes> in tests/test-smarterquant.cpp to resolve PRId64 undeclared identifier errors.

---------

Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>
- ggml/src/ggml-cpu/ggml-cpu.c:
  - Fix cast discarding 'const' qualifier.
  - Remove unused variables 'bs' and 'nbw0'.
- src/llama-quant.cpp:
  - Fix comparison of integer expressions of different signedness.
  - Remove unused variables 'thread_src', 'thread_dst_char', and 'total_size_written'.
- ggml/src/ggml.c:
  - Remove braces around scalar initializer for `sq_info`.
  - Explicitly initialize the `padding` field.
- ggml/src/ggml-cpu/ggml-cpu.c:
  - Change `src1_segment_prepared_data` to `const void *` to fix assignment discards 'const' qualifier warning.
708-145 closed this Jun 26, 2025
github-actions bot added the documentation, testing, python, and ggml labels Jun 26, 2025
708-145 deleted the fix-compile-warnings branch June 26, 2025 11:36