Commit 455fd47
Feat/sq packing p1 (#5)
* feat: Add infrastructure for SmarterQuant custom quantization
Implements the following:
- Parsing of `default.smarterquant.json` in llama-quantize and during model loading to define custom quantization strategies for tensors.
- Defines `SmarterQuantTensorInfo` and `SmarterQuantConfigMap` to hold this configuration.
- Modifies `llama-quantize` to:
- Apply column permutation based on the configuration (data manipulation placeholder).
- Select block-specific compression types for the first four 256-column blocks (quantization placeholder).
- Write SmarterQuant configuration (permutation, block types, enabled flag) as GGUF metadata.
- Modifies model loading to read this GGUF metadata and populate the `llama_model` with the SmarterQuant configuration.
- Adds documentation for the `default.smarterquant.json` format.
Note: The core logic for packing custom-quantized blocks in `llama-quantize` and for dequantizing/unpermuting these blocks in `ggml.c` (inference) is not yet implemented. Models quantized with these features will not run correctly until that is complete.
* docs: Add todo.txt outlining remaining SmarterQuant work
This file details the pending implementation tasks for the core
quantization and dequantization logic required to make the
SmarterQuant feature fully functional.
* Update todo.txt
* Implement SmarterQuant custom block quantization packing
- Add llama_tensor_quantize_smarter_blocks to handle per-segment quantization based on SmarterQuantTensorInfo.
- Integrate this into llama_model_quantize_impl.
- Ensure imatrix is permuted along with f32_data before quantization.
- Correct GGUF metadata handling for SmarterQuant tensors (base type set to compression_types[3]).
- Numerous compilation fixes related to includes, type definitions, and removal of old C-style SmarterQuant parsing code.
- Initial verification confirms correct packed data size calculation for the new SmarterQuant path.
---------
Co-authored-by: google-labs-jules[bot] <161369871+google-labs-jules[bot]@users.noreply.github.com>1 parent 4f9bbc4 commit 455fd47
File tree
10 files changed
+769
-351
lines changed- docs
- gguf-py/examples
- src
10 files changed
+769
-351
lines changedLarge diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
7 | 7 | | |
8 | 8 | | |
9 | 9 | | |
10 | | - | |
11 | | - | |
12 | | - | |
13 | | - | |
14 | | - | |
15 | | - | |
16 | | - | |
17 | | - | |
18 | | - | |
19 | | - | |
20 | | - | |
21 | | - | |
22 | | - | |
23 | | - | |
24 | | - | |
25 | | - | |
26 | | - | |
27 | | - | |
28 | | - | |
29 | | - | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
30 | 42 | | |
31 | 43 | | |
32 | 44 | | |
33 | 45 | | |
34 | 46 | | |
35 | 47 | | |
36 | | - | |
| 48 | + | |
37 | 49 | | |
38 | 50 | | |
39 | | - | |
| 51 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
3 | 3 | | |
| 4 | + | |
| 5 | + | |
4 | 6 | | |
5 | 7 | | |
6 | 8 | | |
| |||
487 | 489 | | |
488 | 490 | | |
489 | 491 | | |
| 492 | + | |
| 493 | + | |
| 494 | + | |
| 495 | + | |
| 496 | + | |
| 497 | + | |
| 498 | + | |
| 499 | + | |
| 500 | + | |
| 501 | + | |
| 502 | + | |
| 503 | + | |
| 504 | + | |
| 505 | + | |
| 506 | + | |
| 507 | + | |
| 508 | + | |
| 509 | + | |
| 510 | + | |
| 511 | + | |
| 512 | + | |
| 513 | + | |
| 514 | + | |
| 515 | + | |
| 516 | + | |
| 517 | + | |
| 518 | + | |
| 519 | + | |
| 520 | + | |
| 521 | + | |
| 522 | + | |
| 523 | + | |
| 524 | + | |
| 525 | + | |
| 526 | + | |
| 527 | + | |
| 528 | + | |
490 | 529 | | |
491 | 530 | | |
492 | 531 | | |
| |||
553 | 592 | | |
554 | 593 | | |
555 | 594 | | |
| 595 | + | |
| 596 | + | |
| 597 | + | |
| 598 | + | |
| 599 | + | |
| 600 | + | |
| 601 | + | |
| 602 | + | |
| 603 | + | |
| 604 | + | |
| 605 | + | |
| 606 | + | |
| 607 | + | |
| 608 | + | |
| 609 | + | |
| 610 | + | |
| 611 | + | |
| 612 | + | |
| 613 | + | |
| 614 | + | |
| 615 | + | |
| 616 | + | |
| 617 | + | |
| 618 | + | |
| 619 | + | |
| 620 | + | |
| 621 | + | |
| 622 | + | |
| 623 | + | |
| 624 | + | |
| 625 | + | |
| 626 | + | |
| 627 | + | |
| 628 | + | |
| 629 | + | |
556 | 630 | | |
557 | 631 | | |
558 | 632 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1 | 1 | | |
2 | 2 | | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
3 | 7 | | |
4 | 8 | | |
5 | 9 | | |
| |||
12 | 16 | | |
13 | 17 | | |
14 | 18 | | |
| 19 | + | |
| 20 | + | |
15 | 21 | | |
16 | 22 | | |
17 | 23 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
12 | 12 | | |
13 | 13 | | |
14 | 14 | | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
15 | 18 | | |
16 | 19 | | |
17 | 20 | | |
| |||
350 | 353 | | |
351 | 354 | | |
352 | 355 | | |
| 356 | + | |
| 357 | + | |
| 358 | + | |
| 359 | + | |
353 | 360 | | |
354 | 361 | | |
355 | 362 | | |
| |||
0 commit comments