Draft
168 commits
ba7335e
Refactor variable name
EAddario Aug 19, 2025
4d94911
Add target_bpw parameter
EAddario Aug 19, 2025
cfec404
Update usage
EAddario Aug 19, 2025
5e85fb3
Add parse_target_bpw()
EAddario Aug 19, 2025
e6d55dc
Load activations
EAddario Aug 19, 2025
77b818c
Populate activations_data with imatrix activations if present
EAddario Aug 19, 2025
0edbf0c
Process activations
EAddario Aug 19, 2025
e877474
Process target_bpw parameter
EAddario Aug 19, 2025
1b3d5b5
Populate params
EAddario Aug 19, 2025
a22a9de
Refactor variable and add target_bpw
EAddario Aug 19, 2025
c96b8ee
Add fallback_type enum
EAddario Aug 19, 2025
9adae08
Add is_iq()
EAddario Aug 19, 2025
017945a
Validate if imatrix contains activations
EAddario Aug 19, 2025
92f49ab
Add target_bpw_type() logic
EAddario Aug 19, 2025
1187f6a
Implement bpw_overrides call
EAddario Aug 19, 2025
5aceb9e
Refactor variable names
EAddario Aug 19, 2025
ee05d6b
Update comments
EAddario Aug 19, 2025
f22b309
Avoid division by zero if truncation occurs
EAddario Aug 19, 2025
936294f
Increase precision for error calculation
EAddario Aug 19, 2025
b33abae
Merge branch 'master' into quantize
EAddario Aug 19, 2025
5cd69a6
Add F16/BF16 type
EAddario Aug 20, 2025
69586e2
Add F16/BF16 type
EAddario Aug 20, 2025
29b2dc3
Do not mix K and IQ quants
EAddario Aug 20, 2025
43caadf
Add better fallbacks for IQ mixes
EAddario Aug 20, 2025
52da4a4
Skip if output.weight or type is COPY
EAddario Aug 20, 2025
3f0118d
Fix bias lambda bug
EAddario Aug 20, 2025
b0b33b7
Optimise tensor sampling
EAddario Aug 20, 2025
35ad0fc
Improve error estimation using weighted MSE
EAddario Aug 20, 2025
5ef493e
Exclude embeddings and output tensor
EAddario Aug 21, 2025
95b2ab2
Change error estimate to use normalised weighted MSE
EAddario Aug 21, 2025
e01dad8
Parallelise candidate evaluation
EAddario Aug 21, 2025
887490c
Dequantise sampled rows only
EAddario Aug 21, 2025
9e11f82
Precompute error denominator in estimate_error()
EAddario Aug 21, 2025
5b6f1e9
General code refactor
EAddario Aug 21, 2025
e6eefa6
Merge branch 'master' into quantize
EAddario Aug 21, 2025
ec0afbe
Include embeddings and output tensors
EAddario Aug 22, 2025
35c1504
Fix byte count for 3d or higher tensors
EAddario Aug 22, 2025
bb0d912
Update comments
EAddario Aug 22, 2025
2f13fee
Parameterise type
EAddario Aug 22, 2025
47cdbe2
Reduce sampling window to speedup process
EAddario Aug 22, 2025
01c927f
Improve pareto efficient candidate selection
EAddario Aug 22, 2025
897decb
Show skipped IQ tensors
EAddario Aug 22, 2025
f05c848
Improve dequantized_buffer fill
EAddario Aug 22, 2025
fea99d0
Refactor and combine lambdas
EAddario Aug 22, 2025
6d17889
Log if override is from tensor-type or from bpw-target
EAddario Aug 22, 2025
9a4b115
Explicitly adding <atomic> include
EAddario Aug 23, 2025
f75265f
Fix typo
EAddario Aug 23, 2025
73124a9
Refactor estimate_error()
EAddario Aug 23, 2025
68ae5e6
Improve list of candidate types
EAddario Aug 23, 2025
decafae
Adjust bias_lambda
EAddario Aug 23, 2025
3856d60
Restrict quant types per family
EAddario Aug 23, 2025
61c0e01
Execute bpw_overrides() only if an imatrix file is provided
EAddario Aug 24, 2025
d4ac210
Improve logging and some minor code refactoring
EAddario Aug 24, 2025
ccaab24
Merge branch 'master' into quantize
EAddario Aug 24, 2025
4286690
Minor comment update
EAddario Aug 26, 2025
0494611
Refactor epsilon into a function-wide variable
EAddario Aug 28, 2025
8df1d00
Add directional scaling
EAddario Aug 28, 2025
66aff8f
Add precise_lambda()
EAddario Aug 28, 2025
556f6b0
Add --precise-lambda option
EAddario Aug 28, 2025
eab8708
Minor factoring for efficiency and correctness
EAddario Aug 30, 2025
09198c4
Merge branch 'master' into quantize
EAddario Aug 30, 2025
7d04050
Merge branch 'master' into quantize
EAddario Sep 6, 2025
04c07b3
Add better control over MSE and directional bias computation
EAddario Sep 10, 2025
f0f07bd
Merge branch 'master' into quantize
EAddario Sep 10, 2025
886536d
Increase error type precision
EAddario Sep 13, 2025
bc8762f
Capture surrounding function name
EAddario Sep 13, 2025
4dff85f
Improve precise_lambda() efficiency
EAddario Sep 13, 2025
7d85993
Minor refactoring
EAddario Sep 13, 2025
12e816b
Replace greedy allocator with lagrangian relaxation
EAddario Sep 13, 2025
2b51606
"Convexify" candidate list
EAddario Sep 13, 2025
8503d59
Increase IQ options
EAddario Sep 13, 2025
c709e1a
Fix MoE tensor estimation
EAddario Sep 14, 2025
9b857e3
Merge branch 'ggml-org:master' into quantize
EAddario Sep 14, 2025
ad70fca
Merge branch 'quantize' of https://github.com/EAddario/llama.cpp into…
EAddario Sep 15, 2025
14fae69
General refactoring
EAddario Sep 20, 2025
a369469
Replace fast_bias() for per slice version and remove precise_bias()
EAddario Sep 20, 2025
ab02bb1
Merge branch 'master' into quantize
EAddario Sep 20, 2025
9e74f83
Replace --bpw-bias flag with --no-bias
EAddario Sep 20, 2025
e8e2aed
Refactor row sampling
EAddario Sep 21, 2025
bdefdb6
Refactor copy_or_broadcast()
EAddario Sep 21, 2025
6b8cedf
Refactor estimate_lambda()
EAddario Sep 21, 2025
c466c53
Refactor pareto pruning and convexification
EAddario Sep 21, 2025
b433fd9
Refactor last budget pass
EAddario Sep 21, 2025
b6c008f
Refactor helper lambdas
EAddario Sep 21, 2025
7386d4e
Refactor row sampling
EAddario Sep 21, 2025
08146fd
Refactor side_data() and copy_or_broadcast()
EAddario Sep 21, 2025
17be761
Refactor candidate types build
EAddario Sep 21, 2025
b09662f
Refactor estimate_lambda()
EAddario Sep 21, 2025
a7ee915
Refactor trimmed_sum()
EAddario Sep 21, 2025
1a3e9ea
Refactor estimate_error()
EAddario Sep 21, 2025
9a1656e
Refactor pareto optimise and convexify
EAddario Sep 21, 2025
0d5f183
Refactor lagrange_penalty()
EAddario Sep 21, 2025
814f6b6
Minor general refactoring
EAddario Sep 21, 2025
e92db00
Refactor quantisation checks into its own function
EAddario Sep 21, 2025
fecc472
Fix typos in variable names
EAddario Sep 21, 2025
896cdc2
Refactor potential overflow
EAddario Sep 21, 2025
b748a1e
Fix typo
EAddario Sep 21, 2025
c855094
Exit loop if no better solution found
EAddario Sep 22, 2025
1fbc59f
Replace slope with cross product
EAddario Sep 22, 2025
f184450
Fix minor logic flaw
EAddario Sep 22, 2025
d79ade2
Adjust for small vector size
EAddario Sep 22, 2025
7ba6001
Simplify candidates sorting
EAddario Sep 22, 2025
d36ee0a
Add comments to explain magic numbers
EAddario Sep 22, 2025
8eedcf7
Increase scale multiplier
EAddario Sep 22, 2025
a74b410
Move is_iq() into a lambda and remove unused variables
EAddario Sep 25, 2025
dbdd179
Combine quant types
EAddario Sep 25, 2025
29bb30c
Merge branch 'master' into quantize
EAddario Sep 25, 2025
dd4f4bd
Reduce bpw range
EAddario Sep 27, 2025
d169457
Refactor outlier trimming
EAddario Sep 27, 2025
87cba65
Tighten worker allocator
EAddario Sep 27, 2025
8a2c71f
Check for direction reversal
EAddario Sep 27, 2025
3d75b14
Simplify dequantisation
EAddario Sep 27, 2025
e49e241
Calculate bpw over all tensors
EAddario Sep 27, 2025
b3b8a11
Compute rows based on tensor shape and slice count
EAddario Sep 28, 2025
f5d8811
Prioritise important tensors
EAddario Oct 1, 2025
555981b
Merge branch 'master' into quantize
EAddario Oct 1, 2025
940db63
Select quantization type if target_bpw is set unless user specifies t…
EAddario Oct 3, 2025
fb07fe9
Merge branch 'master' into quantize
EAddario Oct 3, 2025
66d4aed
Minor refactoring
EAddario Oct 4, 2025
560e8c9
Relax lambda clamping
EAddario Oct 5, 2025
533cda3
Add signal handler
EAddario Oct 5, 2025
e48ca32
Add save_bpw_state()
EAddario Oct 5, 2025
02c3073
Add load_bpw_state()
EAddario Oct 5, 2025
74c62ed
Add delete_bpw_state()
EAddario Oct 5, 2025
46706ce
Persist progress
EAddario Oct 5, 2025
84ada44
Uninstall signal handler and cleanup
EAddario Oct 5, 2025
044fa78
Fix trimming logic
EAddario Oct 6, 2025
c11184a
Generate model ID hash
EAddario Oct 9, 2025
3a3d807
Remove bias mode computation
EAddario Oct 10, 2025
c93131c
Remove --no-bias option
EAddario Oct 10, 2025
5b0d3f6
Automatically determine if bias error is significant
EAddario Oct 11, 2025
951de2e
Merge branch 'master' into quantize
EAddario Oct 11, 2025
12e0524
Reduce compute time by parallelising tensor processing - courtesy of …
EAddario Oct 12, 2025
b6094a9
Add quant types
EAddario Oct 12, 2025
ca28230
Add --keep-bpw-state option
EAddario Oct 12, 2025
b1b58e6
Refactor signal handlers
EAddario Oct 13, 2025
cd734b8
Update quant types
EAddario Oct 13, 2025
b7911f1
Minor refactoring
EAddario Oct 13, 2025
a6853ea
Add tensor type and depth heuristics
EAddario Oct 16, 2025
0b3e930
Add option to override bpw state file name
EAddario Oct 16, 2025
a510393
Minor refactoring
EAddario Oct 16, 2025
41a0069
Merge branch 'master' into quantize
EAddario Oct 16, 2025
fa1df81
Finetune heuristics
EAddario Oct 20, 2025
90402c0
Merge branch 'master' into quantize
EAddario Oct 20, 2025
00ddf03
Update usage
EAddario Oct 20, 2025
543b5a9
Fix lambda capture
EAddario Oct 20, 2025
27bf25e
Fix lambda capture
EAddario Oct 20, 2025
04561d5
Update epsilon specifier
EAddario Oct 21, 2025
d6ccd56
Finetune heuristics
EAddario Oct 25, 2025
8da14c0
Merge branch 'master' into quantize
EAddario Oct 25, 2025
5303212
Simplify tensor selection
EAddario Oct 26, 2025
f8863b9
Minor refactoring
EAddario Oct 28, 2025
6e32244
Read statistics from imatrix
EAddario Oct 30, 2025
c59bb6d
Add Euclidean-Cosine score to identify important tensors
EAddario Oct 30, 2025
b02b1b2
Merge branch 'master' into quantize
EAddario Oct 31, 2025
ac8cfbd
Improved is_important() logic
EAddario Nov 17, 2025
bdf2e74
Merge branch 'master' into quantize
EAddario Nov 17, 2025
a0ba913
Fix lambda capture bug in Windows and initialise candidate_types struct
EAddario Nov 19, 2025
9ec3e6e
Remove processing statistics_data
EAddario Nov 23, 2025
1c9993e
Add --disable-tensor-importance option
EAddario Nov 23, 2025
7eb7714
Merge branch 'master' into quantize
EAddario Nov 23, 2025
6616008
Use more descriptive option naming
EAddario Nov 24, 2025
69a32b6
Relax target bpw range
EAddario Nov 29, 2025
5b557ca
Minor refactoring
EAddario Nov 29, 2025
229109f
Increase importance boost for final pass
EAddario Nov 29, 2025
b97cda6
Add B/F16 to get_ftype()
EAddario Nov 29, 2025
37cf51e
Process bpw targets up to B/F16
EAddario Nov 30, 2025
3f7842c
Merge branch 'master' into quantize
EAddario Nov 30, 2025
3 changes: 3 additions & 0 deletions include/llama.h
@@ -360,9 +360,12 @@ extern "C" {
         bool pure;                            // quantize all tensors to the default type
         bool keep_split;                      // quantize to the same number of shards
         void * imatrix;                       // pointer to importance matrix data
+        void * activations;                   // pointer to activations data
         void * kv_overrides;                  // pointer to vector containing overrides
         void * tensor_types;                  // pointer to vector containing tensor types
         void * prune_layers;                  // pointer to vector containing layer indices to prune
+        float target_bpw;                     // target bits per weight (bpw)
+        bool precise_lambda;                  // use precise_lambda calculation - slow computation but very accurate
     } llama_model_quantize_params;

     typedef struct llama_logit_bias {
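For readers unfamiliar with the unit behind the new target_bpw field: bits per weight is the total number of bits stored divided by the total number of weights. The sketch below shows only that arithmetic, averaged over a set of tensors. The tensor_size struct and the average_bpw() helper are hypothetical names made up for illustration; they are not part of llama.cpp and do not reflect how this PR computes or enforces the target.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a quantized tensor; not a llama.cpp type.
struct tensor_size {
    uint64_t n_weights; // number of elements in the tensor
    uint64_t n_bytes;   // size of the tensor after quantization, in bytes
};

// Average bits per weight over all tensors: (total bytes * 8) / total weights.
// Returns 0.0 for an empty set to avoid dividing by zero.
double average_bpw(const std::vector<tensor_size> & tensors) {
    uint64_t total_weights = 0;
    uint64_t total_bytes   = 0;
    for (const auto & t : tensors) {
        total_weights += t.n_weights;
        total_bytes   += t.n_bytes;
    }
    if (total_weights == 0) {
        return 0.0;
    }
    return 8.0 * static_cast<double>(total_bytes) / static_cast<double>(total_weights);
}
```

Under these assumptions, a caller could compare average_bpw() for a candidate mix of tensor types against the requested target to see whether the mix fits the size budget.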