
Commit 1f64092

Update quantization types

1 parent 1fd029e

File tree

1 file changed: +29 −22 lines

README.md (29 additions, 22 deletions)
```diff
@@ -58,28 +58,35 @@ CACHE_DIRECTORY=.\cache
 #
 # Possible llama.cpp quantization types:
 #
-# IQ2_XXS : 2.06 bpw quantization
-# IQ2_XS  : 2.31 bpw quantization
-# Q2_K    : 2.63G, +0.6717 ppl @ LLaMA-v1-7B
-# Q2_K_S  : 2.16G, +9.0634 ppl @ LLaMA-v1-7B
-# IQ3_XXS : 3.06 bpw quantization
-# Q3_K_XS : 3-bit extra small quantization
-# Q3_K_S  : 2.75G, +0.5551 ppl @ LLaMA-v1-7B
-# Q3_K_M  : 3.07G, +0.2496 ppl @ LLaMA-v1-7B
-# Q3_K_L  : 3.35G, +0.1764 ppl @ LLaMA-v1-7B
-# Q4_0    : 3.56G, +0.2166 ppl @ LLaMA-v1-7B
-# Q4_1    : 3.90G, +0.1585 ppl @ LLaMA-v1-7B
-# Q4_K_S  : 3.59G, +0.0992 ppl @ LLaMA-v1-7B
-# Q4_K_M  : 3.80G, +0.0532 ppl @ LLaMA-v1-7B
-# Q5_0    : 4.33G, +0.0683 ppl @ LLaMA-v1-7B
-# Q5_1    : 4.70G, +0.0349 ppl @ LLaMA-v1-7B
-# Q5_K_S  : 4.33G, +0.0400 ppl @ LLaMA-v1-7B
-# Q5_K_M  : 4.45G, +0.0122 ppl @ LLaMA-v1-7B
-# Q6_K    : 5.15G, -0.0008 ppl @ LLaMA-v1-7B
-# Q8_0    : 6.70G, +0.0004 ppl @ LLaMA-v1-7B
-# F16     : 13.00G @ 7B
-# F32     : 26.00G @ 7B
-# COPY    : only copy tensors, no quantizing
+#  2 or Q4_0    : 3.56G, +0.2166 ppl @ LLaMA-v1-7B
+#  3 or Q4_1    : 3.90G, +0.1585 ppl @ LLaMA-v1-7B
+#  8 or Q5_0    : 4.33G, +0.0683 ppl @ LLaMA-v1-7B
+#  9 or Q5_1    : 4.70G, +0.0349 ppl @ LLaMA-v1-7B
+# 19 or IQ2_XXS : 2.06 bpw quantization
+# 20 or IQ2_XS  : 2.31 bpw quantization
+# 28 or IQ2_S   : 2.5 bpw quantization
+# 29 or IQ2_M   : 2.7 bpw quantization
+# 24 or IQ1_S   : 1.56 bpw quantization
+# 10 or Q2_K    : 2.63G, +0.6717 ppl @ LLaMA-v1-7B
+# 21 or Q2_K_S  : 2.16G, +9.0634 ppl @ LLaMA-v1-7B
+# 23 or IQ3_XXS : 3.06 bpw quantization
+# 26 or IQ3_S   : 3.44 bpw quantization
+# 27 or IQ3_M   : 3.66 bpw quantization mix
+# 22 or IQ3_XS  : 3.3 bpw quantization
+# 11 or Q3_K_S  : 2.75G, +0.5551 ppl @ LLaMA-v1-7B
+# 12 or Q3_K_M  : 3.07G, +0.2496 ppl @ LLaMA-v1-7B
+# 13 or Q3_K_L  : 3.35G, +0.1764 ppl @ LLaMA-v1-7B
+# 25 or IQ4_NL  : 4.50 bpw non-linear quantization
+# 30 or IQ4_XS  : 4.25 bpw non-linear quantization
+# 14 or Q4_K_S  : 3.59G, +0.0992 ppl @ LLaMA-v1-7B
+# 15 or Q4_K_M  : 3.80G, +0.0532 ppl @ LLaMA-v1-7B
+# 16 or Q5_K_S  : 4.33G, +0.0400 ppl @ LLaMA-v1-7B
+# 17 or Q5_K_M  : 4.45G, +0.0122 ppl @ LLaMA-v1-7B
+# 18 or Q6_K    : 5.15G, +0.0008 ppl @ LLaMA-v1-7B
+#  7 or Q8_0    : 6.70G, +0.0004 ppl @ LLaMA-v1-7B
+#  1 or F16     : 13.00G @ 7B
+#  0 or F32     : 26.00G @ 7B
+#       COPY    : only copy tensors, no quantizing
 #
 # Hint: The sweet spot is Q5_K_M. The smallest quantization
 # without the need for an importance matrix is IQ3_XXS.
```
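The updated comment allows each quantization type to be given either by its numeric ID or by its name. As a rough illustration of how a config value like `17` or `Q5_K_M` could be normalized against this table (the mapping is transcribed from the list above; the helper function is a sketch of mine, not part of the commit):

```python
# Quantization type IDs and names as listed in the README comment above.
# COPY has no numeric ID, so it is handled separately.
QUANT_TYPES = {
    0: "F32", 1: "F16", 2: "Q4_0", 3: "Q4_1", 7: "Q8_0",
    8: "Q5_0", 9: "Q5_1", 10: "Q2_K", 11: "Q3_K_S", 12: "Q3_K_M",
    13: "Q3_K_L", 14: "Q4_K_S", 15: "Q4_K_M", 16: "Q5_K_S",
    17: "Q5_K_M", 18: "Q6_K", 19: "IQ2_XXS", 20: "IQ2_XS",
    21: "Q2_K_S", 22: "IQ3_XS", 23: "IQ3_XXS", 24: "IQ1_S",
    25: "IQ4_NL", 26: "IQ3_S", 27: "IQ3_M", 28: "IQ2_S",
    29: "IQ2_M", 30: "IQ4_XS",
}
NAME_TO_ID = {name: num for num, name in QUANT_TYPES.items()}

def normalize_quant_type(value: str) -> str:
    """Accept a numeric ID ("17") or a name ("Q5_K_M"); return the name."""
    value = value.strip().upper()
    if value.isdigit():
        return QUANT_TYPES[int(value)]
    if value in NAME_TO_ID or value == "COPY":
        return value
    raise ValueError(f"unknown quantization type: {value}")
```

For example, `normalize_quant_type("17")` and `normalize_quant_type("q5_k_m")` both resolve to `Q5_K_M`, so either spelling can be written into the config file.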
