Commit b5b834e

Update supported quantization types
1 parent 01c8c3b commit b5b834e

1 file changed, 24 additions and 19 deletions

README.md

Lines changed: 24 additions & 19 deletions
@@ -55,29 +55,34 @@ CACHE_DIRECTORY=.\cache
 #
 # Possible llama.cpp quantization types:
 #
-# Q2_K : 2.63G, +0.6717 ppl @ LLaMA-v1-7B
-# Q3_K_S : 2.75G, +0.5551 ppl @ LLaMA-v1-7B
-# Q3_K_M : 3.07G, +0.2496 ppl @ LLaMA-v1-7B
-# Q3_K_L : 3.35G, +0.1764 ppl @ LLaMA-v1-7B
-# Q4_0 : 3.56G, +0.2166 ppl @ LLaMA-v1-7B
-# Q4_1 : 3.90G, +0.1585 ppl @ LLaMA-v1-7B
-# Q4_K_S : 3.59G, +0.0992 ppl @ LLaMA-v1-7B
-# Q4_K_M : 3.80G, +0.0532 ppl @ LLaMA-v1-7B
-# Q5_0 : 4.33G, +0.0683 ppl @ LLaMA-v1-7B
-# Q5_1 : 4.70G, +0.0349 ppl @ LLaMA-v1-7B
-# Q5_K_S : 4.33G, +0.0400 ppl @ LLaMA-v1-7B
-# Q5_K_M : 4.45G, +0.0122 ppl @ LLaMA-v1-7B
-# Q6_K : 5.15G, -0.0008 ppl @ LLaMA-v1-7B
-# Q8_0 : 6.70G, +0.0004 ppl @ LLaMA-v1-7B
-# F16 : 13.00G @ 7B
-# F32 : 26.00G @ 7B
-# COPY : only copy tensors, no quantizing
+# IQ2_XXS : 2.06 bpw quantization
+# IQ2_XS : 2.31 bpw quantization
+# Q2_K : 2.63G, +0.6717 ppl @ LLaMA-v1-7B
+# Q2_K_S : 2.16G, +9.0634 ppl @ LLaMA-v1-7B
+# Q3_K_XS : 3-bit extra small quantization
+# Q3_K_S : 2.75G, +0.5551 ppl @ LLaMA-v1-7B
+# Q3_K_M : 3.07G, +0.2496 ppl @ LLaMA-v1-7B
+# Q3_K_L : 3.35G, +0.1764 ppl @ LLaMA-v1-7B
+# Q4_0 : 3.56G, +0.2166 ppl @ LLaMA-v1-7B
+# Q4_1 : 3.90G, +0.1585 ppl @ LLaMA-v1-7B
+# Q4_K_S : 3.59G, +0.0992 ppl @ LLaMA-v1-7B
+# Q4_K_M : 3.80G, +0.0532 ppl @ LLaMA-v1-7B
+# Q5_0 : 4.33G, +0.0683 ppl @ LLaMA-v1-7B
+# Q5_1 : 4.70G, +0.0349 ppl @ LLaMA-v1-7B
+# Q5_K_S : 4.33G, +0.0400 ppl @ LLaMA-v1-7B
+# Q5_K_M : 4.45G, +0.0122 ppl @ LLaMA-v1-7B
+# Q6_K : 5.15G, -0.0008 ppl @ LLaMA-v1-7B
+# Q8_0 : 6.70G, +0.0004 ppl @ LLaMA-v1-7B
+# F16 : 13.00G @ 7B
+# F32 : 26.00G @ 7B
+# COPY : only copy tensors, no quantizing
 #
-# Hint: The sweet spot is Q4_K_M.
+# Hint: The sweet spot is Q5_K_M.
 #
-QUANTIZATION_TYPES=q4_K_M,q2_K
+QUANTIZATION_TYPES=Q5_K_M,Q3_K_XS
 ```
 
+
 ## Usage
 
 ### 1. Clone a model
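
The hunk above changes the `QUANTIZATION_TYPES` value from lowercase (`q4_K_M,q2_K`) to uppercase (`Q5_K_M,Q3_K_XS`) and extends the list of documented types. As a rough illustration of how such a comma-separated value could be checked against the documented list, here is a minimal Python sketch. The function name `parse_quantization_types` and the `SUPPORTED_TYPES` set are hypothetical and not part of this repository, and normalizing entries to uppercase is an assumption about how the value might be handled, not something the commit confirms.

```python
# Hypothetical helper: not taken from the repository this commit belongs to.
# The set mirrors the types listed in the README comment block after this commit.
SUPPORTED_TYPES = {
    "IQ2_XXS", "IQ2_XS", "Q2_K", "Q2_K_S",
    "Q3_K_XS", "Q3_K_S", "Q3_K_M", "Q3_K_L",
    "Q4_0", "Q4_1", "Q4_K_S", "Q4_K_M",
    "Q5_0", "Q5_1", "Q5_K_S", "Q5_K_M",
    "Q6_K", "Q8_0", "F16", "F32", "COPY",
}


def parse_quantization_types(value: str) -> list[str]:
    """Split a comma-separated QUANTIZATION_TYPES value and validate each entry.

    Entries are stripped and upper-cased before the lookup (an assumption),
    so both "q4_K_M" and "Q4_K_M" are accepted.
    """
    types = [item.strip().upper() for item in value.split(",") if item.strip()]
    unknown = [t for t in types if t not in SUPPORTED_TYPES]
    if unknown:
        raise ValueError(f"Unsupported quantization type(s): {', '.join(unknown)}")
    return types


if __name__ == "__main__":
    # The value introduced by this commit.
    print(parse_quantization_types("Q5_K_M,Q3_K_XS"))
```

Failing early on an unknown name, rather than passing it through, would surface a typo in the configuration before any long-running conversion starts.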
