Several quantization methods are supported. They differ in the resulting model disk size and inference speed.
| 13B | bits/weight | 16.0 | 4.5 | 5.0 | 5.5 | 6.0 | 8.5 |
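The bits/weight row converts directly into on-disk size: parameters × bits/weight ÷ 8 bytes. A quick back-of-the-envelope sketch (treating the 13B parameter count as a round 13×10⁹, which is an approximation):

```python
def quantized_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate on-disk model size: params * bits / 8 bytes, reported in GB (1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# A 13B model at a few of the bits/weight values from the table above
for bpw in (16.0, 4.5, 6.0, 8.5):
    print(f"{bpw} bits/weight -> {quantized_size_gb(13e9, bpw):.1f} GB")
# 16.0 bits/weight -> 26.0 GB
# 4.5 bits/weight -> 7.3 GB
```

Actual file sizes differ slightly because some tensors (e.g. output and embedding layers) may be kept at higher precision and the file format adds metadata.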
- [k-quants](https://github.com/ggerganov/llama.cpp/pull/1684)
- recent k-quants improvements and new i-quants
  - [#2707](https://github.com/ggerganov/llama.cpp/pull/2707)
  - [#2807](https://github.com/ggerganov/llama.cpp/pull/2807)
  - [#4773 - 2-bit i-quants (inference)](https://github.com/ggerganov/llama.cpp/pull/4773)
  - [#4856 - 2-bit i-quants (inference)](https://github.com/ggerganov/llama.cpp/pull/4856)
  - [#4861 - importance matrix](https://github.com/ggerganov/llama.cpp/pull/4861)
  - [#4872 - MoE models](https://github.com/ggerganov/llama.cpp/pull/4872)
  - [#4897 - 2-bit quantization](https://github.com/ggerganov/llama.cpp/pull/4897)
  - [#4930 - imatrix for all k-quants](https://github.com/ggerganov/llama.cpp/pull/4930)
  - [#4957 - imatrix on the GPU](https://github.com/ggerganov/llama.cpp/pull/4957)
  - [#4969 - imatrix for legacy quants](https://github.com/ggerganov/llama.cpp/pull/4969)
  - [#4996 - k-quants tuning](https://github.com/ggerganov/llama.cpp/pull/4996)
  - [#5060 - Q3_K_XS](https://github.com/ggerganov/llama.cpp/pull/5060)
  - [#5196 - 3-bit i-quants](https://github.com/ggerganov/llama.cpp/pull/5196)
  - [quantization tuning #5320](https://github.com/ggerganov/llama.cpp/pull/5320), [#5334](https://github.com/ggerganov/llama.cpp/pull/5334), and [#5361](https://github.com/ggerganov/llama.cpp/pull/5361)
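Several of the PRs above center on the importance matrix ("imatrix"), which weights each weight's quantization error by activation statistics gathered on calibration data, so that error is pushed away from the weights that matter most. The toy sketch below is not llama.cpp's actual algorithm (the function name and grid search are illustrative assumptions); it only shows the core idea of picking a block scale that minimizes the importance-weighted error:

```python
def quantize_block(x, w, bits=4):
    """Quantize one block of floats to signed `bits`-bit integers plus a scale,
    picking the scale that minimizes the importance-weighted squared error."""
    qmax = 2 ** (bits - 1) - 1                 # 7 for 4-bit signed
    amax = max(abs(v) for v in x) or 1.0       # guard against an all-zero block
    best = None
    for step in range(1, 21):                  # crude grid search over candidate scales
        s = amax / qmax * step / 10.0
        q = [min(max(round(v / s), -qmax - 1), qmax) for v in x]
        err = sum(wi * (v - s * qi) ** 2 for wi, v, qi in zip(w, x, q))
        if best is None or err < best[0]:
            best = (err, s, q)
    return best[1], best[2]

x = [0.8, -1.2, 0.05, 2.0, -0.4, 0.9, -2.1, 0.3]
w = [1.0, 0.2, 5.0, 1.0, 0.5, 2.0, 0.1, 1.0]   # stand-in activation importances
scale, q = quantize_block(x, w)
```

With uniform weights this degenerates to ordinary round-to-nearest scale fitting; the imatrix's contribution is precisely the non-uniform `w`, which lets the quantizer sacrifice accuracy on low-importance weights.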
### Perplexity (measuring model quality)