@@ -105,11 +105,14 @@ improved significantly thanks to many contributions. It is the main playground f
- [X] [MPT](https://github.com/ggerganov/llama.cpp/pull/3417)
- [X] [Bloom](https://github.com/ggerganov/llama.cpp/pull/3553)
- [x] [Yi models](https://huggingface.co/models?search=01-ai/Yi)
- - [X] [StableLM-3b-4e1t](https://github.com/ggerganov/llama.cpp/pull/3586)
+ - [X] [StableLM models](https://huggingface.co/stabilityai)
- [x] [Deepseek models](https://huggingface.co/models?search=deepseek-ai/deepseek)
- [x] [Qwen models](https://huggingface.co/models?search=Qwen/Qwen)
- [x] [PLaMo-13B](https://github.com/ggerganov/llama.cpp/pull/3557)
+ - [x] [Phi models](https://huggingface.co/models?search=microsoft/phi)
- [x] [GPT-2](https://huggingface.co/gpt2)
+ - [x] [Orion 14B](https://github.com/ggerganov/llama.cpp/pull/5118)
+ - [x] [InternLM2](https://huggingface.co/models?search=internlm2)
- [x] [CodeShell](https://github.com/WisdomShell/codeshell)

**Multimodal models:**
@@ -119,6 +122,7 @@ improved significantly thanks to many contributions. It is the main playground f
- [x] [Obsidian](https://huggingface.co/NousResearch/Obsidian-3B-V0.5)
- [x] [ShareGPT4V](https://huggingface.co/models?search=Lin-Chen/ShareGPT4V)
- [x] [MobileVLM 1.7B/3B models](https://huggingface.co/models?search=mobileVLM)
+ - [x] [Yi-VL](https://huggingface.co/models?search=Yi-VL)

**Bindings:**
@@ -732,9 +736,21 @@ Several quantization methods are supported. They differ in the resulting model d
| 13B | bits/weight | 16.0 | 4.5 | 5.0 | 5.5 | 6.0 | 8.5 |
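
As a back-of-the-envelope check on the bits/weight row above (assuming a "13B" model has roughly 13·10^9 weights and ignoring GGUF metadata overhead): at 4.5 bits/weight the file comes out to about 13e9 × 4.5 / 8 ≈ 7.3 GB on disk, while the 16-bit baseline is 13e9 × 16 / 8 = 26 GB.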

- [k-quants](https://github.com/ggerganov/llama.cpp/pull/1684)
- - recent k-quants improvements
+ - recent k-quants improvements and new i-quants
- [#2707](https://github.com/ggerganov/llama.cpp/pull/2707)
- [#2807](https://github.com/ggerganov/llama.cpp/pull/2807)
+ - [#4773 - 2-bit i-quants (inference)](https://github.com/ggerganov/llama.cpp/pull/4773)
+ - [#4856 - 2-bit i-quants (inference)](https://github.com/ggerganov/llama.cpp/pull/4856)
+ - [#4861 - importance matrix](https://github.com/ggerganov/llama.cpp/pull/4861)
+ - [#4872 - MoE models](https://github.com/ggerganov/llama.cpp/pull/4872)
+ - [#4897 - 2-bit quantization](https://github.com/ggerganov/llama.cpp/pull/4897)
+ - [#4930 - imatrix for all k-quants](https://github.com/ggerganov/llama.cpp/pull/4930)
+ - [#4957 - imatrix on the GPU](https://github.com/ggerganov/llama.cpp/pull/4957)
+ - [#4969 - imatrix for legacy quants](https://github.com/ggerganov/llama.cpp/pull/4969)
+ - [#4996 - k-quants tuning](https://github.com/ggerganov/llama.cpp/pull/4996)
+ - [#5060 - Q3_K_XS](https://github.com/ggerganov/llama.cpp/pull/5060)
+ - [#5196 - 3-bit i-quants](https://github.com/ggerganov/llama.cpp/pull/5196)
+ - [quantization tuning](https://github.com/ggerganov/llama.cpp/pull/5320), [another one](https://github.com/ggerganov/llama.cpp/pull/5334), and [another one](https://github.com/ggerganov/llama.cpp/pull/5361)
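
The importance-matrix ("imatrix") PRs above add a calibration step before quantization. A minimal usage sketch, assuming builds of the `imatrix` and `quantize` examples, a local F16 GGUF model, and a plain-text calibration file (file names here are placeholders; flags follow the linked PRs and may differ in other versions):

```bash
# 1. Collect activation statistics over the calibration text
./imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat

# 2. Quantize using those statistics (IQ2_XS is one of the new 2-bit i-quant types)
./quantize --imatrix imatrix.dat model-f16.gguf model-iq2_xs.gguf IQ2_XS
```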

### Perplexity (measuring model quality)