
Commit 425ae74

Merge branch 'ggerganov:master' into master
2 parents: 0c02642 + 213d143
6 files changed: +393 −404 lines


README.md

Lines changed: 18 additions & 2 deletions
@@ -105,11 +105,14 @@ improved significantly thanks to many contributions. It is the main playground f
 - [X] [MPT](https://github.com/ggerganov/llama.cpp/pull/3417)
 - [X] [Bloom](https://github.com/ggerganov/llama.cpp/pull/3553)
 - [x] [Yi models](https://huggingface.co/models?search=01-ai/Yi)
-- [X] [StableLM-3b-4e1t](https://github.com/ggerganov/llama.cpp/pull/3586)
+- [X] [StableLM models](https://huggingface.co/stabilityai)
 - [x] [Deepseek models](https://huggingface.co/models?search=deepseek-ai/deepseek)
 - [x] [Qwen models](https://huggingface.co/models?search=Qwen/Qwen)
 - [x] [PLaMo-13B](https://github.com/ggerganov/llama.cpp/pull/3557)
+- [x] [Phi models](https://huggingface.co/models?search=microsoft/phi)
 - [x] [GPT-2](https://huggingface.co/gpt2)
+- [x] [Orion 14B](https://github.com/ggerganov/llama.cpp/pull/5118)
+- [x] [InternLM2](https://huggingface.co/models?search=internlm2)
 - [x] [CodeShell](https://github.com/WisdomShell/codeshell)
 
 **Multimodal models:**
@@ -119,6 +122,7 @@ improved significantly thanks to many contributions. It is the main playground f
 - [x] [Obsidian](https://huggingface.co/NousResearch/Obsidian-3B-V0.5)
 - [x] [ShareGPT4V](https://huggingface.co/models?search=Lin-Chen/ShareGPT4V)
 - [x] [MobileVLM 1.7B/3B models](https://huggingface.co/models?search=mobileVLM)
+- [x] [Yi-VL](https://huggingface.co/models?search=Yi-VL)
 
 
 **Bindings:**
@@ -732,9 +736,21 @@ Several quantization methods are supported. They differ in the resulting model d
 | 13B | bits/weight | 16.0 | 4.5 | 5.0 | 5.5 | 6.0 | 8.5 |
 
 - [k-quants](https://github.com/ggerganov/llama.cpp/pull/1684)
-- recent k-quants improvements
+- recent k-quants improvements and new i-quants
   - [#2707](https://github.com/ggerganov/llama.cpp/pull/2707)
   - [#2807](https://github.com/ggerganov/llama.cpp/pull/2807)
+  - [#4773 - 2-bit i-quants (inference)](https://github.com/ggerganov/llama.cpp/pull/4773)
+  - [#4856 - 2-bit i-quants (inference)](https://github.com/ggerganov/llama.cpp/pull/4856)
+  - [#4861 - importance matrix](https://github.com/ggerganov/llama.cpp/pull/4861)
+  - [#4872 - MoE models](https://github.com/ggerganov/llama.cpp/pull/4872)
+  - [#4897 - 2-bit quantization](https://github.com/ggerganov/llama.cpp/pull/4897)
+  - [#4930 - imatrix for all k-quants](https://github.com/ggerganov/llama.cpp/pull/4930)
+  - [#4951 - imatrix on the GPU](https://github.com/ggerganov/llama.cpp/pull/4957)
+  - [#4969 - imatrix for legacy quants](https://github.com/ggerganov/llama.cpp/pull/4969)
+  - [#4996 - k-quants tuning](https://github.com/ggerganov/llama.cpp/pull/4996)
+  - [#5060 - Q3_K_XS](https://github.com/ggerganov/llama.cpp/pull/5060)
+  - [#5196 - 3-bit i-quants](https://github.com/ggerganov/llama.cpp/pull/5196)
+  - [quantization tuning](https://github.com/ggerganov/llama.cpp/pull/5320), [another one](https://github.com/ggerganov/llama.cpp/pull/5334), and [another one](https://github.com/ggerganov/llama.cpp/pull/5361)
 
 ### Perplexity (measuring model quality)
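The bits/weight row in the quantization table above implies a rough on-disk size for each scheme. As a sanity check, size can be estimated as parameters × bits-per-weight ÷ 8; this is a sketch only, since real GGUF files also carry metadata, tokenizer data, and per-block scales, so actual files come out somewhat larger (the function name here is illustrative, not part of llama.cpp):

```python
def approx_model_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough quantized model size in GiB from parameter count and bits/weight.

    Ignores GGUF metadata and per-block quantization scales, so this is a
    lower-bound estimate, not the exact file size.
    """
    return n_params * bits_per_weight / 8 / (1024 ** 3)

# A 13B-parameter model at ~4.5 bits/weight (the Q4_0 column above)
print(round(approx_model_size_gib(13e9, 4.5), 1))  # ≈ 6.8 GiB
```

The same arithmetic explains the table's spread: the fp16 baseline (16.0 bits/weight) is roughly 3.5× the size of the 4.5 bits/weight quantization of the same model.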