Commit e39e2b2

updated readme

1 parent 37b572e
File tree

2 files changed: +28 -1 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,4 +1,5 @@
 build/**
+build_*/**
 .build/**
 
 models/**

README.md

Lines changed: 27 additions & 1 deletion
@@ -1,3 +1,29 @@
 # llama.cpp
 
-This repo is cloned from llama.cpp [commit 74d73dc85cc2057446bf63cc37ff649ae7cebd80](https://github.com/ggerganov/llama.cpp/tree/74d73dc85cc2057446bf63cc37ff649ae7cebd80). It is compatible with llama-cpp-python [commit 7ecdd944624cbd49e4af0a5ce1aa402607d58dcc](https://github.com/abetlen/llama-cpp-python/commit/7ecdd944624cbd49e4af0a5ce1aa402607d58dcc)
+This repo is cloned from llama.cpp [commit 74d73dc85cc2057446bf63cc37ff649ae7cebd80](https://github.com/ggerganov/llama.cpp/tree/74d73dc85cc2057446bf63cc37ff649ae7cebd80). It is compatible with llama-cpp-python [commit 7ecdd944624cbd49e4af0a5ce1aa402607d58dcc](https://github.com/abetlen/llama-cpp-python/commit/7ecdd944624cbd49e4af0a5ce1aa402607d58dcc).
+
+## Customize quantization group size at compilation (CPU inference only)
+
+The only difference from a stock build is adding the `-DQK4_0` flag when running cmake:
+
+```bash
+cmake -B build_cpu_g128 -DQK4_0=128
+cmake --build build_cpu_g128
+```
+
+To quantize a model with the customized group size, run
+
+```bash
+./build_cpu_g128/bin/llama-quantize <model_path.gguf> <quantization_type>
+```
+
+To run the quantized model, run
+
+```bash
+./build_cpu_g128/bin/llama-cli -m <quantized_model_path.gguf>
+```
+
+### Note:
+
+Make sure the model you run was quantized with the same group size as the binary you compiled with;
+otherwise you will receive a runtime error when loading the model.
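For background on why this check exists: ggml fixes `QK4_0` at compile time as the number of weights per quantization block, so the block layout (and hence the serialized tensor size) changes with the flag. Below is a simplified sketch of the idea, modeled on ggml's `block_q4_0` layout (field names follow ggml, but this is an illustration rather than the exact source):

```c
#include <stdint.h>

/* Group size is a compile-time constant; -DQK4_0=128 overrides the default. */
#ifndef QK4_0
#define QK4_0 32
#endif

typedef struct {
    uint16_t d;               /* per-block scale, stored as fp16 */
    uint8_t  qs[QK4_0 / 2];   /* QK4_0 4-bit weights, packed two per byte */
} block_q4_0;
```

Because `sizeof(block_q4_0)` depends on `QK4_0`, a GGUF file quantized with one group size will not match the tensor sizes expected by a binary compiled with another, which is what surfaces as the load-time error.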
