
Commit 4ed8e4f

llava : add explicit instructions for llava-1.6 (ggml-org#5611)
This commit contains a suggestion for the README.md in the llava example. The suggestion adds explicit instructions for how to convert a llava-1.6 model and run it using llava-cli. The motivation for this is that having explicit instructions similar to the 1.5 instructions will make it easier for users to try this out.

Signed-off-by: Daniel Bevenius <[email protected]>

examples/llava/README.md

Lines changed: 32 additions & 6 deletions
@@ -59,14 +59,40 @@ python ./convert.py ../llava-v1.5-7b --skip-unknown

Now both the LLaMA part and the image encoder are in the `llava-v1.5-7b` directory.

## LLaVA 1.6 gguf conversion

1) First clone a LLaVA 1.6 model:
```console
git clone https://huggingface.co/liuhaotian/llava-v1.6-vicuna-7b
```
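
If the clone only contains small Git LFS pointer files instead of the actual model weights, LFS is probably not enabled on your system. A minimal sketch, assuming `git-lfs` is installed:
```console
# enable the LFS filters, then fetch the real model files into the existing clone
git lfs install
git -C llava-v1.6-vicuna-7b lfs pull
```
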
2) Backup your pth/safetensor model files as llava-surgery modifies them
3) Use `llava-surgery-v2.py` which also supports llava-1.5 variants, in both pytorch and safetensor formats:
```console
python examples/llava/llava-surgery-v2.py -C -m ../llava-v1.6-vicuna-7b/
```
- you will find a `llava.projector` and a `llava.clip` file in your model directory
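
To confirm the surgery produced both files before continuing, a quick check (paths assume the clone location used above):
```console
ls ../llava-v1.6-vicuna-7b/llava.projector ../llava-v1.6-vicuna-7b/llava.clip
```
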
4) Copy the `llava.clip` file into a subdirectory (like `vit`), rename it to `pytorch_model.bin` and add a fitting ViT configuration to the directory:
```console
mkdir vit
cp ../llava-v1.6-vicuna-7b/llava.clip vit/pytorch_model.bin
cp ../llava-v1.6-vicuna-7b/llava.projector vit/
curl -s -q https://huggingface.co/cmp-nct/llava-1.6-gguf/raw/main/config_vit.json -o vit/config.json
```
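
Since curl simply writes whatever the server returns, it is worth verifying that `vit/config.json` is actually valid JSON; a small sanity check using Python's standard library:
```console
python -m json.tool vit/config.json > /dev/null && echo "config.json is valid JSON"
```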

5) Create the visual gguf model:
```console
python ./examples/llava/convert-image-encoder-to-gguf.py -m vit --llava-projector vit/llava.projector --output-dir vit --clip-model-is-vision
```
- This is similar to llava-1.5; the difference is that we tell the encoder that we are working with the pure vision model part of CLIP
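
If this step succeeded, the projector gguf that the final `llava-cli` invocation expects should now exist in `vit/`:
```console
ls vit/mmproj-model-f16.gguf
```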

6) Then convert the model to gguf format:
```console
python ./convert.py ../llava-v1.6-vicuna-7b/
```
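
As with llava-1.5, the converted model can optionally be quantized as needed; a hedged sketch using the `quantize` tool built in this repo (the output filename and the Q4_K_M type are illustrative choices):
```console
# optional: produce a smaller quantized variant of the language model part
./quantize ../llava-v1.6-vicuna-7b/ggml-model-f16.gguf ../llava-v1.6-vicuna-7b/ggml-model-q4_k_m.gguf Q4_K_M
```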

7) And finally we can run the llava-cli using the 1.6 model version:
```console
./llava-cli -m ../llava-v1.6-vicuna-7b/ggml-model-f16.gguf --mmproj vit/mmproj-model-f16.gguf --image some-image.jpg -c 4096
```
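
A custom question about the image can also be supplied; a sketch assuming the usual `-p` prompt flag from the common example options:
```console
./llava-cli -m ../llava-v1.6-vicuna-7b/ggml-model-f16.gguf --mmproj vit/mmproj-model-f16.gguf --image some-image.jpg -c 4096 -p "Describe the image in detail."
```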

**note** llava-1.6 needs more context than llava-1.5, at least 3000 is needed (just run it at -c 4096)

**note** llava-1.6 greatly benefits from batched prompt processing (defaults work)
