
Commit 23bce61

Added README

* Also allows running the qwen2vl-cli with no --image argument

1 parent 5598f47 commit 23bce61

File tree

2 files changed: +72 −1 lines changed

examples/llava/README-qwen2vl.md

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
# QWEN2-VL

This implementation supports all versions of Qwen2VL, e.g. [Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct).

## Usage

After building, run `./llama-qwen2vl-cli` to use it. Alternatively, you can get a ready-made GGUF from Hugging Face, e.g. [Qwen2-VL-2B-Instruct-GGUF](https://huggingface.co/bartowski/Qwen2-VL-2B-Instruct-GGUF):

### Basic usage with an image and a prompt

```sh
./bin/llama-qwen2vl-cli -m /models/Qwen2-VL-2B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-2B-Instruct-f32.gguf -p 'Describe this image.' --image '/models/test_image.jpg'
```

The `--image` argument is optional in case you just want to use the model for text. However, the mmproj file is still required, as it will be loaded regardless.

If the prompt does not define a system prompt, it defaults to `You are a helpful assistant.`.
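For example, a text-only run (a sketch reusing the placeholder paths above; the prompt is illustrative) might look like:

```sh
# Text-only inference: no --image flag, but --mmproj must still be supplied.
./llama-qwen2vl-cli -m /models/Qwen2-VL-2B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-2B-Instruct-f32.gguf -p 'Write a haiku about llamas.'
```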
### Passing the image directly in the prompt as base64

```sh
./llama-qwen2vl-cli -m /models/Qwen2-VL-2B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-2B-Instruct-f32.gguf -p '<img src="{base64}">Describe this image.'
```
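To build such a prompt, the `{base64}` placeholder must be replaced with the actual base64-encoded image. One way to do that from the shell (assuming a GNU coreutils `base64`; this splicing approach is a sketch, not part of the original README) is:

```sh
# Encode the image and splice it into the prompt.
# `-w0` disables line wrapping (GNU coreutils; on macOS use `base64 -i <file>` instead).
B64=$(base64 -w0 /models/test_image.jpg)
./llama-qwen2vl-cli -m /models/Qwen2-VL-2B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-2B-Instruct-f32.gguf -p "<img src=\"$B64\">Describe this image."
```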
### A complete prompt including the system message

```sh
./llama-qwen2vl-cli -m /models/Qwen2-VL-2B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-2B-Instruct-f32.gguf -p '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|vision_pad|><|vision_end|>Describe this image.' --image '/models/test_image.jpg'
```

**Note**: A lower temperature such as 0.1 is recommended for better quality; add `--temp 0.1` to the command to set it.

**Note**: For GPU offloading, use the `-ngl` flag as usual.

## GGUF Conversion

1. Clone the Qwen2-VL model:

```sh
git clone https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct
```

2. Use `qwen2_vl_surgery.py` to prepare the model for conversion:

```sh
python ./examples/llava/qwen2_vl_surgery.py ./model_path --data_type fp32
```

This generates the vision model and prints its output filename in the log.

3. Use `convert_hf_to_gguf.py` to convert the Qwen2-VL model to GGUF:

```sh
python convert_hf_to_gguf.py ./model_path --outtype f32
```

Now the model is ready to use in the `model_path` directory. You can quantize it as you normally would with other GGUF files.
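For instance, quantizing the converted model to Q4_0 with the stock `llama-quantize` tool could look like the following; the input filename here is an assumption, since the exact name depends on what the conversion script emitted:

```sh
# Quantize the f32 GGUF produced by the conversion step down to Q4_0.
# Check the conversion log for the actual input filename.
./llama-quantize ./model_path/Qwen2-VL-2B-Instruct-F32.gguf ./model_path/Qwen2-VL-2B-Instruct-Q4_0.gguf Q4_0
```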
*Have fun with the models! :)*

## Limitations

* Currently, the image is only supported at the very beginning of the input prompt to the LLM.

examples/llava/qwen2vl-cli.cpp

Lines changed: 10 additions & 1 deletion
@@ -524,7 +524,7 @@ int main(int argc, char ** argv) {
 
     common_init();
 
-    if (params.mmproj.empty() || (params.image.empty() && !prompt_contains_image(params.prompt))) {
+    if (params.mmproj.empty()) {
         print_usage(argc, argv);
         return 1;
     }
@@ -547,6 +547,15 @@ int main(int argc, char ** argv) {
         llava_image_embed_free(image_embed);
         ctx_llava->model = NULL;
         llava_free(ctx_llava);
+    } else if (params.image.empty()) {
+        auto ctx_llava = llava_init_context(&params, model);
+
+        // process the prompt
+        process_prompt(ctx_llava, nullptr, &params, params.prompt);
+
+        llama_perf_context_print(ctx_llava->ctx_llama);
+        ctx_llava->model = NULL;
+        llava_free(ctx_llava);
 #ifndef NDEBUG
     } else if (params.image[0].empty()) {
         auto ctx_llava = llava_init_context(&params, model);