# QWEN2-VL

This implementation supports all versions of Qwen2-VL, e.g. [Qwen2-VL-2B-Instruct](https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct).

## Usage

After building, run `./llama-qwen2vl-cli` to use it. Alternatively, you can grab a ready-made GGUF from Hugging Face, e.g. [Qwen2-VL-2B-Instruct-GGUF](https://huggingface.co/bartowski/Qwen2-VL-2B-Instruct-GGUF):

### Basic usage with an image and a prompt

```sh
./bin/llama-qwen2vl-cli -m /models/Qwen2-VL-2B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-2B-Instruct-f32.gguf -p 'Describe this image.' --image '/models/test_image.jpg'
```

The `--image` argument is optional if you only want to use the model for text. However, the mmproj file still has to be provided, since it will be loaded either way.
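
For text-only use, the command might look like this (a sketch based on the options above; the model and mmproj paths are placeholders):

```sh
# No --image argument: the model answers from the text prompt alone,
# but the mmproj file is still passed because it is loaded at startup.
./bin/llama-qwen2vl-cli -m /models/Qwen2-VL-2B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-2B-Instruct-f32.gguf -p 'Write a haiku about autumn.'
```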

If you do not define a system prompt in the prompt, it defaults to `You are a helpful assistant.`.

### Passing the image directly in the prompt as base64

```sh
./llama-qwen2vl-cli -m /models/Qwen2-VL-2B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-2B-Instruct-f32.gguf -p '<img src="{base64}">Describe this image.'
```
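
One way to fill the `{base64}` placeholder is shown below (a sketch assuming a POSIX shell and GNU coreutils `base64`, where `-w0` disables line wrapping):

```sh
# Encode the image as a single-line base64 string and splice it into the prompt.
IMG_B64=$(base64 -w0 /models/test_image.jpg)
./llama-qwen2vl-cli -m /models/Qwen2-VL-2B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-2B-Instruct-f32.gguf -p "<img src=\"${IMG_B64}\">Describe this image."
```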

### A complete prompt with the system message

```sh
./llama-qwen2vl-cli -m /models/Qwen2-VL-2B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-2B-Instruct-f32.gguf -p '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n<|im_start|>user\n<|vision_start|><|vision_pad|><|vision_end|>Describe this image.' --image '/models/test_image.jpg'
```

**Note**: A lower temperature, such as 0.1, is recommended for better quality; add `--temp 0.1` to the command to set it.
**Note**: For GPU offloading, pass the `-ngl` flag as usual.
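
For example (a sketch; the layer count passed to `-ngl` depends on your GPU memory, and 99 simply offloads all layers of this small model):

```sh
# Sample at a low temperature and offload all layers to the GPU.
./bin/llama-qwen2vl-cli -m /models/Qwen2-VL-2B-Instruct-Q4_0.gguf --mmproj /models/mmproj-Qwen2-VL-2B-Instruct-f32.gguf -p 'Describe this image.' --image '/models/test_image.jpg' --temp 0.1 -ngl 99
```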

## GGUF Conversion

1. Clone the Qwen2-VL model:

```sh
git clone https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct
```

2. Use `qwen2_vl_surgery.py` to prepare the model for conversion:

```sh
python ./examples/llava/qwen2_vl_surgery.py ./model_path --data_type fp32
```

This generates the vision model (the mmproj GGUF) and prints its filename in the log.

3. Use `convert_hf_to_gguf.py` to convert the Qwen2-VL model to GGUF:

```sh
python convert_hf_to_gguf.py ./model_path --outtype f32
```

The model is now ready to use in the `model_path` directory. You can quantize it as you normally would with other GGUF files.
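
For example, quantizing the converted model might look like this (a sketch; the exact F32 filename written by the converter may differ):

```sh
# Quantize the F32 GGUF down to Q4_0 with the llama-quantize tool.
./llama-quantize ./model_path/Qwen2-VL-2B-Instruct-F32.gguf ./model_path/Qwen2-VL-2B-Instruct-Q4_0.gguf Q4_0
```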

*Have fun with the models! :)*

## Limitations

* Currently, the image is only supported at the very beginning of the input prompt to the LLM.