Add vision support in llama-server #901
Conversation
    {
        const auto& chunk = slot.prompt_tokens.find_chunk(slot.n_past);
        slot.cache_tokens.push_back(chunk.get()); // copy
        fprintf(stdout, slot.cache_tokens.detokenize(ctx, true).c_str());
Missed this one in the review. What is the intent here? fprintf needs to have a format string for this to work.
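For reference, a standalone sketch of the issue and the usual fix (illustrative only, not the actual server code):

    #include <cstdio>
    #include <string>

    int main() {
        std::string text = "may contain % signs";
        // fprintf(stdout, text.c_str());      // unsafe: any '%' in text is parsed as a format specifier (undefined behavior)
        fprintf(stdout, "%s\n", text.c_str()); // safe: fixed format string, text passed as an argument
        return 0;
    }

The same fix would apply to the prompt hunk below, if the logging is kept at all.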
It was for debugging purposes and should be removed.
        }
    };
    const auto& prompt = data.at("prompt");
    fprintf(stdout, prompt.get<std::string>().c_str());
What is the intent here? fprintf needs to have a format string.
Same here, a debugging leftover; it should be removed.
I will fix this in my next PR to address #904
I get an error when sending images (text works fine): add_text: <|vision_start|>

Thanks! Time to download Qwen 235B finally.

What is the resolution of the image you are using, and does it happen with all images? I tried test-1.jpeg inside the examples\mtmd folder and it works. I used Qwen3-VL-235B-A22B-Instruct-UD-Q2_K_XL.gguf from unsloth and their mmproj-F16.gguf for the mmproj.

I'm using IQ2_M from https://huggingface.co/mradermacher/Qwen3-VL-235B-A22B-Instruct-i1-GGUF with the F16 mmproj from https://huggingface.co/mradermacher/Qwen3-VL-235B-A22B-Instruct-GGUF, on the latest SillyTavern with chat completion. I tried the same image you mentioned, test-1.jpeg, but it still doesn't work.

Which build or commit are you using?

Current main, 320fc60.

Does adding --jinja help? Can you also try the built-in webui?

--jinja does not help, and the built-in webui gives me the same error. I also downloaded a different gguf (from unsloth) and it gives the same error.
This reverts commit 15159a8.
Sure
This reverts commit 15159a8.


This PR adds vision support for llama-server (ref: ggml-org/llama.cpp#12898). Both llama-server and mtmd are now up to date with mainline PR #16275 (9/26/2025).
The webui has been updated to support attaching pictures and files, along with other minor UI changes. Tested with both the current webui and the new llama.cpp webui (launched via --webui llamacpp), using Qwen2.5-VL-7B-Instruct-Q8_0.gguf and mmproj-F16.gguf; both work fine.
Note that when --mmproj is used for a vision model, context shift is disabled.
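For illustration, launching the server with vision support might look like the following (a sketch: -m and --port are standard llama-server flags, and the model filenames are simply the ones tested above):

    ./llama-server \
        -m Qwen2.5-VL-7B-Instruct-Q8_0.gguf \
        --mmproj mmproj-F16.gguf \
        --port 8080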
Other changes: