Ollama: Image processing is not being performed even though the vision language (VL) model exists in the system #5633
Unanswered · sghunterfan asked this question in Troubleshooting
As mentioned in some other issues, when uploading an image along with a text prompt, the system should route the request directly to the first available Vision-Language (VL) model instead of going through Retrieval-Augmented Generation (RAG). However, this expected behavior is not being observed.
Instead, it appears that the VL model is neither called nor loaded into memory. As a result, all available models fail to produce any meaningful output.
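To isolate whether the image is being dropped by the routing layer or by Ollama itself, one option is to bypass the front end and send an image straight to the VL model through Ollama's native chat API. A minimal sketch, assuming a local Ollama on the default port (the model tag and image path are placeholders to adapt to whatever `ollama list` shows):

```python
# Minimal sketch: send an image straight to a VL model via Ollama's
# native chat API, bypassing any front-end routing/RAG.
# The model tag and image path below are assumptions.
import base64
import requests

OLLAMA_CHAT = "http://localhost:11434/api/chat"

with open("test.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "qwen2.5-vl-72b",  # adjust to the tag shown by `ollama list`
    "messages": [
        {
            "role": "user",
            "content": "Describe this image.",
            # Ollama's native chat API takes raw base64 strings here
            "images": [image_b64],
        }
    ],
    "stream": False,
}

resp = requests.post(OLLAMA_CHAT, json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

If this direct call returns a sensible description, Ollama and the model are working and the problem is in how the front end routes image requests.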
Output from qwen2.5-72b (non-VL):
When calling a VL model directly (e.g., qwen2.5-vl-72b), no output is produced, even though the model is loaded into memory and the requests reach Ollama:
```
[GIN] 2025/02/03 - 19:24:04 | 200 | 132.070797ms | ::1 | POST "/v1/chat/completions"
```
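The log above shows the request going through Ollama's OpenAI-compatible `/v1/chat/completions` route, so the same direct check can be run against that endpoint with the `openai` client. Again a sketch, with the base URL, model tag, and image path as assumptions:

```python
# Same check through the OpenAI-compatible endpoint seen in the GIN log.
# Base URL, model tag, and image path are assumptions; the api_key value
# is ignored by Ollama but required by the client.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

with open("test.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = client.chat.completions.create(
    model="qwen2.5-vl-72b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_b64}"},
            },
        ],
    }],
)
print(resp.choices[0].message.content)
```

As a side observation, a response that completes in ~130 ms, as in the log, seems too fast to have actually run a 72B vision model, which may point to the image (or the whole prompt) never reaching the model.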
All configs are attached in this discussion: #5631
- Is there anything that I am missing?
- Is there an option to set a default VL model for all chats?
- Is RAG supposed to be called or not in this scenario?