Ollama: Image processing is not being performed even though the vision language (VL) model exists in the system #5633
Unanswered · sghunterfan asked this question in Troubleshooting
As mentioned in some other issues, when uploading an image along with a text prompt, the system should route the request directly to the first available Vision-Language (VL) model instead of going through Retrieval-Augmented Generation (RAG). However, this expected behavior is not being observed.
Instead, it appears that the VL model is neither called nor loaded into memory. As a result, all available models fail to produce any meaningful output.
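To isolate whether the image is being dropped by the routing layer or by Ollama itself, one option is to bypass the front end and send an image straight to the VL model through Ollama's native chat API. A minimal sketch, assuming a local Ollama on the default port (the model tag and image path are placeholders to adapt to whatever `ollama list` shows):

```python
# Minimal sketch: send an image straight to a VL model via Ollama's
# native chat API, bypassing any front-end routing/RAG.
# The model tag and image path below are assumptions.
import base64
import requests

OLLAMA_CHAT = "http://localhost:11434/api/chat"

with open("test.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

payload = {
    "model": "qwen2.5-vl-72b",  # adjust to the tag shown by `ollama list`
    "messages": [
        {
            "role": "user",
            "content": "Describe this image.",
            # Ollama's native chat API takes raw base64 strings here
            "images": [image_b64],
        }
    ],
    "stream": False,
}

resp = requests.post(OLLAMA_CHAT, json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["message"]["content"])
```

If this direct call returns a sensible description, Ollama and the model are working and the problem is in how the front end routes image requests.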
Output from qwen2.5-72b (non-VL):
When calling a VL model directly (e.g., qwen2.5-vl-72b), no output is produced, even though the model is loaded into memory and the requests reach Ollama:
```
[GIN] 2025/02/03 - 19:24:04 | 200 | 132.070797ms | ::1 | POST "/v1/chat/completions"
```
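The log above shows the request going through Ollama's OpenAI-compatible `/v1/chat/completions` route, so the same direct check can be run against that endpoint with the `openai` client. Again a sketch, with the base URL, model tag, and image path as assumptions:

```python
# Same check through the OpenAI-compatible endpoint seen in the GIN log.
# Base URL, model tag, and image path are assumptions; the api_key value
# is ignored by Ollama but required by the client.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

with open("test.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

resp = client.chat.completions.create(
    model="qwen2.5-vl-72b",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {
                "type": "image_url",
                "image_url": {"url": f"data:image/png;base64,{image_b64}"},
            },
        ],
    }],
)
print(resp.choices[0].message.content)
```

As a side observation, a response that completes in ~130 ms, as in the log, seems too fast to have actually run a 72B vision model, which may point to the image (or the whole prompt) never reaching the model.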
All configs are attached in this discussion: #5631
- Is there anything that I am missing?
- Is there an option to set a default VL model for all chats?
- Is RAG supposed to be called or not in this scenario?