This project provides an Open WebUI Pipelines FILTER that enables image support for text-only LLMs.
When a user attaches an image in Open WebUI:
- The filter detects the image in the OpenAI-style request payload.
- The image is sent to a vision-capable LLM (any OpenAI-compatible backend).
- The vision model returns a rich textual description.
- The image is removed from the request.
- The description is injected as plain text into the user message.
- Open WebUI continues by calling the text-only LLM backend.
This prevents crashes or errors (for example `mmproj`-related errors) on text-only backends while still allowing image-based interaction.
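The steps above can be sketched as a minimal inlet transformation. This is a hedged illustration, not the actual pipeline code: `caption_image` is a hypothetical stand-in for the call to the vision backend.

```python
# Sketch of the filter's inlet step: replace image parts with a caption.
# caption_image() is a placeholder for the real call to the vision model.

def caption_image(image_url: str) -> str:
    # Placeholder: the real filter sends the image to the vision backend
    # and returns the generated description.
    return f"[Image description for {image_url[:30]}]"

def inlet(body: dict) -> dict:
    for message in body.get("messages", []):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # plain text message, nothing to do
        texts, captions = [], []
        for part in content:
            if part.get("type") == "image_url":
                captions.append(caption_image(part["image_url"]["url"]))
            elif part.get("type") == "text":
                texts.append(part["text"])
        # Collapse the multimodal content into a single text string,
        # so the text-only backend never sees an image part.
        message["content"] = "\n".join(texts + captions)
    return body
```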
- Open WebUI Pipelines: https://docs.openwebui.com/features/pipelines/
You need:
- Open WebUI running (any supported deployment method).
- Docker on the host system.
- A text model backend that is OpenAI API compatible.
  - Examples: llama.cpp OpenAI server, vLLM OpenAI server, TGI, etc.
  - The backend must be text-only.
- A vision model backend that is OpenAI API compatible.
  - Must support `image_url` in OpenAI-style `messages`.
  - Example: Qwen3-VL via vLLM.
```
User
  ↓
Open WebUI
  ↓
Pipelines FILTER (image → caption)
  ↓
Text-only LLM (OpenAI-compatible)
```
The vision model is never selected directly by the user.
```bash
mkdir -p /home/<user>/pipelines
sudo docker rm -f pipelines 2>/dev/null || true
sudo docker run -d --name pipelines \
  --restart always \
  -p 9099:9099 \
  --add-host=host.docker.internal:host-gateway \
  -e PIPELINES_DIR="/pipelines" \
  -v /home/<user>/pipelines:/pipelines \
  ghcr.io/open-webui/pipelines:main
```

Check that the server is running:

```bash
sudo docker logs -f pipelines
```

You should see output similar to:

```
Uvicorn running on http://0.0.0.0:9099
```
Open WebUI treats the Pipelines server as an OpenAI-compatible provider. In Open WebUI:
- Go to Admin Panel → Settings → Connections
- Add a new OpenAI-compatible connection
| Field | Value |
|-------|-------|
| Base URL | `http://host.docker.internal:9099` |
| API Key | `0p3n-w3bu!` |

- Save the connection

`host.docker.internal` works because the container was started with `--add-host=host.docker.internal:host-gateway`.
Copy the pipeline script from this repository into your pipelines directory:
```bash
cp 10_qwen3vl_caption_filter.py /home/<user>/pipelines/10_qwen3vl_caption_filter.py
```

Restart the Pipelines server so the script is loaded:

```bash
sudo docker restart pipelines
sudo docker logs -f pipelines
```

Verify that the pipeline appears as:

```
10_qwen3vl_caption_filter (filter)
```
If it appears as `(pipe)` instead of `(filter)`, the script is missing the filter type declaration.
In Open WebUI:
- Go to Admin Panel → Settings → Pipelines
- Find `10_qwen3vl_caption_filter (filter)`
- Open Valves
You can apply the filter to one or multiple Open WebUI model IDs. Apply to all models:
["*"]Apply to specific models:
["TextModelA", "TextModelB"]Use the exact model IDs as shown in Open WebUI.
Example values:
- `qwen_base_url`: `http://192.xxx.xxx.xxx:8282`
- `qwen_model`: `vLLMQwen3VL30B`
If your vision backend requires authentication, set `qwen_api_key`.
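For reference, the caption request the filter sends to the vision backend is a standard OpenAI-style chat completion with an `image_url` content part. The sketch below only builds the payload (no network call); the model name mirrors the example valve values above, and the prompt text is illustrative.

```python
import json

# Build an OpenAI-style caption request payload.
# The model name mirrors the example valve value; the prompt is illustrative.
def build_caption_request(image_url: str, prompt: str,
                          model: str = "vLLMQwen3VL30B") -> dict:
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    }

payload = build_caption_request(
    "data:image/png;base64,AAAA",
    "Describe this image in detail.",
)
print(json.dumps(payload, indent=2))
```

This payload is what gets POSTed to the vision backend's `/v1/chat/completions` endpoint at `qwen_base_url`.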
For all models this filter applies to:
- Disable Vision / Image input in the model configuration.
- The model must be text-only. The filter removes images before the request reaches the backend.
The pipeline includes a customizable example prompt:
```
You convert the given image into a rich text in english language.
Return ONLY the final prompt as a single line, no quotes, no extra text.
Include: subject, environment, style, lighting, camera/lens, composition,
key details, ethnicity of people, position and angle of the object in the picture,
detailed clothes description, face description of people, look direction of people,
posture of people, age of people.
All these key description instructions need to be applied on each recognized object,
person, scenery etc. be very detailed and structured in the description.
Avoid meta-commentary.
```
You can freely adapt this prompt for OCR, product descriptions, art analysis, or scene understanding.
- Open a chat using a normal text model.
- Attach an image.
- Send a message.

Expected behavior:
- The filter intercepts the request.
- The image is sent to the vision model.
- The image is removed from the request.
- The generated description is injected as text.
- The text-only backend receives only text input.
- Ensure it is shown as `(filter)` in the Pipelines list.
- Ensure `type = "filter"` exists in the script.
- Ensure the model ID is listed in the `pipelines` valve.
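A minimal skeleton showing the attributes the checks above refer to. This is a hedged sketch following the general Open WebUI Pipelines filter convention, not the actual script; the valve defaults mirror the values described earlier.

```python
from typing import List, Optional
from pydantic import BaseModel

class Pipeline:
    class Valves(BaseModel):
        # Model IDs this filter applies to; ["*"] means all models.
        pipelines: List[str] = ["*"]
        priority: int = 0

    def __init__(self):
        # Without type = "filter", the script is listed as a (pipe).
        self.type = "filter"
        self.name = "10_qwen3vl_caption_filter"
        self.valves = self.Valves()

    async def inlet(self, body: dict, user: Optional[dict] = None) -> dict:
        # Image-to-caption logic goes here; body is the OpenAI-style request.
        return body
```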
Check Pipelines logs:
```bash
sudo docker logs -f pipelines
```

Common issues include:
- Wrong vision backend URL
- Wrong model name
- Missing or invalid API key
The filter processes only:
- `data:image/...` URLs
- URLs ending in common image extensions

Non-image files (PDF, TXT, WAV, etc.) are ignored by design.
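The detection rule can be expressed as a small predicate. This is a sketch; the extension list is illustrative, not necessarily the filter's exact list.

```python
# Illustrative extension list; the actual filter may recognize a different set.
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".webp", ".bmp")

def is_image_url(url: str) -> bool:
    """Return True for data:image/... URLs or URLs with a common image extension."""
    if url.startswith("data:image/"):
        return True
    # Strip any query string before checking the extension, case-insensitively.
    path = url.split("?", 1)[0].lower()
    return path.endswith(IMAGE_EXTENSIONS)
```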