[Feature]: Support open-weight VLM backends (vLLM, Ollama, etc.) #114
Problem or motivation
The pipeline supports five VLM providers (Gemini, OpenAI, OpenRouter, Bedrock, Anthropic) and four image generation providers, but all of them require a hosted API with API keys and usage costs. There's no documented or tested path for running open-weight models like Qwen-VL, LLaVA, or Llama locally.
The OpenAI provider accepts a custom `base_url` via `OPENAI_BASE_URL`, so in theory you could point it at a local vLLM or Ollama server. But this hasn't been tested with open-weight models, and there are known compatibility gaps (a minimal request sketch follows the list):
- **JSON mode:** The Retriever and Critic agents pass `response_format="json"`, which the OpenAI provider sends as `{"type": "json_object"}`. Many open-weight models don't support this.
- **Vision:** The Critic sends the generated image to the VLM for evaluation, and the Planner sends reference example images for in-context learning. Both agents require a vision-capable model, so text-only models won't work.
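For context, the untested path looks roughly like this outside the pipeline, assuming the OpenAI provider speaks the standard Chat Completions API. The model name, port, file name, and prompt below are placeholders for illustration, not values from this repo:

```python
# Start a vision-capable OpenAI-compatible server first, e.g.:
#   vllm serve Qwen/Qwen2.5-VL-7B-Instruct --port 8000
# (Ollama exposes a similar endpoint at http://localhost:11434/v1.)
import base64

from openai import OpenAI

# Equivalent to pointing OPENAI_BASE_URL at the local server;
# local backends typically accept any non-empty API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

with open("generated.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

# A Critic-style request: text plus the generated image in one message.
response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-7B-Instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Critique this image. Respond in JSON."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    # The compatibility gap: many open-weight backends reject this.
    response_format={"type": "json_object"},
)
print(response.choices[0].message.content)
```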
Proposed solution
- **Test the existing OpenAI provider with vLLM and Ollama** - Before writing new providers, figure out what already works. Point `OPENAI_BASE_URL` at a local vLLM or Ollama server running a vision-capable model (e.g. Qwen2.5-VL, LLaVA) and run the pipeline end-to-end. Document what breaks.
- **Handle JSON mode gracefully** - Many open-weight models don't support `response_format: json_object`. Two options: let providers declare whether they support JSON mode so the flag is skipped when they don't, or lean on the fallback parsing that the Retriever and Critic already have for malformed JSON responses. Probably both (a sketch of both options follows this list).
- **Add a dedicated Ollama provider if needed** - If the OpenAI-compatible `base_url` approach doesn't cover Ollama well enough, add an `OllamaVLM` provider (also sketched below).
- **Document tested combinations** - List which open-weight models have been tested, what works, and what doesn't. For example: "Qwen2.5-VL via vLLM: planning and critic vision work, JSON mode needs fallback." Users need to know what to expect before they spend time setting up a local server.
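To make the JSON-mode bullet concrete, here is one possible shape for both options together. The names (`VLMProvider`, `build_request_kwargs`, `parse_json_response`) are hypothetical, not taken from the codebase:

```python
import json
import re


class VLMProvider:
    # Option 1: a capability flag, so callers skip response_format
    # for backends that reject {"type": "json_object"}.
    supports_json_mode: bool = True


class LocalOpenAICompatible(VLMProvider):
    supports_json_mode = False


def build_request_kwargs(provider: VLMProvider, want_json: bool) -> dict:
    kwargs = {}
    if want_json and provider.supports_json_mode:
        kwargs["response_format"] = {"type": "json_object"}
    return kwargs


# Option 2: fallback parsing for models that wrap JSON in prose or
# ```json fences instead of returning a bare object.
def parse_json_response(text: str) -> dict:
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    match = re.search(r"\{.*\}", text, re.DOTALL)  # first { to last }
    if match is None:
        raise ValueError(f"no JSON object in response: {text[:200]!r}")
    return json.loads(match.group(0))
```

Combining both keeps well-behaved providers on native JSON mode while local backends degrade to prompt-and-parse instead of erroring out.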
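And if the `base_url` route falls short, a native provider could be a small wrapper over Ollama's `/api/chat` endpoint, which accepts base64-encoded images per message and a `format: "json"` option. The class name matches the bullet above, but the interface shown is invented for illustration and would need to match the repo's actual provider protocol:

```python
import base64
from pathlib import Path

import requests


class OllamaVLM:
    """Sketch of a provider hitting Ollama's native /api/chat endpoint."""

    def __init__(self, model: str = "qwen2.5vl",
                 host: str = "http://localhost:11434"):
        self.model = model
        self.host = host

    def generate(self, prompt: str, image_paths: list[str] | None = None,
                 json_mode: bool = False) -> str:
        message = {"role": "user", "content": prompt}
        if image_paths:
            # Ollama's native API takes base64-encoded images per message.
            message["images"] = [
                base64.b64encode(Path(p).read_bytes()).decode()
                for p in image_paths
            ]
        payload = {"model": self.model, "messages": [message], "stream": False}
        if json_mode:
            # format="json" asks Ollama to constrain output to valid JSON.
            payload["format"] = "json"
        resp = requests.post(f"{self.host}/api/chat", json=payload, timeout=300)
        resp.raise_for_status()
        return resp.json()["message"]["content"]
```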
Area
New provider support (OpenAI, Anthropic, local models, etc.)
Alternatives considered
- **OpenRouter only** - OpenRouter already routes to open-weight models and the provider exists, but it's still a hosted API with costs. Doesn't address the "run locally for free" use case.
- **Dedicated providers for every backend (vLLM, Ollama, llama.cpp, etc.)** - Most of these expose OpenAI-compatible APIs, so separate providers are likely overkill. Better to test the `base_url` approach first and only add dedicated providers where compatibility actually breaks.
Willingness to contribute
- I'd be willing to submit a PR for this feature