
[Feature]: Support open-weight VLM backends (vLLM, Ollama, etc.) #114

@statxc

Description


Problem or motivation


The pipeline supports five VLM providers (Gemini, OpenAI, OpenRouter, Bedrock, Anthropic) and four image generation providers, but all of them require a hosted API with API keys and usage costs. There's no documented or tested path for running open-weight models like Qwen-VL, LLaVA, or Llama locally.

The OpenAI provider accepts a custom base_url via OPENAI_BASE_URL, so in theory you could point it at a local vLLM or Ollama server. But this hasn't been tested with open-weight models, and there are known compatibility gaps:

  • JSON mode: The Retriever and Critic agents pass response_format="json", which the OpenAI provider sends as {"type": "json_object"}. Many open-weight models don't support this.
  • Vision: The Critic sends the generated image to the VLM for evaluation, and the Planner sends reference example images for in-context learning. Both agents require a vision-capable model, so text-only models won't work.
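To make these gaps concrete, here is roughly the shape of request the agents depend on (a sketch only — the function name, prompt text, and model name are illustrative, not the pipeline's actual code). A local OpenAI-compatible server has to accept both the `json_object` response format and `image_url` content parts in one request:

```python
import base64

def build_critic_request(image_bytes: bytes, model: str) -> dict:
    """Illustrative: the kind of chat request the Critic sends — vision
    input plus OpenAI-style JSON mode. A local server must accept both."""
    b64 = base64.b64encode(image_bytes).decode()
    return {
        "model": model,
        # Many open-weight models reject or ignore this field.
        "response_format": {"type": "json_object"},
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "Critique this image. Reply in JSON."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }

req = build_critic_request(b"\x89PNG...", "Qwen/Qwen2.5-VL-7B-Instruct")
```

A server that 400s on either `response_format` or the `image_url` content part fails one of the two gaps above.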

Proposed solution

  1. Test the existing OpenAI provider with vLLM and Ollama - Before writing new providers, figure out what already works. Point OPENAI_BASE_URL at a local vLLM or Ollama server running a vision-capable model (e.g. Qwen2.5-VL, LLaVA) and run the pipeline end-to-end. Document what breaks.
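A minimal setup sketch for this test (model names and ports are examples; adjust to your hardware):

```shell
# Option A: serve a vision-capable model with vLLM (OpenAI-compatible API on :8000)
vllm serve Qwen/Qwen2.5-VL-7B-Instruct --port 8000

# Option B: Ollama (OpenAI-compatible API under /v1 on :11434)
ollama serve          # if not already running as a service
ollama pull llava     # in another terminal

# Point the existing OpenAI provider at the local server
export OPENAI_BASE_URL=http://localhost:8000/v1   # or http://localhost:11434/v1
export OPENAI_API_KEY=dummy                       # local servers typically ignore the key
```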

  2. Handle JSON mode gracefully - Many open-weight models don't support response_format: json_object. Two options: let providers declare whether they support JSON mode so the flag is skipped when they don't, or lean on the fallback parsing that the Retriever and Critic already have for malformed JSON responses. Probably both.
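As a sketch of the fallback route (illustrative only — the Retriever and Critic's existing fallback may differ), a parser that tries strict JSON first, then a fenced code block, then the first brace-delimited span:

```python
import json
import re

def parse_json_response(text: str) -> dict:
    """Fallback parsing for models that ignore JSON mode:
    strict JSON -> ```json fenced block -> first {...} span."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    fenced = re.search(r"```(?:json)?\s*(\{.*?\})\s*```", text, re.DOTALL)
    if fenced:
        return json.loads(fenced.group(1))
    brace = re.search(r"\{.*\}", text, re.DOTALL)
    if brace:
        return json.loads(brace.group(0))
    raise ValueError("no JSON object found in model response")
```

The capability flag would sit one level up: a provider that declares `supports_json_mode = False` never gets `response_format` in the first place, and this parser handles whatever comes back.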

  3. Add a dedicated Ollama provider if needed - If the OpenAI-compatible base_url approach doesn't cover Ollama well enough, add an OllamaVLM provider.
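If it comes to that, a rough sketch of what such a provider might look like against Ollama's native /api/chat endpoint (the class and method names are hypothetical, not existing pipeline code). The native shape differs from OpenAI's: images go as base64 strings in an images field, and JSON mode is format: "json":

```python
import base64
import json
import urllib.request

class OllamaVLM:
    """Hypothetical provider sketch for Ollama's native /api/chat."""

    def __init__(self, model: str = "llava", host: str = "http://localhost:11434"):
        self.model = model
        self.host = host

    def build_payload(self, prompt: str, images: list, json_mode: bool) -> dict:
        payload = {
            "model": self.model,
            "stream": False,
            "messages": [{
                "role": "user",
                "content": prompt,
                # Ollama takes raw base64 strings, not data: URIs.
                "images": [base64.b64encode(i).decode() for i in images],
            }],
        }
        if json_mode:
            payload["format"] = "json"  # Ollama's native JSON mode
        return payload

    def chat(self, prompt: str, images: list, json_mode: bool = False) -> str:
        req = urllib.request.Request(
            f"{self.host}/api/chat",
            data=json.dumps(self.build_payload(prompt, images, json_mode)).encode(),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            return json.loads(resp.read())["message"]["content"]
```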

  4. Document tested combinations - List which open-weight models have been tested, what works, and what doesn't. For example: "Qwen2.5-VL via vLLM: planning and critic vision work, JSON mode needs fallback." Users need to know what to expect before they spend time setting up a local server.

Area

New provider support (OpenAI, Anthropic, local models, etc.)

Alternatives considered

  • OpenRouter only - OpenRouter already routes to open-weight models and the provider exists, but it's still a hosted API with costs. Doesn't address the "run locally for free" use case.
  • Dedicated providers for every backend (vLLM, Ollama, llama.cpp, etc.) - Most of these expose OpenAI-compatible APIs, so separate providers are likely overkill. Better to test the base_url approach first and only add dedicated providers where compatibility actually breaks.

Willingness to contribute

  • I'd be willing to submit a PR for this feature

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request)
