
feat: Add open-weight VLM support like Ollama, vLLM#120

Open
statxc wants to merge 5 commits into llmsresearch:main from statxc:feat/open-weight-vlm-backends

Conversation

@statxc
Contributor

statxc commented Mar 25, 2026

Summary

  • Add ollama and openai_local VLM providers for running open-weight models locally without API keys
  • Add supports_json_mode capability on VLMProvider so agents skip response_format: json_object for models that don't support it
  • Add extract_json() utility for robust JSON parsing from free-form VLM output (markdown fences, embedded JSON, etc.)
  • Update Retriever, Critic, and Judge to check provider capabilities and use robust parsing

Closes: #114

Motivation

All existing VLM providers require hosted APIs with API keys. Users running open-weight models (Qwen2.5-VL, LLaVA, etc.) via Ollama or vLLM had no supported path. The OpenAI provider technically worked via OPENAI_BASE_URL, but JSON mode broke most open-weight models silently.

What changed

New providers:

  • ollama - dedicated provider for Ollama's OpenAI-compatible endpoint, no API key required, json_mode=False by default, max_tokens passthrough, close() for clean client shutdown
  • openai_local - reuses the OpenAI SDK pointed at a local vLLM/llama.cpp server, skips API key validation, json_mode=False by default, distinct openai_local provider name in logs
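
A rough sketch of the provider shape described above; class, attribute, and method names here are illustrative, not the PR's exact code:

```python
class OllamaVLM:
    """Sketch of a local provider: no API key, JSON mode off by default."""

    provider_name = "ollama"       # distinct name so logs are unambiguous
    supports_json_mode = False     # open-weight models often ignore response_format

    def __init__(self, base_url="http://localhost:11434/v1",
                 model="qwen2.5-vl", max_tokens=1024):
        self.base_url = base_url
        self.model = model
        self.max_tokens = max_tokens
        self._client = None        # HTTP client created lazily on first request

    def build_payload(self, messages):
        # max_tokens is passed through so long answers aren't truncated
        # at the server's default limit
        return {"model": self.model,
                "messages": messages,
                "max_tokens": self.max_tokens}

    def close(self):
        # release the cached HTTP client so tests shut down cleanly
        if self._client is not None:
            self._client.close()
            self._client = None
```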

Capability system:

  • VLMProvider.supports_json_mode property (default True, overridden to False for local providers)
  • Retriever, Critic, and Judge check this before sending response_format="json"
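
The consumer side of the capability flag can be sketched like this; the helper name and the OpenAI-style `response_format` shape are assumptions, and the real call sites are the three agents:

```python
def request_kwargs(provider):
    """Only ask for JSON mode when the provider says it can handle it."""
    kwargs = {}
    if getattr(provider, "supports_json_mode", True):  # capability defaults to True
        kwargs["response_format"] = {"type": "json_object"}
    return kwargs

# Minimal stubs standing in for hosted vs. local providers
class HostedStub:
    supports_json_mode = True

class LocalStub:
    supports_json_mode = False
```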

Robust JSON parsing:

  • extract_json() in core/utils.py - tries direct parse, then markdown fences, then bracket matching
  • Replaces raw json.loads() in Retriever, Critic, and Judge so fenced/wrapped JSON from open-weight models parses correctly
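
The first two fallback layers can be sketched as follows; this is illustrative, not the PR's exact code, and the bracket-matching third layer is omitted here for brevity:

```python
import json
import re

def extract_json(text):
    """Best-effort JSON extraction: direct parse, then any markdown fence."""
    # Layer 1: the whole string is already valid JSON
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Layer 2: a ```json ... ``` (or bare ```) fenced block somewhere in the text
    for match in re.finditer(r"```(?:json)?\s*(.*?)```", text, re.DOTALL):
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            continue
    return None
```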

Config:

  • New settings: OLLAMA_BASE_URL, OLLAMA_MODEL, OLLAMA_JSON_MODE, OPENAI_LOCAL_JSON_MODE
  • Updated .env.example with Ollama and vLLM configuration examples
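
For reference, a minimal local setup might look like this in .env. The base URL is Ollama's standard OpenAI-compatible endpoint; the model tag and defaults are illustrative, so treat .env.example as authoritative:

```shell
# Ollama serving a local vision model (no API key needed)
OLLAMA_BASE_URL=http://localhost:11434/v1
OLLAMA_MODEL=qwen2.5-vl
OLLAMA_JSON_MODE=false

# For a local vLLM / llama.cpp server via the openai_local provider
OPENAI_LOCAL_JSON_MODE=false
```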

Test plan

  • 35 new tests covering Ollama provider (including close() and max_tokens), extract_json edge cases, capability flags, registry creation, and agent integration
  • Full existing test suite passes (331 total, 0 regressions)
  • Lint and format checks pass
  • Manual verification needed: run with actual Ollama/vLLM serving a vision model (e.g. ollama pull qwen2.5-vl)

@statxc
Contributor Author

statxc commented Mar 25, 2026

@dippatel1994 Could you please review this PR? I'd appreciate any feedback. Thanks!

@dippatel1994
Member

Hi @statxc, this is a solid, practical addition for local / open-weight backends. supports_json_mode, the conditional response_format, extract_json, the dedicated OllamaVLM, and openai_local wrapping OpenAIVLM with json_mode defaulting off all match the failure mode we see with vLLM/Ollama. The Retriever/Critic/Judge updates are the right call sites.

Suggestions (non-blocking):

  • Pass max_tokens through on the Ollama HTTP payload so long outputs aren’t truncated at server defaults.
  • Consider closing or scoping the cached httpx.AsyncClient in OllamaVLM for tests and clean shutdown.
  • Optional: make provider name/logs distinguish openai_local from hosted openai for easier debugging.

Tests for extract_json and registry wiring look good. Thanks — happy to see this land for #114.

@statxc
Contributor Author

statxc commented Mar 25, 2026

@dippatel1994 Thanks for the great suggestions. I've applied all of them, and the code is cleaner and more robust now. Could you review it again?

@statxc
Contributor Author

statxc commented Apr 1, 2026

@dippatel1994 Any update on this, please?

Member

@dippatel1994 left a comment

CI passes, solid architecture. The capability-flag approach is the right design. A couple things to address:

  1. extract_json escape logic applies outside strings — In paperbanana/core/utils.py, the backslash handler fires regardless of whether we're inside a JSON string. A \ in surrounding LaTeX text will incorrectly set escape_next = True and skip the next char, miscounting braces. Move the backslash check inside the in_string branch.

  2. extract_json only tries the first { occurrence — If text has a malformed {...} before the actual JSON, the parser finds the first {, fails to parse, then breaks. Never tries the second {. Should continue scanning for the next { instead of breaking.

  3. Missing cost_tracker in OllamaVLM — Usage data is available in the response but never recorded. Every other provider calls self.cost_tracker.record_vlm_call(...).
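
Concretely, points 1 and 2 together suggest a scanning loop shaped roughly like this; the function name is illustrative, not the PR's code:

```python
import json

def scan_json_objects(text):
    """Brace-matching fallback: try every '{' as a candidate start (point 2),
    and honor backslash escapes only inside JSON strings (point 1), so stray
    LaTeX backslashes in surrounding text can't desync the depth count."""
    for start, ch in enumerate(text):
        if ch != "{":
            continue
        depth, in_string, escaped = 0, False, False
        for i in range(start, len(text)):
            c = text[i]
            if in_string:
                if escaped:
                    escaped = False   # this char was escaped; just consume it
                elif c == "\\":
                    escaped = True    # escape handling applies only inside strings
                elif c == '"':
                    in_string = False
            elif c == '"':
                in_string = True
            elif c == "{":
                depth += 1
            elif c == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break         # try the next '{' instead of giving up
    return None
```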

Non-blocking: openai_local shares OPENAI_BASE_URL with the openai provider — a user switching between local/hosted will hit the wrong endpoint. Consider a dedicated OPENAI_LOCAL_BASE_URL or at least document the footgun in .env.example.

@statxc statxc requested a review from dippatel1994 April 2, 2026 20:18
@statxc
Contributor Author

statxc commented Apr 2, 2026

@dippatel1994 Thanks for your feedback. I've addressed all the points. Please take another look.

Member

@dippatel1994 left a comment

3 of 4 points fixed - escape logic, multi-candidate scanning, and dedicated OPENAI_LOCAL_BASE_URL. Nice work.

Still missing: cost_tracker integration in OllamaVLM. The response has usage data available via data.get("usage") but it's only logged at debug level, never recorded via self.cost_tracker.record_vlm_call(...). Once PR #111 merges, OllamaVLM will be the only provider without cost tracking. Please add it.

@statxc
Contributor Author

statxc commented Apr 2, 2026

@dippatel1994 Good point, but there's no cost_tracker in the current codebase yet; the interface lands with PR #111. I'd suggest merging this PR first, and I'll add the cost_tracker integration to OllamaVLM in a follow-up PR once #111 merges, to avoid conflicts. Sounds good?

Member

@dippatel1994 left a comment

The cost_tracker deferral makes sense since the interface isn't on main yet. All other points are addressed. CI green. LGTM.

Please open a follow-up PR for OllamaVLM cost_tracker after #111 merges.

@statxc
Contributor Author

statxc commented Apr 2, 2026

> The cost_tracker deferral makes sense since the interface isn't on main yet. All other points are addressed. CI green. LGTM.
>
> Please open a follow-up PR for OllamaVLM cost_tracker after #111 merges.

Sure, thanks. Will do.


Development

Successfully merging this pull request may close these issues.

[Feature]: Support open-weight VLM backends (vLLM, Ollama, etc.)
