feat: Add open-weight VLM support like Ollama, vLLM #120
statxc wants to merge 5 commits into llmsresearch:main
Conversation
@dippatel1994 Would you please review this PR? I'd appreciate any feedback. Thanks
Hi @statxc, this is a solid, practical addition for local / open-weight backends. Suggestions (non-blocking):
Tests for
@dippatel1994 Thanks for the great suggestions. I've updated everything accordingly. It's clearer and more solid now. I'd appreciate it if you could review again.
@dippatel1994 Any update for me, please?
dippatel1994 left a comment
CI passes, solid architecture. The capability-flag approach is the right design. A couple of things to address:
- `extract_json` escape logic applies outside strings — in `paperbanana/core/utils.py`, the backslash handler fires regardless of whether we're inside a JSON string. A `\` in surrounding LaTeX text will incorrectly set `escape_next = True` and skip the next char, miscounting braces. Move the backslash check inside the `in_string` branch.
- `extract_json` only tries the first `{` occurrence — if the text has a malformed `{...}` before the actual JSON, the parser finds the first `{`, fails to parse, then breaks and never tries the second `{`. It should continue scanning for the next `{` instead of breaking.
- Missing `cost_tracker` in `OllamaVLM` — usage data is available in the response but never recorded. Every other provider calls `self.cost_tracker.record_vlm_call(...)`.
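A sketch of how the first two fixes could look, assuming the bracket-matching structure described in the review. The function body here is illustrative, not the actual `paperbanana/core/utils.py` code:

```python
import json

def extract_json(text):
    """Best-effort extraction of the first parseable JSON object from free text.

    Illustrative sketch of the two fixes: escape handling lives inside the
    in_string branch, and every '{' is tried as a candidate start instead of
    giving up after the first failed parse.
    """
    # 1. Try the whole text first.
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass

    # 2. Scan every '{' as a candidate start and balance braces.
    start = text.find("{")
    while start != -1:
        depth = 0
        in_string = False
        escape_next = False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                # Backslash check lives inside the string branch, so a '\'
                # in surrounding LaTeX text can never skip a brace.
                if escape_next:
                    escape_next = False
                elif ch == "\\":
                    escape_next = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    try:
                        return json.loads(text[start:i + 1])
                    except json.JSONDecodeError:
                        break  # malformed candidate; try the next '{'
        # Keep scanning rather than stopping at the first candidate.
        start = text.find("{", start + 1)
    return None
```

With this shape, a malformed `{...}` before the real payload no longer aborts the scan, and braces inside JSON strings are counted correctly.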
Non-blocking: `openai_local` shares `OPENAI_BASE_URL` with the `openai` provider — a user switching between local/hosted will hit the wrong endpoint. Consider a dedicated `OPENAI_LOCAL_BASE_URL`, or at least document the footgun in `.env.example`.
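A hypothetical `.env.example` fragment illustrating the suggestion — the values are examples only (port 8000 is vLLM's default OpenAI-compatible server port):

```shell
# Hosted OpenAI
OPENAI_BASE_URL=https://api.openai.com/v1

# Local vLLM / llama.cpp server (OpenAI-compatible).
# Kept as a separate variable so switching between local and hosted
# providers never sends requests to the wrong endpoint.
OPENAI_LOCAL_BASE_URL=http://localhost:8000/v1
```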
@dippatel1994 Thanks for your feedback. I've updated everything; it's more solid and working well now.
dippatel1994 left a comment
3 of 4 points fixed — escape logic, multi-candidate scanning, and a dedicated `OPENAI_LOCAL_BASE_URL`. Nice work.
Still missing: `cost_tracker` integration in `OllamaVLM`. The response has usage data available via `data.get("usage")`, but it's only logged at debug level, never recorded via `self.cost_tracker.record_vlm_call(...)`. Once PR #111 merges, `OllamaVLM` will be the only provider without cost tracking. Please add it.
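A minimal sketch of the missing wiring. Only the `record_vlm_call` method name comes from the review; the parameter names, the stub tracker, and the response shape are assumptions for illustration:

```python
class StubCostTracker:
    """Stand-in for the project's cost tracker; records calls in a list."""

    def __init__(self):
        self.calls = []

    def record_vlm_call(self, provider, model, prompt_tokens, completion_tokens):
        self.calls.append((provider, model, prompt_tokens, completion_tokens))


def record_usage(tracker, model, data):
    """What OllamaVLM could do with the response instead of only debug-logging.

    `data` is the parsed response dict; OpenAI-compatible endpoints report
    token counts under a "usage" key.
    """
    usage = data.get("usage") or {}
    tracker.record_vlm_call(
        provider="ollama",
        model=model,
        prompt_tokens=usage.get("prompt_tokens", 0),
        completion_tokens=usage.get("completion_tokens", 0),
    )


tracker = StubCostTracker()
record_usage(tracker, "qwen2.5-vl",
             {"usage": {"prompt_tokens": 512, "completion_tokens": 64}})
```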
@dippatel1994
dippatel1994 left a comment
The `cost_tracker` deferral makes sense since the interface isn't on main yet. All other points are addressed. CI green. LGTM.
Please open a follow-up PR for `OllamaVLM` cost tracking after #111 merges.
Sure, thanks. Np
Summary
- `ollama` and `openai_local` VLM providers for running open-weight models locally without API keys
- `supports_json_mode` capability on `VLMProvider` so agents skip `response_format: json_object` for models that don't support it
- `extract_json()` utility for robust JSON parsing from free-form VLM output (markdown fences, embedded JSON, etc.)

Closes: #114
Motivation
All existing VLM providers require hosted APIs with API keys. Users running open-weight models (Qwen2.5-VL, LLaVA, etc.) via Ollama or vLLM had no supported path. The OpenAI provider technically worked via `OPENAI_BASE_URL`, but JSON mode silently broke most open-weight models.

What changed
New providers:
- `ollama` — dedicated provider for Ollama's OpenAI-compatible endpoint; no API key required, `json_mode=False` by default, `max_tokens` passthrough, `close()` for clean client shutdown
- `openai_local` — reuses the OpenAI SDK pointed at a local vLLM/llama.cpp server; skips API key validation, `json_mode=False` by default, distinct `openai_local` provider name in logs

Capability system:
- `VLMProvider.supports_json_mode` property (default `True`, overridden to `False` for local providers)
- agents check the flag and only send `response_format="json"` when the provider supports it

Robust JSON parsing:
- `extract_json()` in `core/utils.py` — tries direct parse, then markdown fences, then bracket matching
- replaces bare `json.loads()` in Retriever, Critic, and Judge so fenced/wrapped JSON from open-weight models parses correctly

Config:
- new env vars: `OLLAMA_BASE_URL`, `OLLAMA_MODEL`, `OLLAMA_JSON_MODE`, `OPENAI_LOCAL_JSON_MODE`
- `.env.example` updated with Ollama and vLLM configuration examples

Test plan
- unit tests covering provider behavior (`close()` and `max_tokens`), `extract_json` edge cases, capability flags, registry creation, and agent integration
- manual testing against a local Ollama instance (`ollama pull qwen2.5-vl`)
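The capability-flag pattern from the summary can be sketched as follows. This is a minimal illustration: only the `supports_json_mode` name and the `response_format` payload come from this PR; the class and function names around them are hypothetical.

```python
class VLMProvider:
    """Base provider: hosted APIs honor response_format, so default True."""

    @property
    def supports_json_mode(self):
        return True


class OllamaVLM(VLMProvider):
    """Local provider: most open-weight models break on strict JSON mode."""

    @property
    def supports_json_mode(self):
        return False


def build_request_kwargs(provider, prompt):
    """An agent checks the capability flag before requesting strict JSON."""
    kwargs = {"messages": [{"role": "user", "content": prompt}]}
    if provider.supports_json_mode:
        kwargs["response_format"] = {"type": "json_object"}
    return kwargs
```

With this shape, agents never send `response_format` to a backend that cannot honor it, and instead fall back to lenient parsing of the free-form output.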