
refactor(llm,vision): canonical codec_llm + codec_vision helpers (A-11 + A-12 tranche 1)#68

Merged
AVADSA25 merged 2 commits into main from fix/pr3e-llm-vision-dedup
May 22, 2026

Conversation

@AVADSA25
Owner

Summary

PR-3E (Option 2). Two new single-source modules replace hand-rolled duplicates on the repo's hottest path (every feature calls an LLM or a vision model). Design-first per AGENTS.md §11 → docs/PR3E-LLM-VISION-DEDUP-DESIGN.md (now flipped to IMPLEMENTED, see §8).

A-11 — vision dedup (fully closed)

  • New codec_vision.py: describe_sync + describe_async, Gemini-flash (gemini-2.0-flash) → local-Qwen-VL fallback, config read live from codec_config (provider/model/Keychain changes take effect without restart).
  • All 3 consumers delegate:
    • codec.py vision_describe → describe_sync (deleted _gemini_vision / _local_vision).
    • codec_voice._analyze_screenshot → await describe_async(..., http=self._http) (reuses the pipeline's httpx client).
    • codec_session.screenshot_ctx → describe_sync; gains the Gemini fallback it previously lacked (documented behavioral superset).
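
For reviewers who want the shape of the fallback without opening codec_vision.py, here is a minimal sketch of the "first provider, then fallback" pattern described above. It is not the merged implementation: `describe_with_fallback`, the `Provider` type, and the idea of passing providers as plain callables are all assumptions made for illustration.

```python
from typing import Callable, List, Optional

# Hypothetical provider shape: (image_b64, prompt) -> description, or None on failure.
Provider = Callable[[str, str], Optional[str]]


def describe_with_fallback(image_b64: str, prompt: str, providers: List[Provider]) -> str:
    """Try each provider in order and return the first non-empty description.

    Never raises: a provider that throws is treated the same as one that
    returns nothing, so the next provider still gets a chance.
    """
    for provider in providers:
        try:
            text = provider(image_b64, prompt)
        except Exception:
            text = None  # a failing provider must not block the fallback
        if text and text.strip():
            return text.strip()
    return ""  # every provider failed or returned nothing
```

In the real module the ordered list would presumably be the Gemini call followed by the local Qwen-VL call, rebuilt from codec_config on each invocation so that config changes apply without a restart.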

A-12 — chat/completions caller (first tranche)

  • The audit's premise that codec_llm_proxy already had call()/stream() was inaccurate — that module is a priority queue (semaphore), not an HTTP caller.
  • New codec_llm.py (config-agnostic, no import cycle): call() + strip_think / extract_content — headers, Bearer auth, chat_template_kwargs.enable_thinking, <think> strip, choices/reasoning parse, retry + 2**n backoff, never raises.
  • Migrated codec.py voice-reply chat + codec_session.qwen_call; removed the now-dead local extract_content in codec_session (canonical copy lives in codec_llm).
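
To make the strip_think / extract_content responsibilities concrete, the following is a rough sketch of what such helpers could look like. The function names match the PR, but the bodies are an illustration only: the `reasoning_content` / `reasoning` field names and the handling of an unterminated `<think>` tag are assumptions, not a copy of codec_llm.py.

```python
import re

_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove <think>...</think> blocks from model output."""
    text = _THINK_BLOCK.sub("", text)
    # Assumption: if generation was cut off mid-thought, drop the dangling tail.
    idx = text.find("<think>")
    if idx != -1:
        text = text[:idx]
    return text.strip()


def extract_content(payload) -> str:
    """Pull the reply text out of a chat/completions response body.

    Returns "" for any malformed payload instead of raising.
    """
    try:
        message = payload["choices"][0]["message"]
    except (KeyError, IndexError, TypeError):
        return ""
    if not isinstance(message, dict):
        return ""
    content = message.get("content")
    if not isinstance(content, str) or not content.strip():
        # Assumed field names for servers that return reasoning separately.
        content = message.get("reasoning_content") or message.get("reasoning") or ""
    return strip_think(content) if isinstance(content, str) else ""
```

Keeping both helpers free of any config import is what lets every caller share them without creating an import cycle.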

Deferred (each its own design + PR)

  • codec_session.qwen_stream SSE → needs a codec_llm.stream() generator.
  • The remaining ~40 chat/completions sites (dashboard, voice generate_response, agents/agent_plan/agent_runner, telegram/imessage bridges, compaction/self_improve/watcher/textassist/dictate).
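
As a starting point for the deferred stream() design, this sketch shows one way the SSE parsing half of such a generator might look. It assumes the server emits OpenAI-style `data: {...}` lines terminated by `data: [DONE]`; the name `iter_sse_deltas` is hypothetical and nothing here exists in codec_llm yet.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from an OpenAI-style SSE line stream.

    Blank lines, comments, and malformed chunks are skipped rather than
    raised, matching the never-raises contract of call().
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separator or SSE comment line
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel
        try:
            chunk = json.loads(data)
            delta = chunk["choices"][0]["delta"].get("content")
        except (ValueError, KeyError, IndexError, TypeError, AttributeError):
            continue  # tolerate a bad chunk and keep streaming
        if delta:
            yield delta
```

Separating line parsing from the HTTP transport would let the same generator serve sync and async callers, and makes it testable without a live server.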

Notes / behavior deltas (documented)

  • codec.py voice path: non-200 and empty responses now collapse to one apology ("Sorry, I didn't get a response.") instead of two distinct messages.
  • codec_llm.call: on 200-but-empty it returns "" without retrying (codec.py parity exact at retries=1; for qwen_call retries=3 this skips pointless identical retries — an improvement).
  • Net -86 LOC in tracked files (+3 new modules/tests).
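
The retry semantics noted above (retry on failure, treat a 200 as final even when empty, never raise) can be sketched as follows. This is an illustration under stated assumptions, not the code in codec_llm.py: the injected `post` callable, the `call_with_retries` name, and the choice of `2 ** attempt` seconds starting from attempt 0 are all hypothetical.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical transport: performs one HTTP POST, returns (status_code, json_body).
PostFn = Callable[[], Tuple[int, Optional[dict]]]


def call_with_retries(post: PostFn, retries: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    """Return the reply text, or "" on any failure. Never raises."""
    for attempt in range(retries):
        try:
            status, payload = post()
        except Exception:
            status, payload = None, None  # network error counts as a failed attempt
        if status == 200:
            # A 200 is final: an empty body is returned as "" without retrying,
            # since an identical request is unlikely to produce a different answer.
            try:
                return (payload["choices"][0]["message"]["content"] or "").strip()
            except (KeyError, IndexError, TypeError, AttributeError):
                return ""
        if attempt < retries - 1:
            sleep(2 ** attempt)  # assumed backoff schedule: 1s, 2s, 4s, ...
    return ""
```

With retries=1 the loop makes a single attempt and never sleeps, which is the configuration where the voice path keeps exact parity with its previous behavior.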

Test plan

  • tests/test_llm_vision_dedup.py — 19 tests: strip_think/extract_content matrix; call success / no-key-omits-auth / retries-then-empty / exception-empty; describe_sync gemini-first / fallback / local-only / both-fail; describe_async gemini + fallback (driven via asyncio.run + fake httpx client — no pytest-asyncio dependency); source-invariant checks (codec.py/voice/session call the canonical helpers, inline impls gone).
  • Full suite: 23 known-baseline failures, zero new (1395 passed, 74 skipped).
  • Ruff: new modules + test clean; codec.py unchanged baseline (E402 from its signal-before-imports pattern); no new F401.
  • No skills/ touched → no manifest regen.
  • Manual (Mac Studio): voice "look at my screen" + a chat screenshot describe correctly via both providers; a voice chat turn replies.
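
The "asyncio.run + fake httpx client" approach mentioned in the test plan can be sketched like this. The fake classes and `describe_async_stub` are hypothetical stand-ins written for this example; the real tests target codec_vision.describe_async, whose signature is not reproduced here.

```python
import asyncio


class FakeResponse:
    """Mimics the two httpx.Response members the code under test reads."""

    def __init__(self, status_code, payload):
        self.status_code = status_code
        self._payload = payload

    def json(self):
        return self._payload


class FakeAsyncClient:
    """Stand-in for httpx.AsyncClient: records URLs, replays canned responses."""

    def __init__(self, responses):
        self._responses = list(responses)
        self.calls = []

    async def post(self, url, **kwargs):
        self.calls.append(url)
        return self._responses.pop(0)


async def describe_async_stub(image_b64, http, urls):
    """Hypothetical function under test: try each endpoint until one answers."""
    for url in urls:
        resp = await http.post(url, json={"image": image_b64})
        if resp.status_code == 200 and resp.json().get("text"):
            return resp.json()["text"]
    return ""


def test_falls_back_to_second_endpoint():
    http = FakeAsyncClient([FakeResponse(500, {}), FakeResponse(200, {"text": "a desktop"})])
    urls = ["https://remote.invalid/v1", "http://local.invalid/v1"]
    # asyncio.run drives the coroutine, so a plain sync test function suffices.
    result = asyncio.run(describe_async_stub("aGk=", http, urls))
    assert result == "a desktop"
    assert http.calls == urls
```

Because the test itself is an ordinary function, plain pytest collects it with no async plugin installed.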

🤖 Generated with Claude Code

Mikarina13 and others added 2 commits May 22, 2026 14:16
Design-first per AGENTS.md §11 before touching the hottest code path (every
feature calls an LLM). Documents the reality (codec_llm_proxy is a queue NOT
a proxy — no call/stream to reuse; 45 chat/completions sites across sync/
async/streaming; 3 divergent vision impls), the high blast radius, and a
phased plan.

Recommendation: split A-11 (vision dedup, contained — new codec_vision.py,
3 consumers) from A-12 (chat/completions, audit-flagged "large" — build
codec_llm.call/stream + migrate 45 sites in small per-subsystem tranches).
PR-3E = A-11 only; A-12 = its own phased effort.

Open question: scope (Option 1 A-11-only recommended / 2 A-11+A-12-tranche /
3 A-12-first). No code changed yet.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…st tranche (A-11 + A-12)

PR-3E, Option 2. Two new single-source modules replace hand-rolled duplicates
on the hottest path in the repo.

A-11 (vision, fully closed): new codec_vision.py — describe_sync + describe_async,
Gemini-flash -> local-Qwen-VL fallback, config read live from codec_config. All
three consumers now delegate: codec.py vision_describe (deleted _gemini_vision /
_local_vision), codec_voice._analyze_screenshot (async, reuses self._http),
codec_session.screenshot_ctx (now GAINS the Gemini fallback it lacked — a
documented behavioral superset). One file to change for a model/provider swap.

A-12 (chat/completions, first tranche): the audit's premise that codec_llm_proxy
already had call()/stream() was inaccurate — that module is a priority QUEUE, not
an HTTP caller. Built genuinely-new codec_llm.py: call() + strip_think /
extract_content (headers, Bearer auth, enable_thinking, <think> strip,
choices/reasoning parse, retry+backoff, never-raises). Migrated codec.py
voice-reply chat + codec_session.qwen_call; removed the now-dead local
extract_content in codec_session (canonical copy lives in codec_llm).

Deferred to phased follow-ons (each its own design + PR): codec_session.qwen_stream
SSE (needs codec_llm.stream()) and the remaining ~40 sites (dashboard, voice
generate_response, agents/agent_plan/agent_runner, telegram/imessage bridges,
compaction/self_improve/watcher/textassist/dictate).

Net -86 LOC in tracked files. Tests: tests/test_llm_vision_dedup.py (19, async
driven via asyncio.run — no pytest-asyncio dep). Full suite: 23 known-baseline
failures, zero new. No skills/ touched -> no manifest regen. Docs: design doc
flipped to IMPLEMENTED (§8), A-11/A-12 closure notes in PHASE-1-CODE-QUALITY +
triage, canonical-helpers note in AGENTS.md §2.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@AVADSA25 AVADSA25 merged commit 9b0c1bd into main May 22, 2026
1 check passed

2 participants