refactor(llm,vision): canonical codec_llm + codec_vision helpers (A-11 + A-12 tranche 1) by AVADSA25 · Pull Request #68 · AVADSA25/codec

AVADSA25 · 2026-05-22T12:34:50Z

Summary

PR-3E (Option 2). Two new single-source modules replace hand-rolled duplicates on the repo's hottest path (every feature calls an LLM / vision). Design-first per AGENTS.md §11 → docs/PR3E-LLM-VISION-DEDUP-DESIGN.md (now flipped to IMPLEMENTED, see §8).

A-11 — vision dedup (fully closed)

New codec_vision.py: describe_sync + describe_async, Gemini-flash (gemini-2.0-flash) → local-Qwen-VL fallback, config read live from codec_config (provider/model/Keychain changes take effect without restart).
All 3 consumers delegate:
- codec.py vision_describe → describe_sync (deleted _gemini_vision / _local_vision).
- codec_voice._analyze_screenshot → await describe_async(..., http=self._http) (reuses the pipeline's httpx client).
- codec_session.screenshot_ctx → describe_sync — gains the Gemini fallback it previously lacked (documented behavioral superset).
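A minimal sketch of the Gemini-first, local-fallback chain described above. It assumes each provider is a plain callable that returns a description or None. The names `get_config`, `gemini` and `local`, and the `gemini_api_key` config key, are hypothetical and not the actual codec_vision API. The sketch shows two things: config is read on every call, and a failing provider falls through to the next one instead of raising.

```python
from typing import Callable, Optional

# Hypothetical provider signature: (image_bytes, prompt, cfg) -> description or None.
Provider = Callable[[bytes, str, dict], Optional[str]]


def describe_sync(
    image: bytes,
    prompt: str,
    get_config: Callable[[], dict],
    gemini: Provider,
    local: Provider,
) -> str:
    """Gemini-first, local-fallback vision describe (sketch, not codec_vision itself).

    get_config() runs on every invocation, so a provider/model/key change
    is picked up without restarting the process.
    """
    cfg = get_config()
    chain = []
    if cfg.get("gemini_api_key"):  # hypothetical key name
        chain.append(gemini)
    chain.append(local)  # local Qwen-VL is always the last resort
    for provider in chain:
        try:
            text = provider(image, prompt, cfg)
        except Exception:
            continue  # never raise to the caller; try the next provider
        if text:
            return text
    return ""  # every provider failed or returned nothing
```

The four `describe_sync` cases in the test plan (gemini-first, fallback, local-only, both-fail) map directly onto this loop.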

A-12 — `chat/completions` caller (first tranche)

The audit's premise that codec_llm_proxy already had call()/stream() was inaccurate — that module is a priority queue (semaphore), not an HTTP caller.
New codec_llm.py (config-agnostic, no import cycle): call() + strip_think / extract_content — headers, Bearer auth, chat_template_kwargs.enable_thinking, <think> strip, choices/reasoning parse, retry + 2**n backoff, never raises.
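The two parsing helpers could look roughly like this for an OpenAI-style response body. Two details are assumptions rather than confirmed codec_llm behavior: the `reasoning_content`/`reasoning` field names, and the handling of a bare `</think>` (emitted when the chat template has already opened the block).

```python
import re
from typing import Any

# Matches complete <think>...</think> blocks, across newlines.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Drop reasoning blocks and return the visible answer."""
    text = _THINK_BLOCK.sub("", text or "")
    if "</think>" in text:
        # Assumption: the opening tag lived in the prompt template, so keep
        # only what follows the last closing tag.
        text = text.rsplit("</think>", 1)[1]
    return text.strip()


def extract_content(data: Any) -> str:
    """Pull assistant text from a chat/completions response; "" if absent."""
    try:
        msg = data["choices"][0]["message"]
    except (KeyError, IndexError, TypeError):
        return ""
    if not isinstance(msg, dict):
        return ""
    content = msg.get("content")
    if not isinstance(content, str) or not content.strip():
        # Some reasoning servers put the text in a separate field (names assumed).
        content = msg.get("reasoning_content") or msg.get("reasoning") or ""
    return strip_think(content)
```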
Migrated codec.py voice-reply chat + codec_session.qwen_call; removed the now-dead local extract_content in codec_session (canonical copy lives in codec_llm).

Deferred (each its own design + PR)

codec_session.qwen_stream SSE → needs a codec_llm.stream() generator.
The remaining ~40 chat/completions sites (dashboard, voice generate_response, agents/agent_plan/agent_runner, telegram/imessage bridges, compaction/self_improve/watcher/textassist/dictate).
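For the deferred `codec_llm.stream()`, one possible shape is a generator over the SSE lines of an OpenAI-style streaming response. This is a hypothetical sketch, and `iter_sse_deltas` is not an existing codec function. It yields only `delta.content` strings, skips keep-alive and comment lines, stops at `data: [DONE]`, and tolerates malformed chunks.

```python
import json
from typing import Iterable, Iterator


def iter_sse_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content deltas from a chat/completions SSE stream (sketch)."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank keep-alives, ": ping" comments, "event:" lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        try:
            chunk = json.loads(payload)
            delta = chunk["choices"][0].get("delta") or {}
            text = delta.get("content")
        except (ValueError, KeyError, IndexError, TypeError, AttributeError):
            continue  # skip a malformed chunk rather than abort the stream
        if text:
            yield text
```

With httpx, the lines could come from `response.iter_lines()` inside `client.stream("POST", url, json=body)`. That wiring, and whether `<think>` stripping happens per chunk or on the joined text, remains an open design question for that follow-up.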

Notes / behavior deltas (documented)

codec.py voice path: non-200 and empty responses now collapse to one apology ("Sorry, I didn't get a response.") instead of two distinct messages.
codec_llm.call: on a 200 response with empty content it returns "" without retrying. At retries=1 this matches the old codec.py behavior exactly; for qwen_call (retries=3) it skips pointless identical retries, an improvement.
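A sketch of the retry policy these notes describe. It sends Bearer auth only when a key is set, puts `chat_template_kwargs.enable_thinking` in the body, and waits `2**n` seconds between attempts. It returns immediately on a 200 (even an empty one) and returns `""` instead of raising. Three pieces are hypothetical: the injected `post` and `sleep` callables and the `_content` stand-in for `extract_content`. They exist only so the logic can be exercised without a server.

```python
import time
from typing import Any, Callable, Optional

# Hypothetical transport: (url, headers, json_body, timeout) -> (status_code, parsed_json)
Post = Callable[[str, dict, dict, float], tuple]


def _content(data: Any) -> str:
    """Minimal stand-in for codec_llm.extract_content (no <think> handling)."""
    try:
        return data["choices"][0]["message"]["content"] or ""
    except (KeyError, IndexError, TypeError):
        return ""


def call(
    base_url: str,
    model: str,
    messages: list,
    post: Post,
    api_key: Optional[str] = None,
    enable_thinking: bool = False,
    retries: int = 1,
    timeout: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """POST to {base_url}/chat/completions and return the reply text; never raises."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # no key: omit Authorization entirely (e.g. a local server)
        headers["Authorization"] = f"Bearer {api_key}"
    body = {
        "model": model,
        "messages": messages,
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }
    url = base_url.rstrip("/") + "/chat/completions"
    for attempt in range(retries):
        try:
            status, data = post(url, headers, body, timeout)
        except Exception:
            status, data = None, None  # transport error: treat like a non-200
        if status == 200:
            # Return even when empty: an identical retry is unlikely to help.
            return _content(data)
        if attempt < retries - 1:
            sleep(2 ** attempt)  # 1 s, 2 s, 4 s, ...
    return ""
```

With `retries=1` the loop makes exactly one request and never sleeps, which is the codec.py parity case. With `retries=3`, repeated 500 responses sleep 1 s and then 2 s before the function gives up.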
Net -86 LOC in tracked files (+3 new modules/tests).

Test plan

tests/test_llm_vision_dedup.py — 19 tests: strip_think/extract_content matrix; call success / no-key-omits-auth / retries-then-empty / exception-empty; describe_sync gemini-first / fallback / local-only / both-fail; describe_async gemini + fallback (driven via asyncio.run + fake httpx client — no pytest-asyncio dependency); source-invariant checks (codec.py/voice/session call the canonical helpers, inline impls gone).
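The "asyncio.run + fake httpx client" approach can be sketched as follows. A fake client exposes an async `post` that replays canned responses, and a plain synchronous test drives the coroutine with `asyncio.run`, so no pytest-asyncio plugin is needed. `FakeAsyncClient`, `FakeResponse` and the simplified `describe_async` are hypothetical stand-ins, and the real Gemini and local-server request shapes differ.

```python
import asyncio
from typing import Any, Optional


class FakeResponse:
    """Just enough of httpx.Response for the code under test."""

    def __init__(self, status_code: int, payload: Any) -> None:
        self.status_code = status_code
        self._payload = payload

    def json(self) -> Any:
        return self._payload


class FakeAsyncClient:
    """Stands in for httpx.AsyncClient: records each POST, replays canned results."""

    def __init__(self, results: list) -> None:
        self._results = list(results)
        self.requests: list = []

    async def post(self, url: str, json: Optional[dict] = None, **kwargs: Any) -> FakeResponse:
        self.requests.append((url, json))
        result = self._results.pop(0)
        if isinstance(result, Exception):
            raise result  # simulate a transport failure
        return result


async def describe_async(image_b64: str, prompt: str, http: Any,
                         gemini_url: str, local_url: str) -> str:
    """Simplified, hypothetical stand-in for codec_vision.describe_async."""
    for url in (gemini_url, local_url):
        try:
            resp = await http.post(url, json={"prompt": prompt, "image": image_b64})
        except Exception:
            continue
        if resp.status_code == 200:
            text = (resp.json() or {}).get("text", "")
            if text:
                return text
    return ""


def test_describe_async_falls_back_to_local() -> None:
    # Plain sync test: asyncio.run drives the coroutine, so no pytest-asyncio needed.
    client = FakeAsyncClient([RuntimeError("gemini down"),
                              FakeResponse(200, {"text": "a terminal window"})])
    out = asyncio.run(describe_async("aGk=", "describe", client,
                                     "https://gemini.example", "http://localhost:1234"))
    assert out == "a terminal window"
    assert [url for url, _ in client.requests] == ["https://gemini.example",
                                                   "http://localhost:1234"]
```

The client is passed in (as codec_voice does with `http=self._http`), so the test never opens a socket.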
Full suite: 23 known-baseline failures, zero new (1395 passed, 74 skipped).
Ruff: new modules + test clean; codec.py unchanged baseline (E402 from its signal-before-imports pattern); no new F401.
No skills/ touched → no manifest regen.
Manual (Mac Studio): a voice "look at my screen" request and a chat screenshot are both described correctly via each provider; a voice chat turn gets a reply.

🤖 Generated with Claude Code

Design-first per AGENTS.md §11 before touching the hottest code path (every feature calls an LLM). Documents the reality (codec_llm_proxy is a queue NOT a proxy — no call/stream to reuse; 45 chat/completions sites across sync/async/streaming; 3 divergent vision impls), the high blast radius, and a phased plan. Recommendation: split A-11 (vision dedup, contained — new codec_vision.py, 3 consumers) from A-12 (chat/completions, audit-flagged "large" — build codec_llm.call/stream + migrate 45 sites in small per-subsystem tranches). PR-3E = A-11 only; A-12 = its own phased effort. Open question: scope (Option 1: A-11 only, recommended; Option 2: A-11 + an A-12 tranche; Option 3: A-12 first). No code changed yet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…st tranche (A-11 + A-12) PR-3E, Option 2. Two new single-source modules replace hand-rolled duplicates on the hottest path in the repo. A-11 (vision, fully closed): new codec_vision.py — describe_sync + describe_async, Gemini-flash -> local-Qwen-VL fallback, config read live from codec_config. All three consumers now delegate: codec.py vision_describe (deleted _gemini_vision / _local_vision), codec_voice._analyze_screenshot (async, reuses self._http), codec_session.screenshot_ctx (now GAINS the Gemini fallback it lacked — a documented behavioral superset). One file to change for a model/provider swap. A-12 (chat/completions, first tranche): the audit's premise that codec_llm_proxy already had call()/stream() was inaccurate — that module is a priority QUEUE, not an HTTP caller. Built a genuinely new codec_llm.py: call() + strip_think / extract_content (headers, Bearer auth, enable_thinking, <think> strip, choices/reasoning parse, retry+backoff, never-raises). Migrated codec.py voice-reply chat + codec_session.qwen_call; removed the now-dead local extract_content in codec_session (canonical copy lives in codec_llm). Deferred to phased follow-ons (each its own design + PR): codec_session.qwen_stream SSE (needs codec_llm.stream()) and the remaining ~40 sites (dashboard, voice generate_response, agents/agent_plan/agent_runner, telegram/imessage bridges, compaction/self_improve/watcher/textassist/dictate). Net -86 LOC in tracked files. Tests: tests/test_llm_vision_dedup.py (19, async driven via asyncio.run — no pytest-asyncio dep). Full suite: 23 known-baseline failures, zero new. No skills/ touched -> no manifest regen. Docs: design doc flipped to IMPLEMENTED (§8), A-11/A-12 closure notes in PHASE-1-CODE-QUALITY + triage, canonical-helpers note in AGENTS.md §2. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Mikarina13 and others added 2 commits May 22, 2026 14:16

AVADSA25 merged commit 9b0c1bd into main May 22, 2026
1 check passed


AVADSA25 merged 2 commits into main from fix/pr3e-llm-vision-dedup
