refactor(llm,vision): canonical codec_llm + codec_vision helpers (A-11 + A-12 tranche 1)#68
Merged
Merged
Conversation
Design-first per AGENTS.md §11 before touching the hottest code path (every feature calls an LLM). Documents the reality (codec_llm_proxy is a queue NOT a proxy — no call/stream to reuse; 45 chat/completions sites across sync/ async/streaming; 3 divergent vision impls), the high blast radius, and a phased plan. Recommendation: split A-11 (vision dedup, contained — new codec_vision.py, 3 consumers) from A-12 (chat/completions, audit-flagged "large" — build codec_llm.call/stream + migrate 45 sites in small per-subsystem tranches). PR-3E = A-11 only; A-12 = its own phased effort. Open question: scope (Option 1 A-11-only recommended / 2 A-11+A-12-tranche / 3 A-12-first). No code changed yet. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…st tranche (A-11 + A-12) PR-3E, Option 2. Two new single-source modules replace hand-rolled duplicates on the hottest path in the repo. A-11 (vision, fully closed): new codec_vision.py — describe_sync + describe_async, Gemini-flash -> local-Qwen-VL fallback, config read live from codec_config. All three consumers now delegate: codec.py vision_describe (deleted _gemini_vision / _local_vision), codec_voice._analyze_screenshot (async, reuses self._http), codec_session.screenshot_ctx (now GAINS the Gemini fallback it lacked — a documented behavioral superset). One file to change for a model/provider swap. A-12 (chat/completions, first tranche): the audit's premise that codec_llm_proxy already had call()/stream() was inaccurate — that module is a priority QUEUE, not an HTTP caller. Built genuinely-new codec_llm.py: call() + strip_think / extract_content (headers, Bearer auth, enable_thinking, <think> strip, choices/reasoning parse, retry+backoff, never-raises). Migrated codec.py voice-reply chat + codec_session.qwen_call; removed the now-dead local extract_content in codec_session (canonical copy lives in codec_llm). Deferred to phased follow-ons (each its own design + PR): codec_session.qwen_stream SSE (needs codec_llm.stream()) and the remaining ~40 sites (dashboard, voice generate_response, agents/agent_plan/agent_runner, telegram/imessage bridges, compaction/self_improve/watcher/textassist/dictate). Net -86 LOC in tracked files. Tests: tests/test_llm_vision_dedup.py (19, async driven via asyncio.run — no pytest-asyncio dep). Full suite: 23 known-baseline failures, zero new. No skills/ touched -> no manifest regen. Docs: design doc flipped to IMPLEMENTED (§8), A-11/A-12 closure notes in PHASE-1-CODE-QUALITY + triage, canonical-helpers note in AGENTS.md §2. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR-3E (Option 2). Two new single-source modules replace hand-rolled duplicates on the repo's hottest path (every feature calls an LLM / vision). Design-first per AGENTS.md §11 →
docs/PR3E-LLM-VISION-DEDUP-DESIGN.md(now flipped to IMPLEMENTED, see §8).A-11 — vision dedup (fully closed)
codec_vision.py:describe_sync+describe_async, Gemini-flash (gemini-2.0-flash) → local-Qwen-VL fallback, config read live fromcodec_config(provider/model/Keychain changes take effect without restart).codec.pyvision_describe→describe_sync(deleted_gemini_vision/_local_vision).codec_voice._analyze_screenshot→await describe_async(..., http=self._http)(reuses the pipeline's httpx client).codec_session.screenshot_ctx→describe_sync— gains the Gemini fallback it previously lacked (documented behavioral superset).A-12 —
chat/completionscaller (first tranche)codec_llm_proxyalready hadcall()/stream()was inaccurate — that module is a priority queue (semaphore), not an HTTP caller.codec_llm.py(config-agnostic, no import cycle):call()+strip_think/extract_content— headers, Bearer auth,chat_template_kwargs.enable_thinking,<think>strip,choices/reasoningparse, retry +2**nbackoff, never raises.codec.pyvoice-reply chat +codec_session.qwen_call; removed the now-dead localextract_contentincodec_session(canonical copy lives incodec_llm).Deferred (each its own design + PR)
codec_session.qwen_streamSSE → needs acodec_llm.stream()generator.chat/completionssites (dashboard, voicegenerate_response, agents/agent_plan/agent_runner, telegram/imessage bridges, compaction/self_improve/watcher/textassist/dictate).Notes / behavior deltas (documented)
codec_llm.call: on200-but-empty it returns""without retrying (codec.py parity exact at retries=1; forqwen_callretries=3 this skips pointless identical retries — an improvement).Test plan
tests/test_llm_vision_dedup.py— 19 tests:strip_think/extract_contentmatrix;callsuccess / no-key-omits-auth / retries-then-empty / exception-empty;describe_syncgemini-first / fallback / local-only / both-fail;describe_asyncgemini + fallback (driven viaasyncio.run+ fake httpx client — no pytest-asyncio dependency); source-invariant checks (codec.py/voice/session call the canonical helpers, inline impls gone).skills/touched → no manifest regen.🤖 Generated with Claude Code