refactor(llm): raise_on_error mode + migrate fail-loud sites (A-12 tranche 2c) by AVADSA25 · Pull Request #70 · AVADSA25/codec

AVADSA25 · 2026-05-22T13:30:34Z

Summary

PR-3E-2c. Adds the raise-on-failure contract that tranche 2 deferred, then migrates the 4 sites that must fail loud. Design-first; see docs/PR3E2C-RAISE-MODE-DESIGN.md.

New

codec_llm.LLMError + codec_llm.call(raise_on_error=True) — when True, raises LLMError on every non-success path (non-200 after retries, request exception after retries, or a 200 with empty/unparseable content). Default False keeps the never-raise → "" contract, so the streaming/best-effort callers (codec.py, qwen_call, compaction, dictate) are untouched — pinned by a regression-guard test.
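The contract above can be sketched roughly as follows. This is a minimal illustration, not the actual codec_llm code: the `transport` parameter and its `(status, content)` return shape are hypothetical stand-ins for the real HTTP request, and the choice not to retry an empty 200 is an assumption of the sketch.

```python
from typing import Callable, Tuple


class LLMError(RuntimeError):
    """Raised in raise mode when no usable content comes back."""


# Hypothetical transport: takes a prompt, returns (status_code, content_text).
Transport = Callable[[str], Tuple[int, str]]


def call(prompt: str, transport: Transport, retries: int = 1,
         raise_on_error: bool = False) -> str:
    """Return the model text; on failure return "" or raise LLMError."""
    problem = "no attempt made"
    for _ in range(max(1, retries)):  # retries=1 means a single attempt
        try:
            status, content = transport(prompt)
        except Exception as exc:  # request-level failure
            problem = f"request failed: {exc}"
            continue
        if status != 200:
            problem = f"HTTP {status}"
            continue
        if content.strip():
            return content
        # Assumption: an empty 200 counts as a failure and is not retried.
        problem = "empty content in 200 response"
        break
    if raise_on_error:
        raise LLMError(problem)
    return ""  # default mode keeps the never-raise contract
```

Best-effort callers keep calling `call(prompt, transport)` and treat `""` as "nothing to show"; fail-loud callers pass `raise_on_error=True` and handle `LLMError`.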

Migrated (the 4 fail-loud sites)

codec_textassist.call_qwen: fixes a real bug. On LLM failure the never-raise path would pbcopy "" and send ⌘V, pasting an empty string over the user's selection while still showing "Text replaced!". Now the caller's except shows the Error overlay (also on empty-200). The "### FINAL ANSWER:" strip stays at the call site; the <think> strip is now handled by codec_llm.
scripts/regen_skill_descriptions._llm: fail-loud preserved (LLMError propagates like the old raise_for_status; empty-200 now raises instead of writing an empty description).
codec_agent_plan._qwen_chat + codec_agent_runner._qwen_chat: call(raise_on_error=True) behind a thin adapter that maps LLMError → their public QwenUnavailableError, so the daemon's except QwenUnavailableError retry/abort/resume logic is unchanged. Added a parallel _qwen_base() resolver (call-time config). Both also gain the <think> strip + enable_thinking=False → more robust downstream JSON parsing.
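The adapter idea for the two agent modules might look something like the sketch below. The exception classes are local stand-ins for codec_llm.LLMError and the agents' QwenUnavailableError, and `make_qwen_chat` with its injected `llm_call` is a hypothetical helper used only to keep the example self-contained; the real `_qwen_chat` signature and message text may differ.

```python
from typing import Callable


class LLMError(RuntimeError):
    """Stand-in for codec_llm.LLMError."""


class QwenUnavailableError(RuntimeError):
    """Stand-in for the agents' public exception type."""


def make_qwen_chat(llm_call: Callable[..., str]) -> Callable[[str], str]:
    """Build a _qwen_chat-style function around an injected call()."""

    def _qwen_chat(prompt: str) -> str:
        try:
            # Fail loud inside, so empty or failed responses never reach
            # the JSON parsing that happens downstream.
            return llm_call(prompt, raise_on_error=True)
        except LLMError as exc:
            # Keep the public exception type stable for the daemon's
            # existing `except QwenUnavailableError` handling.
            raise QwenUnavailableError(f"Qwen unavailable: {exc}") from exc

    return _qwen_chat
```

Because only the message changes and the type is preserved, callers that catch `QwenUnavailableError` need no modification; the original `LLMError` remains reachable through `__cause__`.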

Behavior deltas (documented)

All 4: empty-200 now raises (was: empty paste / empty desc / parse-"") — strict improvement, fail-loud is the intent.
agent_plan/runner: exception message changes but the type QwenUnavailableError is preserved (adapter) — daemon logic unaffected.
No added retries for the agents (retries=1 default = single attempt, matching their old single POST).
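For the text-assist site, the "empty paste" delta can be illustrated with a small sketch. Everything here is hypothetical scaffolding: `replace_selection`, `paste`, and `show_overlay` are illustrative names standing in for the pbcopy/⌘V step and the overlay UI, and `LLMError` is a local stand-in for codec_llm.LLMError.

```python
from typing import Callable


class LLMError(RuntimeError):
    """Stand-in for codec_llm.LLMError."""


def replace_selection(
    selection: str,
    llm_call: Callable[..., str],
    paste: Callable[[str], None],
    show_overlay: Callable[[str], None],
) -> bool:
    """Paste the LLM result over the selection, or show an error and paste nothing."""
    try:
        result = llm_call(selection, raise_on_error=True)
    except LLMError as exc:
        # The failure path never reaches paste(), so the user's
        # selection is left untouched.
        show_overlay(f"Error: {exc}")
        return False
    paste(result)
    show_overlay("Text replaced!")
    return True
```

With the old never-raise behavior, the failure case would have fallen through to `paste("")`; in raise mode the paste is only reachable with non-empty content.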

Test plan

tests/test_llm_raise_mode.py: 14 tests covering raise-mode success / non-200 / exception / empty-200; the default-still-never-raises regression guard; agent adapters mapping LLMError → QwenUnavailableError (asserting the wrapped message) and passing content through on success; and source invariants (all 4 sites call codec_llm.call(...), and the inline POST and raise_for_status are gone).
109 agent tests (test_agent_plan / test_agent_runner / test_chat_plan_persistence) still green — the QwenUnavailableError contract holds.
Full suite: 1423 passed, 23 known-baseline failures, zero new, 74 skipped.
Ruff: codec_llm 0 errors; per-file F-delta vs origin/main = 0 on all changed files (pre-existing debt untouched).
No skills/ touched → no manifest regen.
Manual (Mac Studio): a textassist proofread with the LLM down shows the Error overlay (no empty paste); an agent plan/run surfaces QwenUnavailableError when Qwen is down.
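A source-invariant check of the kind listed in the test plan could be sketched as below. This is an assumption about the approach, not the contents of tests/test_llm_raise_mode.py: the helper name is hypothetical, and `requests.post(` is only a guess at what "inline POST" refers to.

```python
from typing import List

# String every migrated site is expected to contain.
REQUIRED = "codec_llm.call("
# Strings that should no longer appear (assumed spellings of the old code).
FORBIDDEN = ("requests.post(", "raise_for_status(")


def source_invariant_problems(source: str) -> List[str]:
    """Return human-readable violations found in one module's source text."""
    problems = []
    if REQUIRED not in source:
        problems.append(f"missing {REQUIRED!r}")
    for needle in FORBIDDEN:
        if needle in source:
            problems.append(f"still contains {needle!r}")
    return problems
```

In a real test the source text would come from reading each migrated module from disk (for example with `pathlib.Path(...).read_text()`) and asserting that the returned list is empty.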

🤖 Generated with Claude Code

…s (A-12 tranche 2c)

PR-3E-2c. Adds the raise-on-failure contract that tranche 2 deferred, then migrates the 4 sites that MUST fail loud.

New: codec_llm.LLMError + codec_llm.call(raise_on_error=True). When True, call() raises LLMError on EVERY non-success path: non-200 (after retries), request exception (after retries), and a 200 with empty/unparseable content. Default stays False (never-raise -> ""), so the existing streaming/best-effort callers (codec.py, qwen_call, compaction, dictate) are untouched, pinned by a regression guard test.

Migrated:
- codec_textassist.call_qwen -> call(raise_on_error=True). Fixes a real bug: on LLM failure the never-raise path would pbcopy "" + Cmd-V, pasting EMPTY over the user's selection and showing "Text replaced!". Now the caller's except shows the Error overlay (also on empty-200). FINAL-ANSWER strip kept at the call site; <think> strip now handled by codec_llm.
- scripts/regen_skill_descriptions._llm -> call(raise_on_error=True). Fail-loud preserved (LLMError propagates like the old raise_for_status; empty-200 now raises instead of writing an empty description).
- codec_agent_plan._qwen_chat + codec_agent_runner._qwen_chat -> call(raise_on_error=True) behind a thin adapter that maps LLMError onto their PUBLIC QwenUnavailableError, so the daemon's `except QwenUnavailableError` retry/abort/resume logic is unchanged. Added a parallel _qwen_base() resolver (call-time config). These also gain <think> strip + enable_thinking=False -> more robust JSON parsing downstream.

Tests: tests/test_llm_raise_mode.py (14: raise-mode success/non-200/exception/empty-200, default-still-never-raises regression guard, agent adapters map to QwenUnavailableError + pass content through, source invariants). 109 agent tests (test_agent_plan/runner/chat_plan_persistence) still green. Full suite 1423 passing, 23 known-baseline failures, zero new. Zero net-new ruff (per-file delta vs origin/main = 0). No skills/ touched -> no manifest regen.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

AVADSA25 merged commit 1ddaa4f into main May 22, 2026
1 check passed

AVADSA25 merged 1 commit into main from fix/pr3-a12-tranche2c-raise-mode
