refactor(llm): raise_on_error mode + migrate fail-loud sites (A-12 tranche 2c)#70
Merged
Merged
Conversation
…s (A-12 tranche 2c) PR-3E-2c. Adds the raise-on-failure contract that tranche 2 deferred, then migrates the 4 sites that MUST fail loud. New: codec_llm.LLMError + codec_llm.call(raise_on_error=True). When True, call() raises LLMError on EVERY non-success path — non-200 (after retries), request exception (after retries), and a 200 with empty/unparseable content. Default stays False (never-raise -> ""), so the existing streaming/best-effort callers (codec.py, qwen_call, compaction, dictate) are untouched — pinned by a regression guard test. Migrated: - codec_textassist.call_qwen -> call(raise_on_error=True). Fixes a real bug: on LLM failure the never-raise path would pbcopy "" + Cmd-V, pasting EMPTY over the user's selection and showing "Text replaced!". Now the caller's except shows the Error overlay (also on empty-200). FINAL-ANSWER strip kept at the call site; <think> strip now handled by codec_llm. - scripts/regen_skill_descriptions._llm -> call(raise_on_error=True). Fail-loud preserved (LLMError propagates like the old raise_for_status; empty-200 now raises instead of writing an empty description). - codec_agent_plan._qwen_chat + codec_agent_runner._qwen_chat -> call( raise_on_error=True) behind a thin adapter that maps LLMError onto their PUBLIC QwenUnavailableError, so the daemon's `except QwenUnavailableError` retry/abort/resume logic is unchanged. Added a parallel _qwen_base() resolver (call-time config). These also gain <think> strip + enable_thinking=False -> more robust JSON parsing downstream. Tests: tests/test_llm_raise_mode.py (14 — raise-mode success/non-200/exception/ empty-200, default-still-never-raises regression guard, agent adapters map to QwenUnavailableError + pass content through, source invariants). 109 agent tests (test_agent_plan/runner/chat_plan_persistence) still green. Full suite 1423 passing, 23 known-baseline failures, zero new. Zero net-new ruff (per-file delta vs origin/main = 0). No skills/ touched -> no manifest regen. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
PR-3E-2c. Adds the raise-on-failure contract that tranche 2 deferred, then migrates the 4 sites that must fail loud. Design-first →
docs/PR3E2C-RAISE-MODE-DESIGN.md.New
codec_llm.LLMError+codec_llm.call(raise_on_error=True)— whenTrue, raisesLLMErroron every non-success path (non-200 after retries, request exception after retries, or a 200 with empty/unparseable content). DefaultFalsekeeps the never-raise →""contract, so the streaming/best-effort callers (codec.py,qwen_call,compaction,dictate) are untouched — pinned by a regression-guard test.Migrated (the 4 fail-loud sites)
codec_textassist.call_qwen— fixes a real bug: on LLM failure the never-raise path wouldpbcopy ""+ ⌘V, pasting empty over the user's selection + "Text replaced!". Now the caller'sexceptshows the Error overlay (also on empty-200).### FINAL ANSWER:strip stays at the call site;<think>strip now handled by codec_llm.scripts/regen_skill_descriptions._llm— fail-loud preserved (LLMErrorpropagates like the oldraise_for_status; empty-200 now raises instead of writing an empty description).codec_agent_plan._qwen_chat+codec_agent_runner._qwen_chat—call(raise_on_error=True)behind a thin adapter that mapsLLMError→ their publicQwenUnavailableError, so the daemon'sexcept QwenUnavailableErrorretry/abort/resume logic is unchanged. Added a parallel_qwen_base()resolver (call-time config). They also gain<think>strip +enable_thinking=False→ more robust downstream JSON parsing.Behavior deltas (documented)
"") — strict improvement, fail-loud is the intent.QwenUnavailableErroris preserved (adapter) — daemon logic unaffected.retries=1default = single attempt, matching their old single POST).Test plan
tests/test_llm_raise_mode.py— 14 tests: raise-mode success / non-200 / exception / empty-200; default-still-never-raises regression guard; agent adapters mapLLMError→QwenUnavailableError(asserts the wrapped message) + pass content through on success; source invariants (4 sites callcodec_llm.call(, inline POST /raise_for_statusgone).test_agent_plan/test_agent_runner/test_chat_plan_persistence) still green — theQwenUnavailableErrorcontract holds.skills/touched → no manifest regen.QwenUnavailableErrorwhen Qwen is down.🤖 Generated with Claude Code