refactor(llm): codec_llm.stream() + migrate streaming/trivial sites (A-12 tranche 2) #69
Merged
…gration (Option 1)

Maps all 22 remaining chat/completions text sites; designs the streaming keystone (yields raw deltas, never raises, sync now / async deferred); isolates the 3 hard constraints (dashboard skill-tag machine, async+queue coupling, raise-on-failure) into later tranches.

Scope for this PR: stream() + qwen_stream proof + clean non-streaming trivials (compaction, dictate). textassist + regen are deferred to 2c (raise-on-error contract) after reading the source showed that never-raise would paste empty text over the user's selection or write empty descriptions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…A-12 tranche 2)

PR-3E-2, Option 1. Builds the streaming keystone and migrates the lowest-risk remaining chat/completions sites.

New: codec_llm.stream() is a sync SSE generator that yields the RAW content deltas (data: framing, [DONE] sentinel, choices[0].delta.content, per-chunk parse tolerance) and never raises. Think-stripping is intentionally left to the caller (qwen_stream writes deltas live and strip_think()s the accumulated result; the dashboard owns its own tag machine). Extracted a shared _build_request() so call() and stream() can't drift on headers/auth/payload; the stream flag is applied last so extra_kwargs can't clobber it. call() is refactored onto it with a byte-identical payload (no stream key), so the PR-3E tests are still green.

Migrated: codec_session.qwen_stream (streaming proof; keeps live stdout + strip_think, falls back to the migrated qwen_call on an empty stream), codec_compaction.compact_context (single httpx POST -> call(); also gains enable_thinking=False + <think> strip = slightly cleaner summaries), and codec_dictate draft-refine (hardcoded :8083 now passed as base_url; all failure branches collapse to "use raw body", which never-raise -> "" maps onto exactly).

Read-the-source moved 2 of the approved trivials to tranche 2c: codec_textassist RAISEs on failure and its caller's except shows an error overlay, so never-raise would pbcopy "" + Cmd-V, pasting EMPTY over the user's selection; and scripts/regen_skill_descriptions uses raise_for_status (fail-loud dev script). Both need codec_llm.call(raise_on_error=True), so they migrate with agent_plan/runner in 2c.

Tests: tests/test_llm_stream.py (14 tests: stream raw-delta/[DONE]/blank+garbage/non-200/exception/auth/payload-parity, qwen_stream consume+fallback, compaction use+fallback, source invariants). Full suite 1409 passing, 23 known-baseline failures, zero new. Zero net-new ruff. No skills/ touched -> no manifest regen.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
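For reviewers who want the parsing contract in one place, here is a minimal sketch of the delta extraction described in the commit message above. It is not the code in this PR: `iter_content_deltas` is a hypothetical name, and it takes already-received SSE lines instead of doing the HTTP request, so only the framing, sentinel, per-chunk tolerance, and never-raise behaviour are illustrated.

```python
import json
from typing import Iterable, Iterator


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield raw content deltas from OpenAI-style SSE lines; never raises.

    Hypothetical helper for illustration only. The real codec_llm.stream()
    also builds and sends the request, which is omitted here.
    """
    try:
        for raw in lines:
            line = raw.strip()
            if not line.startswith("data:"):
                continue  # blank keep-alives and non-data SSE fields
            data = line[len("data:"):].strip()
            if data == "[DONE]":
                return  # sentinel: stop even if more lines follow
            try:
                delta = json.loads(data)["choices"][0]["delta"].get("content")
            except (ValueError, KeyError, IndexError, TypeError, AttributeError):
                continue  # tolerate one malformed chunk, keep streaming
            if isinstance(delta, str) and delta:
                yield delta  # raw: no <think> stripping at this layer
    except Exception:
        return  # transport error while iterating: the stream simply ends
```

Feeding it two valid chunks with a blank line and a garbage `data:` line in between, followed by `data: [DONE]`, should yield just the two content strings; a source that raises mid-iteration ends the stream early instead of propagating the error, which is what lets the caller treat "nothing came back" as the single failure signal.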
Summary
PR-3E-2 (Option 1). Builds the streaming keystone `codec_llm.stream()` and migrates the lowest-risk remaining `chat/completions` sites. Design-first per AGENTS.md §11 → `docs/PR3E2-LLM-STREAM-TRANCHE2-DESIGN.md` (maps all 22 remaining sites + the phased roadmap).

New API
- `codec_llm.stream(...)`: sync SSE generator yielding the RAW content deltas (`data:` framing, `[DONE]` sentinel, `choices[0].delta.content`, per-chunk parse tolerance). Never raises: an HTTP, connection, or parse error yields an empty stream and the caller falls back. Think-stripping is intentionally left to the caller (see note below).
- `_build_request()`: shared header/auth/payload builder so `call()` and `stream()` can't drift; the `stream` flag is applied last so `extra_kwargs` can't clobber it. `call()` is refactored onto it with a byte-identical payload (no `stream` key), so PR-3E's `call()` tests stay green.

Migrated
- `codec_session.qwen_stream` (streaming proof): keeps live stdout + `strip_think()` on the accumulated result; falls back to the already-migrated `qwen_call` on an empty stream.
- `codec_compaction.compact_context`: single httpx POST → `call()`; also gains `enable_thinking=False` + `<think>` strip (slightly cleaner summaries); fallback unchanged.
- `codec_dictate` draft-refine: hardcoded `:8083` now passed as `base_url`; all failure branches collapse to "use raw body", which never-raise → `""` maps onto exactly.

Why `stream()` yields raw (not think-stripped)
`qwen_stream` writes each delta to stdout live and only `strip_think`s the accumulated result. Internal stripping would silently drop the reasoning it shows live (a parity break) and would need fiddly partial-tag buffering. Raw-yield gives exact parity, is simpler and safer for a first streaming API, and matches reality: the dashboard (deferred) owns its own cross-chunk `<think>` + `[SKILL:…]` machine.

Scope refinement (read-the-source)
The approved Option 1 listed 4 trivials; reading the downstreams moved 2 to tranche 2c:
- `codec_textassist` RAISEs on failure → its `except` shows an Error overlay. With never-raise → `""`, the success path would `pbcopy ""` + ⌘V, pasting empty text over the user's selection and showing "Text replaced!". Destructive; needs `raise_on_error`.
- `scripts/regen_skill_descriptions` uses `raise_for_status()` (fail-loud dev script); never-raise would write empty descriptions.

Both share the raise-on-failure contract with `agent_plan`/`agent_runner`, so they migrate together in 2c when `codec_llm.call` gains `raise_on_error`. (Honors "never break working code.")

Still pending (own PRs)
2c raise-mode (textassist/regen/agent_plan/agent_runner) · async `astream()` for `voice_stream_qwen` + agents (queue stays at the call site) · dashboard (4 non-stream + the `[SKILL:…]` stream tag-machine) · bridges · skills tranche.

Test plan
- `tests/test_llm_stream.py`, 14 tests: `stream()` raw-deltas/`[DONE]`/blank+garbage/non-200/exception/auth/payload-parity-with-`call()`; `qwen_stream` consume+strip_think + fallback-on-empty; `compaction` use + fallback; source invariants (session/compaction/dictate use the canonical helpers, inline impls gone).
- Ruff: `codec_dictate` 24-before/24-after (all pre-existing E701), `codec_session` 1 pre-existing E741; zero net-new.
- No `skills/` touched → no manifest regen.
- […] `qwen_stream`), a long-chat compaction, and a "draft …" dictation still work.

🤖 Generated with Claude Code
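For reference, the caller-side behaviour that the `qwen_stream` tests cover (write each raw delta live, strip think blocks only from the accumulated text, fall back when nothing arrives) might look roughly like the sketch below. The names `consume_stream` and `fallback` are hypothetical, `strip_think` here is a simplified stand-in for the project's helper, and "empty stream" is assumed to mean that no deltas were yielded.

```python
import re
import sys
from typing import Callable, Iterable, TextIO

_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Simplified stand-in: remove closed <think>...</think> blocks."""
    return _THINK_RE.sub("", text).strip()


def consume_stream(
    deltas: Iterable[str],
    fallback: Callable[[], str],
    out: TextIO = sys.stdout,
) -> str:
    """Echo raw deltas live, return the think-stripped accumulated text.

    `fallback` stands in for the non-streaming call used when the
    stream yields nothing (assumption: that is what "empty" means).
    """
    parts = []
    for delta in deltas:
        out.write(delta)  # live output keeps the reasoning visible
        out.flush()
        parts.append(delta)
    if not parts:
        return fallback()  # never-raise stream: empty means "it failed"
    return strip_think("".join(parts))
```

Because stripping happens once on the joined text, a `<think>` tag split across two deltas needs no special buffering here, which is the simplification the raw-yield design is aiming for.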