refactor(llm): codec_llm.stream() + migrate streaming/trivial sites (A-12 tranche 2) by AVADSA25 · Pull Request #69 · AVADSA25/codec

AVADSA25 · 2026-05-22T12:59:01Z

Summary

PR-3E-2 (Option 1). Builds the streaming keystone codec_llm.stream() and migrates the lowest-risk remaining chat/completions sites. Design-first per AGENTS.md §11 → docs/PR3E2-LLM-STREAM-TRANCHE2-DESIGN.md (maps all 22 remaining sites + the phased roadmap).

New API

codec_llm.stream(...) — sync SSE generator yielding the RAW content deltas (data: framing, [DONE] sentinel, choices[0].delta.content, per-chunk parse tolerance). Never raises — HTTP/conn/parse error → empty stream, caller falls back. Think-stripping is intentionally left to the caller (see note below).
_build_request() — shared header/auth/payload builder so call() and stream() can't drift; stream flag applied last so extra_kwargs can't clobber it. call() refactored onto it with a byte-identical payload (no stream key) — PR-3E's call() tests stay green.
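
The two behaviours described above (per-chunk parse tolerance on the SSE side, and the stream flag winning over extra_kwargs on the payload side) can be sketched as follows. This is a minimal illustration, not the code from this PR: `build_payload` and `iter_content_deltas` are hypothetical names, the payload covers only the fields mentioned here, and the HTTP transport is left out so that the parsing can be read on its own.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List, Optional


def build_payload(
    model: str,
    messages: List[Dict[str, str]],
    stream: bool = False,
    extra_kwargs: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for the payload half of _build_request()."""
    payload: Dict[str, Any] = {"model": model, "messages": messages}
    payload.update(extra_kwargs or {})
    if stream:
        # Applied last, so a stray extra_kwargs["stream"] cannot override it.
        payload["stream"] = True
    return payload


def iter_content_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield raw choices[0].delta.content strings from SSE text lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank keep-alive lines, comments, other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        try:
            chunk = json.loads(data)
            delta = chunk["choices"][0]["delta"].get("content")
        except (ValueError, KeyError, IndexError, TypeError, AttributeError):
            continue  # tolerate one malformed chunk and keep streaming
        if isinstance(delta, str) and delta:
            yield delta  # raw delta: no <think> stripping here
```

In a never-raise wrapper, the HTTP call that feeds `iter_content_deltas` would sit inside a broad try/except that ends the generator on any error, which is what lets the caller treat an empty stream as the signal to fall back.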

Migrated

codec_session.qwen_stream (streaming proof) — keeps live stdout + strip_think() on the accumulated result; falls back to the already-migrated qwen_call on an empty stream.
codec_compaction.compact_context — single httpx POST → call(); also gains enable_thinking=False + <think> strip (slightly cleaner summaries), fallback unchanged.
codec_dictate draft-refine — hardcoded :8083 now passed as base_url; all failure branches collapse to "use raw body", which never-raise→"" maps onto exactly.
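
A rough sketch of the consumer pattern described for qwen_stream: echo each delta live, strip thinking only from the accumulated text, and fall back when the stream is empty. The function `consume_stream`, its parameters, and the regex-based `strip_think` below are illustrative assumptions; the real helpers in codec_session may differ.

```python
import re
import sys
from typing import Callable, Iterable, List, TextIO

_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Illustrative stand-in: drop complete <think>...</think> blocks."""
    return _THINK_RE.sub("", text).strip()


def consume_stream(
    deltas: Iterable[str],
    fallback: Callable[[], str],
    out: TextIO = sys.stdout,
) -> str:
    """Echo deltas live, then return the think-stripped accumulated reply."""
    parts: List[str] = []
    for delta in deltas:
        out.write(delta)  # live output still includes the reasoning
        out.flush()
        parts.append(delta)
    if not parts:
        # A never-raise stream signals failure by yielding nothing.
        return fallback()
    return strip_think("".join(parts))
```

Passing the fallback as a callable keeps the non-streaming request from being made unless the stream really came back empty.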

Why `stream()` yields raw (not think-stripped)

qwen_stream writes each delta to stdout live and applies strip_think() only to the accumulated result. Stripping inside stream() would silently drop the reasoning that qwen_stream shows live (a parity break) and would need fiddly partial-tag buffering. Raw-yield gives exact parity, is simpler/safer for the first streaming API, and matches reality: the dashboard (deferred) owns its own cross-chunk <think> + [SKILL:…] machine.
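
To make the partial-tag problem concrete, here is a small sketch in which a `<think>` block is split across chunk boundaries. The regex and the two helper functions are assumptions made for illustration, not the project's strip_think implementation.

```python
import re
from typing import List

_THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_per_chunk(chunks: List[str]) -> str:
    """Naive approach: strip each chunk independently, then join."""
    return "".join(_THINK_RE.sub("", c) for c in chunks)


def strip_accumulated(chunks: List[str]) -> str:
    """Approach used by the caller: join first, strip once at the end."""
    return _THINK_RE.sub("", "".join(chunks))


# The opening and closing tags both straddle a chunk boundary.
chunks = ["<thi", "nk>plan the reply</th", "ink>Hello"]

print(repr(strip_per_chunk(chunks)))    # '<think>plan the reply</think>Hello'
print(repr(strip_accumulated(chunks)))  # 'Hello'
```

No single chunk contains a complete block, so per-chunk stripping removes nothing. Doing it correctly inside the generator would require buffering text that might be the start of a tag, which is the extra state machine this PR avoids.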

Scope refinement (read-the-source)

The approved Option 1 listed 4 trivials; reading the downstreams moved 2 to tranche 2c:

codec_textassist RAISEs on failure → its except shows an Error overlay. With never-raise→"", a failed call would take the success path: pbcopy "" + ⌘V, pasting an empty string over the user's selection and still showing "Text replaced!". Destructive, so it needs raise_on_error.
scripts/regen_skill_descriptions uses raise_for_status() (fail-loud dev script) — never-raise would write empty descriptions.

Both share the raise-on-failure contract with agent_plan/agent_runner, so they migrate together in 2c when codec_llm.call gains raise_on_error. (Honors "never break working code.")
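
The difference between the two contracts can be sketched like this. Only the parameter name raise_on_error comes from this PR; the `call` signature, the `send` callable standing in for the HTTP round trip, and `replace_selection` are hypothetical.

```python
from typing import Callable


def call(send: Callable[[], str], raise_on_error: bool = False) -> str:
    """Hypothetical wrapper; `send` stands in for the HTTP round trip."""
    try:
        return send()
    except Exception:
        if raise_on_error:
            raise  # fail loud for callers that must not act on ""
        return ""  # never-raise contract: failure looks like empty output


def replace_selection(
    send: Callable[[], str],
    paste: Callable[[str], None],
    show_error: Callable[[str], None],
) -> None:
    """Destructive caller: paste only when the LLM call really succeeded."""
    try:
        text = call(send, raise_on_error=True)
    except Exception as exc:
        show_error(str(exc))  # error overlay path; selection untouched
        return
    paste(text)
```

With the default never-raise mode, the same caller would reach `paste("")` on a failed request, which is the empty-paste behaviour the scope refinement avoids.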

Still pending (own PRs)

2c raise-mode (textassist/regen/agent_plan/agent_runner) · async astream() for voice _stream_qwen + agents (queue stays at the call site) · dashboard (4 non-stream + the [SKILL:…] stream tag-machine) · bridges · skills tranche.

Test plan

tests/test_llm_stream.py — 14 tests: stream() raw-deltas/[DONE]/blank+garbage/non-200/exception/auth/payload-parity-with-call(); qwen_stream consume+strip_think + fallback-on-empty; compaction use + fallback; source invariants (session/compaction/dictate use the canonical helpers, inline impls gone).
Full suite: 1409 passed, 23 known-baseline failures, zero new, 74 skipped.
Ruff: new code 0 errors; codec_dictate 24-before/24-after (all pre-existing E701), codec_session 1 pre-existing E741 — zero net-new.
No skills/ touched → no manifest regen.
Manual (Mac Studio): a streaming session reply (qwen_stream), a long-chat compaction, and a "draft …" dictation still work.

🤖 Generated with Claude Code

…gration (Option 1) Maps all 22 remaining chat/completions text sites; designs the streaming keystone (yields raw deltas, never-raises, sync now / async deferred); isolates the 3 hard constraints (dashboard skill-tag machine, async+queue coupling, raise-on-failure) into later tranches. Scope for this PR: stream() + qwen_stream proof + clean non-streaming trivials (compaction, dictate). textassist + regen deferred to 2c (raise-on-error contract) after read-the-source found never-raise would paste empty over the user's selection / write empty descriptions. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

…A-12 tranche 2)

PR-3E-2, Option 1. Builds the streaming keystone and migrates the lowest-risk remaining chat/completions sites.

New: codec_llm.stream(), a sync SSE generator that yields the RAW content deltas (data: framing, [DONE] sentinel, choices[0].delta.content, per-chunk parse tolerance) and never raises. Think-stripping is intentionally left to the caller (qwen_stream writes deltas live + strip_think()s the accumulated result; the dashboard owns its own tag machine).

Extracted a shared _build_request() so call() and stream() can't drift on headers/auth/payload; the stream flag is applied last so extra_kwargs can't clobber it. call() refactored onto it with byte-identical payload (no stream key); PR-3E tests still green.

Migrated: codec_session.qwen_stream (streaming proof: keeps live stdout + strip_think, falls back to the migrated qwen_call on an empty stream), codec_compaction.compact_context (single httpx POST -> call(); also gains enable_thinking=False + <think> strip = slightly cleaner summaries), and codec_dictate draft-refine (hardcoded :8083 now passed as base_url; all failure branches collapse to "use raw body", which never-raise -> "" maps onto exactly).

Read-the-source moved 2 of the approved trivials to tranche 2c: codec_textassist RAISEs on failure and its caller's except shows an error overlay, so never-raise would pbcopy "" + Cmd-V, pasting EMPTY over the user's selection; and scripts/regen_skill_descriptions uses raise_for_status (fail-loud dev script). Both need codec_llm.call(raise_on_error=True), so they migrate with agent_plan/runner in 2c.

Tests: tests/test_llm_stream.py (14: stream raw-delta/[DONE]/blank+garbage/non-200/exception/auth/payload-parity, qwen_stream consume+fallback, compaction use+fallback, source invariants). Full suite 1409 passing, 23 known-baseline failures, zero new. Zero net-new ruff. No skills/ touched -> no manifest regen.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

Mikarina13 and others added 2 commits May 22, 2026 14:50

AVADSA25 merged commit b7bb472 into main May 22, 2026
1 check passed

AVADSA25 merged 2 commits into main from fix/pr3-a12-tranche2-stream
