Skip to content

refactor(llm): codec_llm.stream() + migrate streaming/trivial sites (A-12 tranche 2)#69

Merged
AVADSA25 merged 2 commits into
mainfrom
fix/pr3-a12-tranche2-stream
May 22, 2026
Merged

refactor(llm): codec_llm.stream() + migrate streaming/trivial sites (A-12 tranche 2)#69
AVADSA25 merged 2 commits into
mainfrom
fix/pr3-a12-tranche2-stream

Conversation

@AVADSA25
Copy link
Copy Markdown
Owner

Summary

PR-3E-2 (Option 1). Builds the streaming keystone codec_llm.stream() and migrates the lowest-risk remaining chat/completions sites. Design-first per AGENTS.md §11 → docs/PR3E2-LLM-STREAM-TRANCHE2-DESIGN.md (maps all 22 remaining sites + the phased roadmap).

New API

  • codec_llm.stream(...) — sync SSE generator yielding the RAW content deltas (data: framing, [DONE] sentinel, choices[0].delta.content, per-chunk parse tolerance). Never raises — HTTP/conn/parse error → empty stream, caller falls back. Think-stripping is intentionally left to the caller (see note below).
  • _build_request() — shared header/auth/payload builder so call() and stream() can't drift; stream flag applied last so extra_kwargs can't clobber it. call() refactored onto it with a byte-identical payload (no stream key) — PR-3E's call() tests stay green.

Migrated

  • codec_session.qwen_stream (streaming proof) — keeps live stdout + strip_think() on the accumulated result; falls back to the already-migrated qwen_call on an empty stream.
  • codec_compaction.compact_context — single httpx POST → call(); also gains enable_thinking=False + <think> strip (slightly cleaner summaries), fallback unchanged.
  • codec_dictate draft-refine — hardcoded :8083 now passed as base_url; all failure branches collapse to "use raw body", which never-raise→"" maps onto exactly.

Why stream() yields raw (not think-stripped)

qwen_stream writes each delta to stdout live and only strip_thinks the accumulated result — internal stripping would silently drop the reasoning it shows live (a parity break) and needs fiddly partial-tag buffering. Raw-yield gives exact parity, is simpler/safer for the first streaming API, and matches reality: the dashboard (deferred) owns its own cross-chunk <think> + [SKILL:…] machine.

Scope refinement (read-the-source)

The approved Option 1 listed 4 trivials; reading the downstreams moved 2 to tranche 2c:

  • codec_textassist RAISEs on failure → its except shows an Error overlay. With never-raise→"", the success path would pbcopy "" + ⌘V, pasting empty over the user's selection + "Text replaced!". Destructive — needs raise_on_error.
  • scripts/regen_skill_descriptions uses raise_for_status() (fail-loud dev script) — never-raise would write empty descriptions.

Both share the raise-on-failure contract with agent_plan/agent_runner, so they migrate together in 2c when codec_llm.call gains raise_on_error. (Honors "never break working code.")

Still pending (own PRs)

2c raise-mode (textassist/regen/agent_plan/agent_runner) · async astream() for voice _stream_qwen + agents (queue stays at the call site) · dashboard (4 non-stream + the [SKILL:…] stream tag-machine) · bridges · skills tranche.

Test plan

  • tests/test_llm_stream.py — 14 tests: stream() raw-deltas/[DONE]/blank+garbage/non-200/exception/auth/payload-parity-with-call(); qwen_stream consume+strip_think + fallback-on-empty; compaction use + fallback; source invariants (session/compaction/dictate use the canonical helpers, inline impls gone).
  • Full suite: 1409 passed, 23 known-baseline failures, zero new, 74 skipped.
  • Ruff: new code 0 errors; codec_dictate 24-before/24-after (all pre-existing E701), codec_session 1 pre-existing E741 — zero net-new.
  • No skills/ touched → no manifest regen.
  • Manual (Mac Studio): a streaming session reply (qwen_stream), a long-chat compaction, and a "draft …" dictation still work.

🤖 Generated with Claude Code

Mikarina13 and others added 2 commits May 22, 2026 14:50
…gration (Option 1)

Maps all 22 remaining chat/completions text sites; designs the streaming
keystone (yields raw deltas, never-raises, sync now / async deferred); isolates
the 3 hard constraints (dashboard skill-tag machine, async+queue coupling,
raise-on-failure) into later tranches. Scope for this PR: stream() + qwen_stream
proof + clean non-streaming trivials (compaction, dictate). textassist + regen
deferred to 2c (raise-on-error contract) after read-the-source found never-raise
would paste empty over the user's selection / write empty descriptions.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…A-12 tranche 2)

PR-3E-2, Option 1. Builds the streaming keystone and migrates the lowest-risk
remaining chat/completions sites.

New: codec_llm.stream() — sync SSE generator that yields the RAW content deltas
(data: framing, [DONE] sentinel, choices[0].delta.content, per-chunk parse
tolerance), never raises. Think-stripping is intentionally left to the caller
(qwen_stream writes deltas live + strip_think()s the accumulated result; the
dashboard owns its own tag machine). Extracted a shared _build_request() so
call() and stream() can't drift on headers/auth/payload; the stream flag is
applied last so extra_kwargs can't clobber it. call() refactored onto it with
byte-identical payload (no stream key) — PR-3E tests still green.

Migrated: codec_session.qwen_stream (streaming proof — keeps live stdout +
strip_think, falls back to the migrated qwen_call on an empty stream),
codec_compaction.compact_context (single httpx POST -> call(); also gains
enable_thinking=False + <think> strip = slightly cleaner summaries), and
codec_dictate draft-refine (hardcoded :8083 now passed as base_url; all failure
branches collapse to "use raw body", which never-raise -> "" maps onto exactly).

Read-the-source moved 2 of the approved trivials to tranche 2c: codec_textassist
RAISEs on failure and its caller's except shows an error overlay — never-raise
would pbcopy "" + Cmd-V, pasting EMPTY over the user's selection; and
scripts/regen_skill_descriptions uses raise_for_status (fail-loud dev script).
Both need codec_llm.call(raise_on_error=True), so they migrate with
agent_plan/runner in 2c.

Tests: tests/test_llm_stream.py (14 — stream raw-delta/[DONE]/blank+garbage/
non-200/exception/auth/payload-parity, qwen_stream consume+fallback, compaction
use+fallback, source invariants). Full suite 1409 passing, 23 known-baseline
failures, zero new. Zero net-new ruff. No skills/ touched -> no manifest regen.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@AVADSA25 AVADSA25 merged commit b7bb472 into main May 22, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants