refactor(chat): extract SkillTagBuffer from chat_completion stream (A-6, PR-3D-c) #74

Merged: AVADSA25 merged 1 commit into main from fix/pr3d-c-chat-stream on May 22, 2026

@AVADSA25
Owner

Summary

Final piece of the PR-3D split — extracts the streaming <think> + [SKILL:...] tag-machine out of codec_dashboard.chat_completion._stream_gen into a new, tested module, exactly as the audit recommended.

New codec_chat_stream.py

  • SkillTagBuffer — the stateful token processor. feed(token) / finish() are generators that yield clean text fragments to emit; it strips <think>…</think> across chunks and buffers [SKILL:name:query] tags char-by-char so a raw tag never leaks, resolving complete tags via an injected resolve_skill_tag(raw) callback (skill execution is I/O, hence injected). visible_chars lets the caller detect the all-tags-dropped blank-bubble case.
  • SKILL_TAG_RE — the shared tag pattern (buffer detection + the dashboard resolver).
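
For readers who have not opened the diff, here is a minimal sketch of how a char-by-char tag buffer of this shape could work. It is not the code in codec_chat_stream.py: the class name SketchTagBuffer, the regex, the demo resolver, and the choice to flush an over-long or unterminated partial as plain text are all assumptions made for illustration. Only the feed/finish generator shape, the injected resolver, visible_chars, and the 5000-char cap come from the description above. The <think> stripping is left out to keep the sketch short.

```python
import re
from typing import Callable, Iterator

# Assumed pattern; the PR does not show the real SKILL_TAG_RE.
TAG_RE = re.compile(r"\[SKILL:([^:\]]+):([^\]]*)\]")
TAG_PREFIX = "[SKILL:"
MAX_TAG_CHARS = 5000  # safety cap mentioned in the PR


class SketchTagBuffer:
    """Hypothetical stand-in for SkillTagBuffer (tag handling only)."""

    def __init__(self, resolve: Callable[[str], str]) -> None:
        self._resolve = resolve   # injected: skill execution is I/O
        self._pending = ""        # chars that may still become a tag
        self.visible_chars = 0    # count of emitted characters

    def _count(self, text: str) -> str:
        self.visible_chars += len(text)
        return text

    def feed(self, token: str) -> Iterator[str]:
        out = ""  # plain text collected so far from this token
        for ch in token:
            if not self._pending:
                if ch == "[":
                    self._pending = ch   # possible tag start: hold it back
                else:
                    out += ch
                continue
            candidate = self._pending + ch
            if len(candidate) <= len(TAG_PREFIX):
                if TAG_PREFIX.startswith(candidate):
                    self._pending = candidate
                else:
                    # Diverged from "[SKILL:", so the held chars were plain text.
                    out += self._pending
                    self._pending = ""
                    if ch == "[":
                        self._pending = ch
                    else:
                        out += ch
            elif ch == "]":
                # Complete tag: flush preceding text, then the resolved text.
                if out:
                    yield self._count(out)
                    out = ""
                yield self._count(self._resolve(candidate))  # may be ""
                self._pending = ""
            elif len(candidate) > MAX_TAG_CHARS:
                out += candidate          # assumption: give up, emit as text
                self._pending = ""
            else:
                self._pending = candidate
        if out:
            yield self._count(out)

    def finish(self) -> Iterator[str]:
        # Assumption: an unterminated partial is flushed as plain text.
        if self._pending:
            pending, self._pending = self._pending, ""
            yield self._count(pending)


def demo_resolver(raw: str) -> str:
    """Hypothetical resolver: allow one skill name, drop everything else."""
    match = TAG_RE.fullmatch(raw)
    if match is None or match.group(1) != "weather":
        return ""
    return f"<{match.group(1)}: {match.group(2)}>"
```

Feeding the tokens "Hi [SK", "ILL:weather:", "Paris] done" through one instance yields text that joins to "Hi <weather: Paris> done", so the raw tag never reaches the caller even though it arrived in three pieces.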

_stream_gen now keeps only the SSE/HTTP plumbing (POST, iter_lines, data:/[DONE] framing, keepalive, blank-bubble fallback) plus the injected _resolve_skill_tag (budget + allowlist + dispatch). chat_completion shrinks from 466 to 379 LOC.
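
To illustrate the split of responsibilities, the following sketch shows what the remaining plumbing could look like once token processing lives behind a feed/finish/visible_chars object. It is an illustration under assumptions, not the real _stream_gen: the names stream_frames, sse_frame, PassthroughBuffer, and FALLBACK_TEXT are hypothetical, the upstream payload is assumed to be OpenAI-style (choices[0].delta.content), and the outgoing frame shape is invented for the example.

```python
import json
from typing import Iterable, Iterator

FALLBACK_TEXT = "(no visible reply)"  # hypothetical fallback bubble text


class PassthroughBuffer:
    """Stand-in exposing the feed/finish/visible_chars shape described above."""

    def __init__(self) -> None:
        self.visible_chars = 0

    def feed(self, token: str) -> Iterator[str]:
        self.visible_chars += len(token)
        yield token

    def finish(self) -> Iterator[str]:
        return iter(())


def sse_frame(text: str) -> str:
    # One server-sent event: a "data:" line followed by a blank line.
    return "data: " + json.dumps({"content": text}) + "\n\n"


def stream_frames(lines: Iterable[bytes], buffer) -> Iterator[str]:
    """Turn upstream SSE lines into downstream SSE frames via the buffer."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and comments carry no token
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        try:
            chunk = json.loads(payload)
            token = chunk["choices"][0]["delta"].get("content") or ""
        except (ValueError, KeyError, IndexError, TypeError, AttributeError):
            continue  # skip malformed chunks in this sketch
        for fragment in buffer.feed(token):
            yield sse_frame(fragment)
    for fragment in buffer.finish():
        yield sse_frame(fragment)
    if buffer.visible_chars == 0:
        # Blank-bubble case: nothing visible was emitted, so send a fallback.
        yield sse_frame(FALLBACK_TEXT)
    yield "data: [DONE]\n\n"
```

The point of the design is that stream_frames knows nothing about tags: any object with the same three members can be swapped in, which is what makes the token processor testable without an HTTP connection.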

Behavior preserved exactly

Including the subtle quirks: a same-chunk </think> is dropped (token zeroed after <think>); think-adjacent text is emitted but not counted toward visible_chars; dropped (resolved-to-empty) tags still emit their empty frame; the 5000-char safety cap; cross-chunk tag assembly. The non-streaming post-LLM [SKILL:] path is untouched.
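
The same-chunk quirk is easier to see in code. The sketch below is one possible reading of "token zeroed after <think>", not the logic in the module: the class name ThinkFilter is hypothetical, it works on whole tokens only, and it does not model the visible_chars accounting or a <think> marker that is itself split across chunks.

```python
class ThinkFilter:
    """Hypothetical per-token filter for <think>...</think> spans."""

    def __init__(self) -> None:
        self.in_think = False

    def feed(self, token: str) -> str:
        if self.in_think:
            if "</think>" in token:
                self.in_think = False
                # Keep only what follows the closing marker.
                return token.split("</think>", 1)[1]
            return ""  # still inside the think span
        if "<think>" in token:
            self.in_think = True
            # Everything after <think> in this token is discarded, including
            # a </think> in the same chunk, so the span stays open.
            return token.split("<think>", 1)[0]
        return token
```

Under this reading, a think span that opens in one chunk and closes in a later one is stripped cleanly, while a chunk containing both markers leaves the filter in the think state until a later chunk carries another </think>.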

Bonus: SkillTagBuffer is the tested unit the deferred A-12 dashboard-stream migration needs — the dashboard stream can now consume codec_llm.stream()'s raw tokens through it.

Test plan

  • tests/test_chat_stream.py — 13 tests: passthrough; <think> cross-chunk + same-chunk-drop quirk; tag resolved / dropped (no leak) / assembled-across-tokens; non-tag bracket passthrough; prefix divergence; 5000-char cap; finish() flush; regex.
  • Full suite: 1464 passed, 23 known-baseline failures, zero new, 74 skipped (chat-path suites — step_budget, agents — all green).
  • Ruff: new module + test clean; codec_dashboard.py F-delta vs origin/main = 0.
  • No skills/ touched → no manifest regen.
  • Manual (Mac Studio): a streaming chat reply renders; a [SKILL:...] the LLM emits resolves inline (no raw tag leak); an all-dropped-tags response shows the fallback bubble.

PR-3D complete

All three monoliths decomposed — A-7 Agent.run (#72), A-5 _dispatch_inner (#73), A-6 chat_completion (this PR).

🤖 Generated with Claude Code

…-6, PR-3D-c)

Final piece of the PR-3D split. Extracts the streaming <think> + [SKILL:...]
tag-machine out of codec_dashboard.chat_completion._stream_gen into a new,
tested module — exactly as the audit recommended.

New codec_chat_stream.py:
- SkillTagBuffer — the stateful token processor. feed(token)/finish() are
  generators that yield clean text fragments to emit; it strips <think>…</think>
  across chunks and buffers [SKILL:name:query] tags char-by-char so a raw tag
  never leaks, resolving complete tags via an injected resolve_skill_tag(raw)
  callback (skill execution is I/O, hence injected). visible_chars lets the
  caller detect the all-tags-dropped blank-bubble case.
- SKILL_TAG_RE — the shared tag pattern (buffer detection + the dashboard resolver).

_stream_gen now keeps only the SSE/HTTP plumbing (POST, iter_lines, data:/[DONE]
framing, keepalive, blank-bubble fallback) + the injected _resolve_skill_tag
(budget + allowlist + dispatch). chat_completion 466 -> 379 LOC.

Behavior preserved EXACTLY, including the subtle quirks: a same-chunk </think>
is dropped (token zeroed after <think>), think-adjacent text is emitted but not
counted toward visible_chars, dropped (resolved-to-empty) tags still emit their
empty frame, the 5000-char safety cap, and cross-chunk tag assembly. The
non-streaming post-LLM [SKILL:] path is untouched.

Bonus: SkillTagBuffer is the tested unit the deferred A-12 dashboard-stream
migration needs — the dashboard stream can now consume codec_llm.stream()'s raw
tokens through it.

Tests: tests/test_chat_stream.py (13 — passthrough, think cross-chunk + same-chunk
quirk, tag resolved/dropped/assembled-across-tokens, non-tag bracket passthrough,
prefix divergence, 5000 cap, finish flush, regex). Full suite 1464 passing, 23
known-baseline failures, zero new. Zero net-new ruff. No skills/ touched.

PR-3D complete: all three monoliths decomposed (A-7 #72, A-5 #73, A-6 here).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@AVADSA25 merged commit 8a64b9c into main on May 22, 2026
1 check passed