refactor(chat): extract SkillTagBuffer from chat_completion stream (A-6, PR-3D-c) #74
Merged
…-6, PR-3D-c)

Final piece of the PR-3D split. Extracts the streaming <think> + [SKILL:...] tag-machine out of codec_dashboard.chat_completion._stream_gen into a new, tested module, exactly as the audit recommended.

New codec_chat_stream.py:
- SkillTagBuffer: the stateful token processor. feed(token)/finish() are generators that yield clean text fragments to emit; it strips <think>…</think> across chunks and buffers [SKILL:name:query] tags char-by-char so a raw tag never leaks, resolving complete tags via an injected resolve_skill_tag(raw) callback (skill execution is I/O, hence injected). visible_chars lets the caller detect the all-tags-dropped blank-bubble case.
- SKILL_TAG_RE: the shared tag pattern (buffer detection + the dashboard resolver).

_stream_gen now keeps only the SSE/HTTP plumbing (POST, iter_lines, data:/[DONE] framing, keepalive, blank-bubble fallback) + the injected _resolve_skill_tag (budget + allowlist + dispatch). chat_completion 466 -> 379 LOC.

Behavior preserved EXACTLY, including the subtle quirks: a same-chunk </think> is dropped (token zeroed after <think>), think-adjacent text is emitted but not counted toward visible_chars, dropped (resolved-to-empty) tags still emit their empty frame, the 5000-char safety cap, and cross-chunk tag assembly. The non-streaming post-LLM [SKILL:] path is untouched.

Bonus: SkillTagBuffer is the tested unit the deferred A-12 dashboard-stream migration needs; the dashboard stream can now consume codec_llm.stream()'s raw tokens through it.

Tests: tests/test_chat_stream.py (13: passthrough, think cross-chunk + same-chunk quirk, tag resolved/dropped/assembled-across-tokens, non-tag bracket passthrough, prefix divergence, 5000 cap, finish flush, regex). Full suite 1464 passing, 23 known-baseline failures, zero new. Zero net-new ruff. No skills/ touched.

PR-3D complete: all three monoliths decomposed (A-7 #72, A-5 #73, A-6 here).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
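The injected resolver is described above only as "budget + allowlist + dispatch". As a rough illustration of what such a callback could look like, here is a minimal sketch. Everything in it is an assumption rather than the code in this PR: `make_resolver`, the `skills` mapping, the regex, and the reading of "budget" as a cap on skill calls per response are all hypothetical.

```python
import re
from typing import Callable, Dict, Set

# Assumed tag pattern; the real SKILL_TAG_RE may differ.
SKILL_TAG_RE = re.compile(r"\[SKILL:([\w-]+):([^\]]*)\]")


def make_resolver(
    skills: Dict[str, Callable[[str], str]],
    allowlist: Set[str],
    budget: int,
) -> Callable[[str], str]:
    """Build a resolve_skill_tag(raw) callback (hypothetical helper).

    Returns "" for anything that should be dropped, so the caller
    never has to emit the raw tag.
    """
    remaining = budget  # assumed meaning: max skill calls per response

    def resolve_skill_tag(raw: str) -> str:
        nonlocal remaining
        match = SKILL_TAG_RE.fullmatch(raw)
        if match is None:
            return ""  # not a well-formed tag: drop it
        name, query = match.group(1), match.group(2)
        if name not in allowlist or name not in skills:
            return ""  # unknown or disallowed skill: drop it
        if remaining <= 0:
            return ""  # budget exhausted: drop it
        remaining -= 1
        return skills[name](query)  # dispatch (the I/O step)

    return resolve_skill_tag


if __name__ == "__main__":
    resolve = make_resolver({"echo": str.upper}, allowlist={"echo"}, budget=1)
    print(repr(resolve("[SKILL:echo:hi]")))
    print(repr(resolve("[SKILL:echo:again]")))
```

Keeping the policy checks in a closure like this is one way to let the buffer stay free of I/O while the caller owns the per-response state.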
Summary
Final piece of the PR-3D split: extracts the streaming <think> + [SKILL:...] tag-machine out of codec_dashboard.chat_completion._stream_gen into a new, tested module, exactly as the audit recommended.

New codec_chat_stream.py
- SkillTagBuffer: the stateful token processor. feed(token)/finish() are generators that yield clean text fragments to emit; it strips <think>…</think> across chunks and buffers [SKILL:name:query] tags char-by-char so a raw tag never leaks, resolving complete tags via an injected resolve_skill_tag(raw) callback (skill execution is I/O, hence injected). visible_chars lets the caller detect the all-tags-dropped blank-bubble case.
- SKILL_TAG_RE: the shared tag pattern (buffer detection + the dashboard resolver).

_stream_gen now keeps only the SSE/HTTP plumbing (POST, iter_lines, data:/[DONE] framing, keepalive, blank-bubble fallback) + the injected _resolve_skill_tag (budget + allowlist + dispatch). chat_completion 466 → 379 LOC.

Behavior preserved exactly
Including the subtle quirks: a same-chunk </think> is dropped (token zeroed after <think>); think-adjacent text is emitted but not counted toward visible_chars; dropped (resolved-to-empty) tags still emit their empty frame; the 5000-char safety cap; cross-chunk tag assembly. The non-streaming post-LLM [SKILL:] path is untouched.

Bonus: SkillTagBuffer is the tested unit the deferred A-12 dashboard-stream migration needs; the dashboard stream can now consume codec_llm.stream()'s raw tokens through it.

Test plan
- tests/test_chat_stream.py, 13 tests: passthrough; <think> cross-chunk + same-chunk-drop quirk; tag resolved / dropped (no leak) / assembled-across-tokens; non-tag bracket passthrough; prefix divergence; 5000-char cap; finish() flush; regex.
- No skills/ touched → no manifest regen.
- A [SKILL:...] the LLM emits resolves inline (no raw tag leak); an all-dropped-tags response shows the fallback bubble.

PR-3D complete
All three monoliths decomposed: A-7 Agent.run (#72), A-5 _dispatch_inner (#73), A-6 chat_completion (this PR).

🤖 Generated with Claude Code
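For readers skimming the description, here is a minimal sketch of the kind of tag machine this PR extracts. It is not the code in codec_chat_stream.py: the class name `SkillTagBufferSketch`, the regex, the `visible_chars` counting rule, the `run_stream` helper, and the `finish()` behavior are assumptions made for illustration, and the think-handling quirks listed under "Behavior preserved exactly" are deliberately not reproduced. It also assumes each <think> and </think> marker arrives whole within a single token.

```python
import re
from typing import Callable, Iterable, Iterator

SKILL_TAG_RE = re.compile(r"\[SKILL:([\w-]+):([^\]]*)\]")  # assumed pattern
TAG_PREFIX = "[SKILL:"
MAX_TAG_CHARS = 5000  # safety cap named in the PR; exact semantics assumed


class SkillTagBufferSketch:
    """Illustrative stand-in for SkillTagBuffer, not the real class."""

    def __init__(self, resolve_skill_tag: Callable[[str], str]) -> None:
        self._resolve = resolve_skill_tag
        self._in_think = False
        self._pending = ""  # characters of a possible tag seen so far
        self.visible_chars = 0

    def feed(self, token: str) -> Iterator[str]:
        while token:
            if self._in_think:
                _, closed, token = token.partition("</think>")
                if not closed:
                    return  # still inside the think block: drop the token
                self._in_think = False
                continue
            before, opened, token = token.partition("<think>")
            yield from self._feed_text(before)
            if not opened:
                return
            self._in_think = True

    def finish(self) -> Iterator[str]:
        # Assumption: an unterminated tag is flushed as plain text.
        leftover, self._pending = self._pending, ""
        yield from self._emit(leftover)

    def _feed_text(self, text: str) -> Iterator[str]:
        out = []
        for ch in text:
            if not self._pending:
                if ch == "[":
                    self._pending = ch  # maybe the start of a tag
                else:
                    out.append(ch)
                continue
            candidate = self._pending + ch
            if len(candidate) <= len(TAG_PREFIX) and not TAG_PREFIX.startswith(candidate):
                # Prefix diverged from "[SKILL:": release the held text.
                out.append(self._pending)
                self._pending = ""
                if ch == "[":
                    self._pending = ch
                else:
                    out.append(ch)
            elif ch == "]":
                self._pending = ""
                is_tag = SKILL_TAG_RE.fullmatch(candidate) is not None
                out.append(self._resolve(candidate) if is_tag else candidate)
            elif len(candidate) >= MAX_TAG_CHARS:
                out.append(candidate)  # give up on runaway "tags"
                self._pending = ""
            else:
                self._pending = candidate
        yield from self._emit("".join(out))

    def _emit(self, text: str) -> Iterator[str]:
        if text:
            self.visible_chars += len(text.strip())  # assumed counting rule
            yield text


def run_stream(tokens: Iterable[str], resolver: Callable[[str], str]) -> str:
    """Hypothetical helper: push tokens through the buffer and join the output."""
    buf = SkillTagBufferSketch(resolver)
    parts = [frag for tok in tokens for frag in buf.feed(tok)]
    parts.extend(buf.finish())
    return "".join(parts)


if __name__ == "__main__":
    tokens = ["Hi ", "<think>plan", " more</think>", "[SK", "ILL:echo:", "x] done", " [note]"]
    print(run_stream(tokens, lambda raw: "<ran>"))  # prints "Hi <ran> done [note]"
```

Injecting the resolver keeps the buffer free of I/O, which is what makes cases such as cross-chunk assembly and prefix divergence cheap to unit test. A caller can check `visible_chars == 0` after `finish()` to decide whether to show a fallback bubble.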