fix(voice): L2 — VoicePipeline reliability (WebSocket guard, task lifecycle, double-save, +6)#172
Merged
Merged
Conversation
Nine findings from the codec_voice.py review. The voice pipeline is the live
WebSocket loop, so each change is behavior-preserving on the happy path and
guarded on the failure paths.
1. WebSocket send-after-close (CRITICAL)
_safe_send_bytes / _safe_send_json + a self._ws_alive flag flipped False on
disconnect (in _audio_receiver). _speak and the crew callback now no-op on a
dead socket instead of raising — a client disconnect during a multi-minute
crew no longer surfaces an exception out of the detached callback.
2. Fire-and-forget tasks (CRITICAL/HIGH)
warmup_llm is now stored (self._warmup_task), gets an exception-logging
done_callback, and is cancelled in run()'s finally (was orphaned → GC-able
mid-flight + un-retrieved-future warnings). The two screen-overlay
run_in_executor futures (orphaned + deprecated get_event_loop()) became a
guarded _spawn_detached() — Popen is already non-blocking.
3. Double save_to_memory (HIGH)
Both run()'s finally AND routes/websocket.py's finally call it → every voice
turn was written to memory.db twice. Now idempotent via a _memory_saved flag.
4. _resolve_voice_option_choice longest-match (HIGH)
The exact-substring loop returned the FIRST matching label, so "yes" beat
"yes and notify" → mis-routed a non-strict multi-option answer. Now prefers
the longest match. (Strict consent is unaffected — it bypasses this resolver.)
5. Echo-cooldown dropped barge-in audio (HIGH)
feed_audio returned None unconditionally during the post-TTS cooldown, so a
user already mid-utterance lost their leading words ("...end my email"). Now
only a NEW speech start is suppressed; in-progress capture continues.
6. Unbounded audio_buffer (MEDIUM)
Continuous mic noise above the VAD threshold kept last_speech_time fresh, so
the silence gate never fired and the buffer grew without bound (~32 KB/s).
Force-flush at MAX_UTTERANCE_BYTES (config: vad.max_utterance_seconds, 30s).
7. utterance_queue head-of-line block (MEDIUM)
await queue.put() on the maxsize=3 queue blocked the receiver while the
pipeline was slow — stalling interrupt/ping control frames. New
_enqueue_utterance() is non-blocking and drops the oldest on overflow.
8. _stream_qwen error sentinel pollution (MEDIUM)
"Sorry, I had a processing error." was appended to self.messages + saved to
memory, so the LLM "saw" a fake apology it never reasoned. A _stream_error
flag now skips persisting it (the user still HEARS it).
9. _announced_question_ids premature mark (HIGH)
The poll marked a question announced BEFORE the announce ran; a failed/closed
announce left the user answering an unheard question. Now the caller marks it
only after a successful announce, so a failure retries next poll.
Test surface: tests/test_voice_reliability_l2.py (10 tests) + updated
test_voice_ask_user::test_poll_skips_already_announced_question for the new
mark-on-success contract. Existing 13 voice-pipeline + 29 ask-user tests pass.
Full suite: 2,065 passed / 77 skipped. ruff clean.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Nine findings from a focused review of
codec_voice.py(the live WebSocket voice loop). Each is behavior-preserving on the happy path and guarded on the failure paths. The user explicitly signed off on fixing the deep voice-async findings._safe_send_*+_ws_aliveflagsave_to_memory(run + route finally) → idempotent_resolve_voice_option_choicefirst-match → longest-matchaudio_bufferunder noise → force-flush caputterance_queuehead-of-line block → non-blocking enqueue_stream_qwenerror sentinel persisted to memory → flagged + skipped_announced_question_idsmarked before announce → mark on success onlyHighlights
1. send-after-close.
_speakand the crew progress callback sent on the raw socket; a disconnect mid-TTS raised — and in the crew callback that exception surfaced out of a detached multi-minute crew. New_safe_send_bytes/_safe_send_jsonno-op once_ws_aliveflips False (set in_audio_receiver's disconnect branch).2. task lifecycle.
warmup_llmwascreate_task'd with no ref (GC-able mid-flight) and no exception retrieval; the screen-overlay used two orphanedrun_in_executorfutures + deprecatedget_event_loop(). Warmup is now stored, exception-logged, and cancelled inrun()'s finally; the overlay uses a guarded_spawn_detached()(Popen is already non-blocking).5 & 6 (VAD). The echo-cooldown returned
Noneunconditionally, so a user already mid-utterance lost their leading words; now only a new speech start is suppressed during cooldown. Continuous mic noise kept the silence gate from firing →audio_buffergrew unbounded; now force-flushed atMAX_UTTERANCE_BYTES(vad.max_utterance_seconds, default 30s).9. unheard questions. The poll marked a question announced before speaking it, so a failed/closed announce left the user answering something they never heard. The caller now marks it only after a successful announce → a failure retries next poll.
Test plan
tests/test_voice_reliability_l2.py— 10 tests: longest-match (+ strict bypass), enqueue-overflow-drops-oldest, runaway-force-flush, barge-in-captured vs new-start-suppressed, save idempotency, stream-error flagtest_voice_ask_user::test_poll_skips_already_announced_questionfor the new mark-on-success contractpython3.13 -m pytest --ignore=tests/test_skills.py -q→ 2,065 passed, 77 skippedruff check: 0 issuestest_voice_pipeline.pyafter every edit batch (VAD changes are the most behaviorally sensitive)🤖 Generated with Claude Code