fix(voice): L2 — VoicePipeline reliability (WebSocket guard, task lifecycle, double-save, +6) by AVADSA25 · Pull Request #172 · AVADSA25/codec

AVADSA25 · 2026-05-31T16:13:28Z

Summary

Nine findings from a focused review of codec_voice.py (the live WebSocket voice loop). Each is behavior-preserving on the happy path and guarded on the failure paths. The user explicitly signed off on fixing the deep voice-async findings.

| # | Severity | Fix |
|---|----------|-----|
| 1 | Critical | WebSocket send-after-close → `_safe_send_*` + `_ws_alive` flag |
| 2 | Critical | Fire-and-forget tasks orphaned → stored + cancelled + exception-logged |
| 3 | High | Double `save_to_memory` (run + route finally) → idempotent |
| 4 | High | `_resolve_voice_option_choice` first-match → longest-match |
| 5 | High | Echo-cooldown dropped barge-in audio → only suppress new starts |
| 6 | Medium | Unbounded `audio_buffer` under noise → force-flush cap |
| 7 | Medium | `utterance_queue` head-of-line block → non-blocking enqueue |
| 8 | Medium | `_stream_qwen` error sentinel persisted to memory → flagged + skipped |
| 9 | High | `_announced_question_ids` marked before announce → mark on success only |
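
For finding 4, the difference between first-match and longest-match can be shown with a small sketch. This is not the code from codec_voice.py: `resolve_choice` and the label list are hypothetical stand-ins, and the matching is assumed to be a case-insensitive substring test.

```python
def resolve_choice(answer, labels):
    """Return the longest label contained in the answer, or None if none match.

    Hypothetical stand-in for _resolve_voice_option_choice; the real
    resolver may normalize text differently.
    """
    text = answer.lower()
    matches = [label for label in labels if label.lower() in text]
    # A first-match loop would return "yes" for "yes and notify ...";
    # picking the longest candidate prefers the more specific option.
    return max(matches, key=len) if matches else None


labels = ["yes", "yes and notify", "cancel"]
choice = resolve_choice("Yes and notify the team", labels)
```

With these labels, `choice` is the more specific option rather than the bare "yes" that a first-match loop would return.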

Highlights

1. send-after-close. _speak and the crew progress callback sent on the raw socket; a disconnect mid-TTS raised — and in the crew callback that exception surfaced out of a detached multi-minute crew. New _safe_send_bytes/_safe_send_json no-op once _ws_alive flips False (set in _audio_receiver's disconnect branch).
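
A minimal sketch of the guard described in item 1, under stated assumptions: `FakeSocket`, `GuardedSender`, and `mark_disconnected` are hypothetical names for illustration, and the real `_safe_send_json` may differ in what it returns and how it handles a failed send.

```python
import asyncio


class FakeSocket:
    """Hypothetical stand-in for the real WebSocket; raises once closed."""

    def __init__(self):
        self.closed = False
        self.sent = []

    async def send_json(self, payload):
        if self.closed:
            raise RuntimeError("send after close")
        self.sent.append(payload)


class GuardedSender:
    """Sends become no-ops once the socket is known to be dead."""

    def __init__(self, ws):
        self.ws = ws
        self._ws_alive = True

    def mark_disconnected(self):
        # In the PR this flip happens in _audio_receiver's disconnect branch.
        self._ws_alive = False

    async def _safe_send_json(self, payload):
        if not self._ws_alive:
            return False
        try:
            await self.ws.send_json(payload)
            return True
        except Exception:
            # Assumption: a failed send also marks the socket dead.
            self._ws_alive = False
            return False


async def demo():
    ws = FakeSocket()
    sender = GuardedSender(ws)
    first = await sender._safe_send_json({"type": "progress", "step": 1})
    ws.closed = True
    sender.mark_disconnected()
    # This would have raised on the raw socket; now it is a no-op.
    second = await sender._safe_send_json({"type": "progress", "step": 2})
    return first, second, ws.sent


first, second, sent = asyncio.run(demo())
```

Returning a boolean is a design choice of this sketch; it lets a caller such as the question poller tell a delivered message from a dropped one.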

2. task lifecycle. warmup_llm was create_task'd with no ref (GC-able mid-flight) and no exception retrieval; the screen-overlay used two orphaned run_in_executor futures + deprecated get_event_loop(). Warmup is now stored, exception-logged, and cancelled in run()'s finally; the overlay uses a guarded _spawn_detached() (Popen is already non-blocking).
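
The stored-task pattern from item 2 might look roughly like the sketch below. `PipelineSketch` and `_log_task_exception` are hypothetical, the sleeps stand in for real work, and awaiting the cancelled task in `finally` is an assumption of this sketch rather than something the PR states.

```python
import asyncio
import logging

log = logging.getLogger("voice_sketch")


def _log_task_exception(task):
    """Done-callback: retrieve the exception so it is logged, not dropped."""
    if task.cancelled():
        return
    exc = task.exception()
    if exc is not None:
        log.warning("background task %s failed: %r", task.get_name(), exc)


class PipelineSketch:
    def __init__(self):
        self._warmup_task = None

    async def warmup_llm(self):
        # Placeholder for the real warmup request.
        await asyncio.sleep(3600)

    async def run(self):
        # Keep a strong reference: the event loop only holds weak
        # references to tasks, so an unreferenced task can be collected.
        self._warmup_task = asyncio.create_task(
            self.warmup_llm(), name="warmup_llm"
        )
        self._warmup_task.add_done_callback(_log_task_exception)
        try:
            await asyncio.sleep(0.01)  # stands in for the main voice loop
        finally:
            if not self._warmup_task.done():
                self._warmup_task.cancel()
            # Wait for the cancellation to land so nothing outlives run().
            await asyncio.gather(self._warmup_task, return_exceptions=True)


pipeline = PipelineSketch()
asyncio.run(pipeline.run())
```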

5 & 6 (VAD). The echo-cooldown returned None unconditionally, so a user already mid-utterance lost their leading words; now only a new speech start is suppressed during cooldown. Continuous mic noise kept the silence gate from firing → audio_buffer grew unbounded; now force-flushed at MAX_UTTERANCE_BYTES (vad.max_utterance_seconds, default 30s).
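
Both VAD changes can be sketched together. The class below is a simplified illustration, not the real `feed_audio`: it takes a precomputed `is_speech` flag, treats a single non-speech chunk as the silence gate (the real gate is time-based), and assumes 16 kHz 16-bit mono PCM, which is consistent with the roughly 32 KB/s growth rate quoted in the commit message.

```python
SAMPLE_RATE = 16_000        # assumption: 16 kHz
BYTES_PER_SAMPLE = 2        # assumption: 16-bit mono PCM, about 32 KB/s
MAX_UTTERANCE_SECONDS = 30  # mirrors vad.max_utterance_seconds
MAX_UTTERANCE_BYTES = SAMPLE_RATE * BYTES_PER_SAMPLE * MAX_UTTERANCE_SECONDS


class VadSketch:
    """Hypothetical, simplified stand-in for the pipeline's VAD state."""

    def __init__(self, max_bytes=MAX_UTTERANCE_BYTES):
        self.max_bytes = max_bytes
        self.audio_buffer = bytearray()
        self.in_speech = False
        self.cooldown = False  # True for a short window after TTS playback

    def feed_audio(self, chunk, is_speech):
        """Return a finished utterance as bytes, or None while capturing."""
        if not self.in_speech:
            if not is_speech:
                return None
            if self.cooldown:
                # Only a NEW speech start is suppressed (likely TTS echo).
                return None
            self.in_speech = True
        # In-progress capture continues even during cooldown (barge-in).
        self.audio_buffer.extend(chunk)
        if not is_speech or len(self.audio_buffer) >= self.max_bytes:
            # Silence gate, or force-flush once the cap is reached.
            return self._flush()
        return None

    def _flush(self):
        utterance = bytes(self.audio_buffer)
        self.audio_buffer.clear()
        self.in_speech = False
        return utterance
```

Under these assumptions the cap works out to 16,000 × 2 × 30 = 960,000 bytes per utterance.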

9. unheard questions. The poll marked a question announced before speaking it, so a failed/closed announce left the user answering something they never heard. The caller now marks it only after a successful announce → a failure retries next poll.
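
The mark-on-success contract from item 9 can be sketched as follows. `QuestionPollerSketch`, `poll_once`, and the boolean-returning `announce` callable are hypothetical; only the `_announced_question_ids` name comes from the PR.

```python
import asyncio


class QuestionPollerSketch:
    def __init__(self, announce):
        # Assumption: announce is an async callable returning True on success.
        self._announce = announce
        self._announced_question_ids = set()

    async def poll_once(self, pending):
        """Announce each unannounced question; mark it only on success."""
        for question_id, text in pending:
            if question_id in self._announced_question_ids:
                continue
            try:
                ok = await self._announce(text)
            except Exception:
                ok = False
            if ok:
                self._announced_question_ids.add(question_id)
            # On failure the id stays unmarked, so the next poll retries it.


async def demo():
    attempts = []

    async def flaky_announce(text):
        attempts.append(text)
        return len(attempts) > 1  # first attempt fails, later ones succeed

    poller = QuestionPollerSketch(flaky_announce)
    pending = [("q1", "Send the email?")]
    await poller.poll_once(pending)
    after_first = set(poller._announced_question_ids)
    await poller.poll_once(pending)  # retry succeeds and marks q1
    await poller.poll_once(pending)  # already marked, so skipped
    return after_first, poller._announced_question_ids, len(attempts)


after_first, announced, attempt_count = asyncio.run(demo())
```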

Test plan

- tests/test_voice_reliability_l2.py: 10 tests covering longest-match (+ strict bypass), enqueue-overflow-drops-oldest, runaway-force-flush, barge-in-captured vs new-start-suppressed, save idempotency, stream-error flag
- Updated test_voice_ask_user::test_poll_skips_already_announced_question for the new mark-on-success contract
- Existing 13 voice-pipeline + 29 ask-user tests pass (no regression)
- python3.13 -m pytest --ignore=tests/test_skills.py -q → 2,065 passed, 77 skipped
- ruff check: 0 issues
- Re-ran test_voice_pipeline.py after every edit batch (VAD changes are the most behaviorally sensitive)

Branches off main; touches codec_voice.py + 2 voice tests — independent of the other queued PRs.

🤖 Generated with Claude Code

Nine findings from the codec_voice.py review. The voice pipeline is the live WebSocket loop, so each change is behavior-preserving on the happy path and guarded on the failure paths.

1. WebSocket send-after-close (CRITICAL). _safe_send_bytes / _safe_send_json + a self._ws_alive flag flipped False on disconnect (in _audio_receiver). _speak and the crew callback now no-op on a dead socket instead of raising, so a client disconnect during a multi-minute crew no longer surfaces an exception out of the detached callback.

2. Fire-and-forget tasks (CRITICAL/HIGH). warmup_llm is now stored (self._warmup_task), gets an exception-logging done_callback, and is cancelled in run()'s finally (was orphaned → GC-able mid-flight + un-retrieved-future warnings). The two screen-overlay run_in_executor futures (orphaned + deprecated get_event_loop()) became a guarded _spawn_detached(); Popen is already non-blocking.

3. Double save_to_memory (HIGH). Both run()'s finally AND routes/websocket.py's finally call it → every voice turn was written to memory.db twice. Now idempotent via a _memory_saved flag.

4. _resolve_voice_option_choice longest-match (HIGH). The exact-substring loop returned the FIRST matching label, so "yes" beat "yes and notify" → mis-routed a non-strict multi-option answer. Now prefers the longest match. (Strict consent is unaffected; it bypasses this resolver.)

5. Echo-cooldown dropped barge-in audio (HIGH). feed_audio returned None unconditionally during the post-TTS cooldown, so a user already mid-utterance lost their leading words ("...end my email"). Now only a NEW speech start is suppressed; in-progress capture continues.

6. Unbounded audio_buffer (MEDIUM). Continuous mic noise above the VAD threshold kept last_speech_time fresh, so the silence gate never fired and the buffer grew without bound (~32 KB/s). Force-flush at MAX_UTTERANCE_BYTES (config: vad.max_utterance_seconds, 30s).

7. utterance_queue head-of-line block (MEDIUM). await queue.put() on the maxsize=3 queue blocked the receiver while the pipeline was slow, stalling interrupt/ping control frames. New _enqueue_utterance() is non-blocking and drops the oldest on overflow.

8. _stream_qwen error sentinel pollution (MEDIUM). "Sorry, I had a processing error." was appended to self.messages + saved to memory, so the LLM "saw" a fake apology it never reasoned. A _stream_error flag now skips persisting it (the user still HEARS it).

9. _announced_question_ids premature mark (HIGH). The poll marked a question announced BEFORE the announce ran; a failed/closed announce left the user answering an unheard question. Now the caller marks it only after a successful announce, so a failure retries next poll.

Test surface: tests/test_voice_reliability_l2.py (10 tests) + updated test_voice_ask_user::test_poll_skips_already_announced_question for the new mark-on-success contract. Existing 13 voice-pipeline + 29 ask-user tests pass. Full suite: 2,065 passed / 77 skipped. ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
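
The drop-oldest enqueue from finding 7 can be sketched with a plain `asyncio.Queue`. `enqueue_utterance` is a hypothetical free function written for illustration; the real `_enqueue_utterance()` is a method and may log or count drops differently.

```python
import asyncio


def enqueue_utterance(queue, utterance):
    """Non-blocking put; on a full queue, discard the oldest item first.

    Returns the dropped item, or None if nothing was dropped.
    """
    dropped = None
    if queue.full():
        try:
            dropped = queue.get_nowait()
        except asyncio.QueueEmpty:
            pass
    # No await happens between the check and the put, so on a single
    # event loop this cannot raise QueueFull.
    queue.put_nowait(utterance)
    return dropped


queue = asyncio.Queue(maxsize=3)
drops = [enqueue_utterance(queue, n) for n in range(5)]
remaining = [queue.get_nowait() for _ in range(queue.qsize())]
```

Because the receiver never awaits on the queue, it stays free to process interrupt and ping frames while the pipeline is busy; the cost is that the oldest buffered utterances are discarded under sustained backlog.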

AVADSA25 merged commit daa159d into main May 31, 2026
1 check passed

AVADSA25 merged 1 commit into main from voice-reliability-fixes
