Skip to content

fix(voice): L2 — VoicePipeline reliability (WebSocket guard, task lifecycle, double-save, +6)#172

Merged
AVADSA25 merged 1 commit into
mainfrom
voice-reliability-fixes
May 31, 2026
Merged

fix(voice): L2 — VoicePipeline reliability (WebSocket guard, task lifecycle, double-save, +6)#172
AVADSA25 merged 1 commit into
mainfrom
voice-reliability-fixes

Conversation

@AVADSA25
Copy link
Copy Markdown
Owner

Summary

Nine findings from a focused review of codec_voice.py (the live WebSocket voice loop). Each is behavior-preserving on the happy path and guarded on the failure paths. The user explicitly signed off on fixing the deep voice-async findings.

# Severity Fix
1 Critical WebSocket send-after-close → _safe_send_* + _ws_alive flag
2 Critical Fire-and-forget tasks orphaned → stored + cancelled + exception-logged
3 High Double save_to_memory (run + route finally) → idempotent
4 High _resolve_voice_option_choice first-match → longest-match
5 High Echo-cooldown dropped barge-in audio → only suppress new starts
6 Medium Unbounded audio_buffer under noise → force-flush cap
7 Medium utterance_queue head-of-line block → non-blocking enqueue
8 Medium _stream_qwen error sentinel persisted to memory → flagged + skipped
9 High _announced_question_ids marked before announce → mark on success only

Highlights

1. send-after-close. _speak and the crew progress callback sent on the raw socket; a disconnect mid-TTS raised — and in the crew callback that exception surfaced out of a detached multi-minute crew. New _safe_send_bytes/_safe_send_json no-op once _ws_alive flips False (set in _audio_receiver's disconnect branch).

2. task lifecycle. warmup_llm was create_task'd with no ref (GC-able mid-flight) and no exception retrieval; the screen-overlay used two orphaned run_in_executor futures + deprecated get_event_loop(). Warmup is now stored, exception-logged, and cancelled in run()'s finally; the overlay uses a guarded _spawn_detached() (Popen is already non-blocking).

5 & 6 (VAD). The echo-cooldown returned None unconditionally, so a user already mid-utterance lost their leading words; now only a new speech start is suppressed during cooldown. Continuous mic noise kept the silence gate from firing → audio_buffer grew unbounded; now force-flushed at MAX_UTTERANCE_BYTES (vad.max_utterance_seconds, default 30s).

9. unheard questions. The poll marked a question announced before speaking it, so a failed/closed announce left the user answering something they never heard. The caller now marks it only after a successful announce → a failure retries next poll.

Test plan

  • tests/test_voice_reliability_l2.py — 10 tests: longest-match (+ strict bypass), enqueue-overflow-drops-oldest, runaway-force-flush, barge-in-captured vs new-start-suppressed, save idempotency, stream-error flag
  • Updated test_voice_ask_user::test_poll_skips_already_announced_question for the new mark-on-success contract
  • Existing 13 voice-pipeline + 29 ask-user tests pass (no regression)
  • python3.13 -m pytest --ignore=tests/test_skills.py -q2,065 passed, 77 skipped
  • ruff check: 0 issues
  • Re-ran test_voice_pipeline.py after every edit batch (VAD changes are the most behaviorally sensitive)

Branches off main; touches codec_voice.py + 2 voice tests — independent of the other queued PRs.

🤖 Generated with Claude Code

Nine findings from the codec_voice.py review. The voice pipeline is the live
WebSocket loop, so each change is behavior-preserving on the happy path and
guarded on the failure paths.

1. WebSocket send-after-close (CRITICAL)
   _safe_send_bytes / _safe_send_json + a self._ws_alive flag flipped False on
   disconnect (in _audio_receiver). _speak and the crew callback now no-op on a
   dead socket instead of raising — a client disconnect during a multi-minute
   crew no longer surfaces an exception out of the detached callback.

2. Fire-and-forget tasks (CRITICAL/HIGH)
   warmup_llm is now stored (self._warmup_task), gets an exception-logging
   done_callback, and is cancelled in run()'s finally (was orphaned → GC-able
   mid-flight + un-retrieved-future warnings). The two screen-overlay
   run_in_executor futures (orphaned + deprecated get_event_loop()) became a
   guarded _spawn_detached() — Popen is already non-blocking.

3. Double save_to_memory (HIGH)
   Both run()'s finally AND routes/websocket.py's finally call it → every voice
   turn was written to memory.db twice. Now idempotent via a _memory_saved flag.

4. _resolve_voice_option_choice longest-match (HIGH)
   The exact-substring loop returned the FIRST matching label, so "yes" beat
   "yes and notify" → mis-routed a non-strict multi-option answer. Now prefers
   the longest match. (Strict consent is unaffected — it bypasses this resolver.)

5. Echo-cooldown dropped barge-in audio (HIGH)
   feed_audio returned None unconditionally during the post-TTS cooldown, so a
   user already mid-utterance lost their leading words ("...end my email"). Now
   only a NEW speech start is suppressed; in-progress capture continues.

6. Unbounded audio_buffer (MEDIUM)
   Continuous mic noise above the VAD threshold kept last_speech_time fresh, so
   the silence gate never fired and the buffer grew without bound (~32 KB/s).
   Force-flush at MAX_UTTERANCE_BYTES (config: vad.max_utterance_seconds, 30s).

7. utterance_queue head-of-line block (MEDIUM)
   await queue.put() on the maxsize=3 queue blocked the receiver while the
   pipeline was slow — stalling interrupt/ping control frames. New
   _enqueue_utterance() is non-blocking and drops the oldest on overflow.

8. _stream_qwen error sentinel pollution (MEDIUM)
   "Sorry, I had a processing error." was appended to self.messages + saved to
   memory, so the LLM "saw" a fake apology it never reasoned. A _stream_error
   flag now skips persisting it (the user still HEARS it).

9. _announced_question_ids premature mark (HIGH)
   The poll marked a question announced BEFORE the announce ran; a failed/closed
   announce left the user answering an unheard question. Now the caller marks it
   only after a successful announce, so a failure retries next poll.

Test surface: tests/test_voice_reliability_l2.py (10 tests) + updated
test_voice_ask_user::test_poll_skips_already_announced_question for the new
mark-on-success contract. Existing 13 voice-pipeline + 29 ask-user tests pass.
Full suite: 2,065 passed / 77 skipped. ruff clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@AVADSA25 AVADSA25 merged commit daa159d into main May 31, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants