use audio context hook for InterruptibleTTSService#4099

Open
omChauhanDev wants to merge 2 commits into pipecat-ai:main from omChauhanDev:fix/interruptible-tts-bot-speaking-race
@omChauhanDev
Contributor

@omChauhanDev omChauhanDev commented Mar 21, 2026

Fixes #3986

Issue:

The _bot_speaking guard in InterruptibleTTSService._handle_interruption() skips the websocket disconnect/reconnect when BotStartedSpeakingFrame hasn't reached the TTS processor yet. If a user interrupts while audio is still being synthesized or is in transit, the connection stays open and the TTS server keeps streaming stale audio into the next response.

Note: #4090 partially addressed this by routing audio through append_to_audio_context(), so stale audio is discarded when no active context exists. However, the server still continues synthesizing unused audio (wasted cost/bandwidth), and old audio can leak into the next response once a new audio context becomes active.
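
To make the race concrete, here is a minimal toy model of the old guard. It is a sketch only: `ToyGuardedTTS`, its attributes, and `run_demo` are hypothetical names for illustration and are not pipecat code. It shows an interruption that arrives before the bot-started-speaking signal being ignored, so synthesis keeps running.

```python
class ToyGuardedTTS:
    """Toy model of the old guard; names are illustrative, not pipecat code."""

    def __init__(self):
        self.synthesizing = False  # server is producing audio
        self.bot_speaking = False  # set only once the "started speaking" signal arrives
        self.reconnects = 0

    def start_synthesis(self):
        self.synthesizing = True

    def on_bot_started_speaking(self):
        self.bot_speaking = True

    def handle_interruption(self):
        # The guard: no reconnect unless the bot is known to be speaking.
        if not self.bot_speaking:
            return
        self.reconnects += 1
        self.synthesizing = False
        self.bot_speaking = False


def run_demo():
    tts = ToyGuardedTTS()
    tts.start_synthesis()
    tts.handle_interruption()  # interruption arrives before the signal
    early = (tts.reconnects, tts.synthesizing)
    tts.on_bot_started_speaking()
    tts.handle_interruption()  # interruption arrives after the signal
    late = (tts.reconnects, tts.synthesizing)
    return early, late
```

In this toy, the early interruption leaves `synthesizing` set, which corresponds to the server continuing to stream audio that nobody will play.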

Approach:

Replaced the _bot_speaking guard with an on_audio_context_interrupted() override - the same hook ElevenLabs, Rime, & Deepgram already use. An audio context exists from synthesis start to playback end, so the hook fires whenever an interruption hits synthesizing or playing audio & stays silent when the bot is idle (preserving the original optimization against unnecessary reconnects from VAD noise).

Changes:

  • tts_service.py: removed _bot_speaking, _handle_interruption, process_frame override; added on_audio_context_interrupted with disconnect/reconnect
  • fish/tts.py: added super() call in existing on_audio_context_interrupted override
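
The shape of the two changes can be sketched roughly as follows. This is a self-contained toy, not the actual diff: `ToyInterruptibleTTS`, `ToyFishTTS`, the `events` list, and the `context_id` parameter are assumptions made for illustration, and the real pipecat signatures may differ. The base class resets the connection inside the hook, and the subclass calls `super()` before doing its own cleanup.

```python
import asyncio


class ToyInterruptibleTTS:
    """Toy stand-in for the base service; not pipecat code."""

    def __init__(self):
        self.events = []  # records the order of operations for the demo

    async def _disconnect(self):
        self.events.append("disconnect")

    async def _connect(self):
        self.events.append("connect")

    async def on_audio_context_interrupted(self, context_id):
        # Reset the websocket so the server stops synthesizing stale audio.
        await self._disconnect()
        await self._connect()


class ToyFishTTS(ToyInterruptibleTTS):
    """Toy subclass that already overrides the hook."""

    async def stop_all_metrics(self):
        self.events.append("stop_metrics")

    async def on_audio_context_interrupted(self, context_id):
        # Call the base hook first so the reconnect is not skipped.
        await super().on_audio_context_interrupted(context_id)
        await self.stop_all_metrics()


def run_demo():
    tts = ToyFishTTS()
    asyncio.run(tts.on_audio_context_interrupted("ctx-1"))
    return tts.events
```

Without the `super()` call, a subclass override would silently replace the base behaviour and the reconnect would never run.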

@omChauhanDev omChauhanDev changed the title fix: use audio context hook for InterruptibleTTSService use audio context hook for InterruptibleTTSService Mar 21, 2026
@codecov

codecov bot commented Mar 21, 2026

Codecov Report

❌ Patch coverage is 25.00000% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/pipecat/services/tts_service.py | 33.33% | 2 Missing ⚠️ |
| src/pipecat/services/fish/tts.py | 0.00% | 1 Missing ⚠️ |

| Files with missing lines | Coverage Δ |
|---|---|
| src/pipecat/services/fish/tts.py | 7.69% <0.00%> (-0.04%) ⬇️ |
| src/pipecat/services/tts_service.py | 66.43% <33.33%> (+0.56%) ⬆️ |

yuki901 added a commit to yuki901/pipecat that referenced this pull request Mar 22, 2026
Replace the _bot_speaking guard with on_audio_context_interrupted() override
so the websocket is always reconnected when audio is in-transit, fixing the
race condition where interruptions during the BotStartedSpeakingFrame
round-trip window would leave stale audio streaming.

Fish Audio TTS now calls super().on_audio_context_interrupted() to trigger
the reconnect before stopping metrics.

Fixes pipecat-ai#3986 (based on PR pipecat-ai#4099)
@yuki901
Contributor

yuki901 commented Mar 22, 2026

Thank you very much!!
Btw, since #4090, all InterruptibleTTSService subclasses route audio through append_to_audio_context(). After an interruption, _handle_interruption replaces _audio_contexts with a fresh empty dict, so stale audio from _receive_messages() is silently discarded — the user-facing bug in #3986 seems already fixed regardless of the _bot_speaking guard.

This PR's fix would still prevent the server from continuing to synthesize and stream unused audio (cost/bandwidth), but is that the intended scope? Does the PR description account for #4090?

@omChauhanDev
Contributor Author

omChauhanDev commented Mar 22, 2026

Hey @yuki901, nice catch - you're right that #4090 handles the immediate window. After interruption, _create_audio_context_task() replaces _audio_contexts with a fresh dict and _playing_context_id is reset to None, so stale audio from _receive_messages() hits append_to_audio_context(None, ...) and gets silently dropped. The audio-plays-right-after-interruption symptom is largely gone.

That said, this PR still covers two things #4090 doesn't:

  1. Server-side waste - without disconnecting, the TTS server keeps synthesizing and streaming audio nobody will use. That's wasted compute, bandwidth, and API cost.

  2. Audio crossover into the next response - there's a subtler window where old audio can leak. Once the next LLM response starts and a new audio context is created, _playing_context_id gets set to the new context ID. If old audio from the still-connected server arrives at that point, get_active_audio_context_id() returns the new ID, and append_to_audio_context routes old audio into the new context. The disconnect/reconnect prevents this by clearing server-side state entirely.
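
Both windows can be illustrated with a small toy model. This is a sketch under stated assumptions: `ToyAudioContexts`, `interrupt`, `on_server_audio`, and `run_demo` are hypothetical, and the method bodies only imitate the behaviour described above rather than reproduce pipecat's implementation. A late chunk is dropped while no context is active, but a later stale chunk lands in the next response's context once one exists.

```python
class ToyAudioContexts:
    """Toy model of context routing; not pipecat code."""

    def __init__(self):
        self._audio_contexts = {}
        self._playing_context_id = None

    def create_audio_context(self, context_id):
        self._audio_contexts[context_id] = []
        self._playing_context_id = context_id

    def interrupt(self):
        # After an interruption: fresh dict, no active context.
        self._audio_contexts = {}
        self._playing_context_id = None

    def get_active_audio_context_id(self):
        return self._playing_context_id

    def append_to_audio_context(self, context_id, chunk):
        if context_id not in self._audio_contexts:
            return False  # silently dropped
        self._audio_contexts[context_id].append(chunk)
        return True

    def on_server_audio(self, chunk):
        # The receive loop cannot tell which response a chunk belongs to,
        # so it routes to whatever context is currently active.
        return self.append_to_audio_context(self.get_active_audio_context_id(), chunk)


def run_demo():
    tts = ToyAudioContexts()
    tts.create_audio_context("response-1")
    tts.on_server_audio("r1-a")
    tts.interrupt()
    dropped = not tts.on_server_audio("r1-b")  # stale chunk, no active context
    tts.create_audio_context("response-2")
    tts.on_server_audio("r1-c")  # stale chunk from the still-open connection
    tts.on_server_audio("r2-a")
    return dropped, tts._audio_contexts["response-2"]
```

In the toy, `"r1-c"` ends up ahead of `"r2-a"` in the second response's context, which is the crossover that closing the connection on interruption is meant to rule out.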

Happy to update the PR description to reference #4090 & clarify the scope. Thanks for flagging it!

yuki901 added a commit to yuki901/pipecat that referenced this pull request Mar 23, 2026

Development

Successfully merging this pull request may close these issues.

InterruptibleTTSService: _bot_speaking guard causes interruption to fail when TTS audio hasn't reached output transport yet

2 participants