fix: keep websocket TTS word timestamps inside their turn #4102

Open
namanbansal013 wants to merge 2 commits into pipecat-ai:main from namanbansal013:fix/websocket-tts-stale-word-timestamp-race

Fixes #1361
Related to #1355

Summary

  • clear cached pre-audio word timestamps on interruption so interrupted websocket TTS turns cannot flush stale words into the next context
  • drop late word timestamp events for stale audio contexts instead of routing them through the direct timestamp path
  • clamp backward PTS values as a safety net and add websocket TTS regressions for cached-word, late-word, and non-monotonic timestamp races
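The first two items can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `WordTimestampTracker`, its fields, and its method names are hypothetical, and it assumes each word timestamp event carries the ID of the audio context that produced it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Word = Tuple[str, float]  # (text, seconds from the start of the utterance)


@dataclass
class WordTimestampTracker:
    """Hypothetical helper for illustration only; not a pipecat class."""

    active_context_id: Optional[str] = None
    audio_started: bool = False
    cached_words: List[Word] = field(default_factory=list)
    emitted: List[Word] = field(default_factory=list)

    def start_context(self, context_id: str) -> None:
        # A new TTS turn begins; nothing from older contexts is accepted.
        self.active_context_id = context_id
        self.audio_started = False

    def on_word(self, context_id: str, word: str, seconds: float) -> bool:
        # Late event for a context that is no longer active: drop it.
        if context_id != self.active_context_id:
            return False
        if self.audio_started:
            self.emitted.append((word, seconds))
        else:
            # Words that arrive before the first audio chunk are cached.
            self.cached_words.append((word, seconds))
        return True

    def on_audio_started(self) -> None:
        # First audio chunk arrived: flush the words cached for this turn.
        self.audio_started = True
        self.emitted.extend(self.cached_words)
        self.cached_words.clear()

    def on_interruption(self) -> None:
        # Clear the cache so interrupted words cannot reach the next turn.
        self.cached_words.clear()
        self.active_context_id = None
        self.audio_started = False
```

With this shape, a word cached for an interrupted turn is discarded on interruption, and a word event that arrives afterwards with the old context ID is rejected instead of being emitted with the new turn.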

Repro

The downstream repro that motivated this fix interrupted a Cartesia response immediately after playback began:

603.088s CartesiaTTSService#0: Generating TTS [Yes, each element represents the maximum jump length, so you can take any number of steps up to that value.]
603.114s Bot started speaking
603.267s user interruption begins
TTSTextFrame("Yes,").pts => 0.092880s

That single stale word was later flushed into the next turn, where it reordered the transcript output. The same interruption-driven transcript ordering bug is already described upstream in #1361 and #1355, with ElevenLabs instead of Cartesia.
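The PTS clamp mentioned in the summary is a safety net for exactly this kind of value: a timestamp that is smaller than one already emitted. A minimal sketch of such a clamp, assuming integer PTS values in arbitrary units; `MonotonicPts` is a hypothetical name and is not taken from this PR:

```python
from typing import Optional


class MonotonicPts:
    """Hypothetical helper: never lets an emitted PTS move backward."""

    def __init__(self) -> None:
        self._last: Optional[int] = None

    def clamp(self, pts: int) -> int:
        # If the incoming PTS is older than the last one emitted,
        # reuse the last value so downstream ordering is preserved.
        if self._last is not None and pts < self._last:
            pts = self._last
        self._last = pts
        return pts
```

A clamp like this only limits the damage. The stale word would still be emitted, which is why dropping stale-context events comes first and the clamp is only a fallback.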

Test plan

  • uv run ruff check .
  • uv run pytest tests/test_tts_word_timestamp_interruptions.py tests/test_tts_frame_ordering.py
  • uv run pytest
    • Result: 984 passed, 88 skipped, 592 warnings in 104.34s

Naman Bansal added 2 commits March 22, 2026 15:49
Drop stale websocket TTS word timestamp events after interruptions and clamp any backward PTS values so interrupted speech cannot leak into later turns or reorder transcripts.

Development

Successfully merging this pull request may close these issues.

Bug Report: Transcript Out of Order in Case of User Interruption [ElevenLabs]
