Bug Report: Transcript Out of Order in Case of User Interruption [ElevenLabs] #1361

@rahultyl

Issue Summary

We have observed that the transcript is stored out of order whenever the voice bot faces multiple interruptions in quick succession.

Observed Behavior

Multiple interruptions trigger multiple LLM generations. While some word timestamps are still queued for processing, a new LLM response generation resets the cumulative time. As a result, the timestamps calculated for words in the current text are sometimes lower than timestamps already assigned to earlier words, so those words are placed at the wrong position in the transcript.
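The mechanism described above can be illustrated with a small model. This is only a sketch, not pipecat's actual code: `WordClock`, `start_generation`, and `stamp` are hypothetical names, the numbers are made up, and the one assumption taken from the logs in this report is that a word's pts is the initial word timestamp plus a cumulative time that restarts with each generation.

```python
from dataclasses import dataclass, field


@dataclass
class WordClock:
    """Hypothetical model: pts = initial timestamp + cumulative time of the current generation."""

    initial_pts: int
    cumulative: int = 0
    transcript: list = field(default_factory=list)

    def start_generation(self) -> None:
        # Assumed behaviour: a new LLM response resets the cumulative time to 0.
        self.cumulative = 0

    def stamp(self, word: str, duration: int) -> int:
        # Advance the cumulative time by the word's duration and record its pts.
        self.cumulative += duration
        pts = self.initial_pts + self.cumulative
        self.transcript.append((pts, word))
        return pts


clock = WordClock(initial_pts=1_000)

clock.start_generation()
for word in ["I", "can", "help"]:
    clock.stamp(word, 100)  # pts 1100, 1200, 1300

clock.start_generation()  # interruption: a second response begins
for word in ["Hi!", "I'm"]:
    clock.stamp(word, 140)  # pts 1140, 1280, lower than words already stamped

# Ordering the transcript by pts interleaves the two responses.
ordered_words = [word for _, word in sorted(clock.transcript, key=lambda item: item[0])]
print(ordered_words)
```

In this model the second response's words land between words of the first response once the transcript is ordered by pts, which is the same shape as the stored transcript in the example below.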

Example

Stored Transcript (Incorrect Order):
I can help you schedu le an appointment Hi! I'm here to help. in either Let's schedule your a ppointment
Expected Transcript (Correct Order):
I can help you schedule an appointment in either the morning or evening.
Do you have a preference for one over the other?
Hi!
I'm here to help.

Word Timestamps Log (Example Data)

2025-03-12 12:24:23.887 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='I' frame.pts=97234192417 self._initial_word_timestamp=96828192417 timestamp=406000000
2025-03-12 12:24:23.887 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='can' frame.pts=97466192417 self._initial_word_timestamp=96828192417 timestamp=638000000
2025-03-12 12:24:23.887 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='help' frame.pts=97687192417 self._initial_word_timestamp=96828192417 timestamp=859000000
2025-03-12 12:24:23.887 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='you' frame.pts=97826192417 self._initial_word_timestamp=96828192417 timestamp=998000000
2025-03-12 12:24:23.887 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='schedu' frame.pts=98105192417 self._initial_word_timestamp=96828192417 timestamp=1277000000

2025-03-12 12:24:24.015 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='le' frame.pts=98151192417 self._initial_word_timestamp=96828192417 timestamp=1323000000
2025-03-12 12:24:24.015 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='an' frame.pts=98256192417 self._initial_word_timestamp=96828192417 timestamp=1428000000
2025-03-12 12:24:24.015 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='appointment' frame.pts=98825192417 self._initial_word_timestamp=96828192417 timestamp=1997000000
2025-03-12 12:24:24.016 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='in' frame.pts=98976192417 self._initial_word_timestamp=96828192417 timestamp=2148000000
2025-03-12 12:24:24.016 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='either' frame.pts=99266192417 self._initial_word_timestamp=96828192417 timestamp=2438000000
2025-03-12 12:24:24.016 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='the' frame.pts=99405192417 self._initial_word_timestamp=96828192417 timestamp=2577000000
2025-03-12 12:24:24.016 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='morning' frame.pts=99754192417 self._initial_word_timestamp=96828192417 timestamp=2926000000
2025-03-12 12:24:24.016 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='or' frame.pts=99870192417 self._initial_word_timestamp=96828192417 timestamp=3042000000
2025-03-12 12:24:24.016 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='evening.' frame.pts=100323192417 self._initial_word_timestamp=96828192417 timestamp=3495000000

Here the cumulative time is reset to 0, so the frame.pts of the word 'Hi!' (97269192417) is lower than the frame.pts of the earlier word 'in' (98976192417). This is what puts the transcript out of order.

2025-03-12 12:24:25.604 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='Hi!' frame.pts=97269192417 self._initial_word_timestamp=96828192417 timestamp=441000000


2025-03-12 12:24:25.861 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='I'm' frame.pts=97420192417 self._initial_word_timestamp=96828192417 timestamp=592000000
2025-03-12 12:24:25.861 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='here' frame.pts=97641192417 self._initial_word_timestamp=96828192417 timestamp=813000000
2025-03-12 12:24:25.861 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='to' frame.pts=97745192417 self._initial_word_timestamp=96828192417 timestamp=917000000
2025-03-12 12:24:25.861 | INFO     | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='help.' frame.pts=98047192417 self._initial_word_timestamp=96828192417 timestamp=1219000000
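To locate the point where the timestamps go backwards in a longer log, a small script along the following lines may help. It is a sketch that assumes the `word='...' frame.pts=...` layout shown in the log lines above; `find_pts_regressions` is a hypothetical helper, and the sample lines are trimmed to the two fields the regex needs.

```python
import re

# Matches the word and frame.pts fields as they appear in the log lines above.
WORD_RE = re.compile(r"word='(?P<word>.*?)' frame\.pts=(?P<pts>\d+)")


def find_pts_regressions(log_lines):
    """Return (word, pts, running_max_pts) for every word whose pts is below the running maximum."""
    regressions = []
    max_pts = None
    for line in log_lines:
        match = WORD_RE.search(line)
        if match is None:
            continue  # not a word-timestamp line
        pts = int(match.group("pts"))
        if max_pts is not None and pts < max_pts:
            regressions.append((match.group("word"), pts, max_pts))
        else:
            max_pts = pts
    return regressions


# Values copied from the log above, trimmed to the relevant fields.
sample = [
    "word='in' frame.pts=98976192417",
    "word='evening.' frame.pts=100323192417",
    "word='Hi!' frame.pts=97269192417",
    "word='I'm' frame.pts=97420192417",
]

regressions = find_pts_regressions(sample)
for word, pts, max_pts in regressions:
    print(f"{word!r}: pts {pts} is below the running maximum {max_pts}")
```

Running it over the full attached log (for example `find_pts_regressions(open("test_3.log"))`) should flag every word stamped after the reset.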

Expected Behavior

  • The transcript should be stored in the order the words were spoken, with word timestamps that never decrease across interruptions.
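One way to picture the expected behavior is a clock that never hands out a pts lower than the previous one. The sketch below is illustrative only and is not a proposed patch to pipecat: `MonotonicWordClock` and its methods are hypothetical, and continuing from the last pts on a new generation is a simplifying assumption (it ignores words of the interrupted response that were never spoken).

```python
class MonotonicWordClock:
    """Hypothetical: per-generation word offsets are added to a base that never moves backwards."""

    def __init__(self, initial_pts: int) -> None:
        self._base_pts = initial_pts
        self._last_pts = initial_pts

    def start_generation(self) -> None:
        # Instead of going back to the original base, continue from the last pts handed out.
        self._base_pts = self._last_pts

    def stamp(self, offset_ns: int) -> int:
        # offset_ns is the word's offset within the current generation.
        pts = max(self._base_pts + offset_ns, self._last_pts)
        self._last_pts = pts
        return pts


# Initial timestamp and per-word offsets taken from the log in this report.
clock = MonotonicWordClock(initial_pts=96828192417)
pts_list = [
    clock.stamp(406000000),   # 'I'
    clock.stamp(3495000000),  # 'evening.'
]
clock.start_generation()      # new response after the interruption
pts_list += [
    clock.stamp(441000000),   # 'Hi!'
    clock.stamp(592000000),   # "I'm"
]
print(pts_list == sorted(pts_list))
```

With the offsets from the log, 'Hi!' is stamped after 'evening.' rather than before 'in', so ordering by pts preserves the spoken order.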

Steps to Reproduce

  1. Initiate a conversation with the voice bot.
  2. Interrupt the bot multiple times in quick succession.
  3. Observe the stored transcript and compare it with the expected order.

Additional Context

  • This issue occurs specifically when handling multiple user interruptions.
  • The problem is observed with ElevenLabs TTS processing.

Please find attached the detailed logs and transcript:

test_3.log
Recording link: link
transcript.txt

Sample code gist: link

cc: @Vaibhav159 @tarungarg546
