Description
Bug Report: Transcript Out of Order in Case of User Interruption [ElevenLabs]
Issue Summary
We have observed that the transcript is stored out of order whenever the voice bot faces multiple interruptions in quick succession.
Observed Behavior
Multiple interruptions trigger multiple LLM generations. While some word timestamps are still queued for processing, the new LLM response generation resets the cumulative time. As a result, the timestamps calculated for words in the current text are sometimes lower than timestamps already assigned to earlier words, so those words are placed at the wrong position in the transcript.
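The arithmetic behind this can be sketched as follows. This is only an illustration, not the Pipecat implementation: `word_pts` is a hypothetical helper, and the relation it encodes (pts = initial word timestamp + per-word offset) is inferred from the logged values in this report, where `frame.pts - self._initial_word_timestamp` equals `timestamp`.

```python
# Values below are copied from the log in this report (nanoseconds).
INITIAL_WORD_TIMESTAMP = 96_828_192_417


def word_pts(offset_ns: int) -> int:
    """Hypothetical helper: pts = initial word timestamp + per-word offset."""
    return INITIAL_WORD_TIMESTAMP + offset_ns


# Offset of 'in' from the first response, accumulated over the earlier words.
pts_in = word_pts(2_148_000_000)
# Offset of 'Hi!' from the next response, after the cumulative time was reset.
pts_hi = word_pts(441_000_000)

# 'Hi!' was produced later but ends up with the smaller pts.
print(pts_in, pts_hi, pts_hi < pts_in)
```

Because the initial word timestamp stays the same while the offset restarts near zero, any consumer that orders words by pts would place 'Hi!' before 'in'.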
Example
Stored Transcript (Incorrect Order):
I can help you schedu le an appointment Hi! I'm here to help. in either Let's schedule your a ppointment
Expected Transcript (Correct Order):
I can help you schedule an appointment in either the morning or evening.
Do you have a preference for one over the other?
Hi!
I'm here to help.
Word Timestamps Log (Example Data)
2025-03-12 12:24:23.887 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='I' frame.pts=97234192417 self._initial_word_timestamp=96828192417 timestamp=406000000
2025-03-12 12:24:23.887 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='can' frame.pts=97466192417 self._initial_word_timestamp=96828192417 timestamp=638000000
2025-03-12 12:24:23.887 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='help' frame.pts=97687192417 self._initial_word_timestamp=96828192417 timestamp=859000000
2025-03-12 12:24:23.887 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='you' frame.pts=97826192417 self._initial_word_timestamp=96828192417 timestamp=998000000
2025-03-12 12:24:23.887 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='schedu' frame.pts=98105192417 self._initial_word_timestamp=96828192417 timestamp=1277000000
2025-03-12 12:24:24.015 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='le' frame.pts=98151192417 self._initial_word_timestamp=96828192417 timestamp=1323000000
2025-03-12 12:24:24.015 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='an' frame.pts=98256192417 self._initial_word_timestamp=96828192417 timestamp=1428000000
2025-03-12 12:24:24.015 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='appointment' frame.pts=98825192417 self._initial_word_timestamp=96828192417 timestamp=1997000000
2025-03-12 12:24:24.016 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='in' frame.pts=98976192417 self._initial_word_timestamp=96828192417 timestamp=2148000000
2025-03-12 12:24:24.016 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='either' frame.pts=99266192417 self._initial_word_timestamp=96828192417 timestamp=2438000000
2025-03-12 12:24:24.016 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='the' frame.pts=99405192417 self._initial_word_timestamp=96828192417 timestamp=2577000000
2025-03-12 12:24:24.016 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='morning' frame.pts=99754192417 self._initial_word_timestamp=96828192417 timestamp=2926000000
2025-03-12 12:24:24.016 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='or' frame.pts=99870192417 self._initial_word_timestamp=96828192417 timestamp=3042000000
2025-03-12 12:24:24.016 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='evening.' frame.pts=100323192417 self._initial_word_timestamp=96828192417 timestamp=3495000000
Here the cumulative time is reset to 0, so the frame.pts of the word 'Hi!' becomes lower than the frame.pts of the word 'in', which puts the transcript out of order:
2025-03-12 12:24:25.604 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='Hi!' frame.pts=97269192417 self._initial_word_timestamp=96828192417 timestamp=441000000
2025-03-12 12:24:25.861 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='I'm' frame.pts=97420192417 self._initial_word_timestamp=96828192417 timestamp=592000000
2025-03-12 12:24:25.861 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='here' frame.pts=97641192417 self._initial_word_timestamp=96828192417 timestamp=813000000
2025-03-12 12:24:25.861 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='to' frame.pts=97745192417 self._initial_word_timestamp=96828192417 timestamp=917000000
2025-03-12 12:24:25.861 | INFO | pipecat.services.ai_services:_words_task_handler:468 - Word Timestamps: word='help.' frame.pts=98047192417 self._initial_word_timestamp=96828192417 timestamp=1219000000
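To locate the point where the timestamps go backwards in a log like the one above, a small scan can help. The sketch below assumes the log line format shown in this report; `find_pts_regressions` and the regular expression are illustrative helpers written for this issue, not part of Pipecat.

```python
import re

# Matches the "word='...' frame.pts=..." part of the log lines shown above.
LINE_RE = re.compile(r"word='(?P<word>.*)' frame\.pts=(?P<pts>\d+)")


def find_pts_regressions(lines):
    """Return (word, pts, previous_pts) for each word whose pts is lower
    than the pts of the word logged immediately before it."""
    regressions = []
    previous = None
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not a word-timestamp line
        pts = int(match.group("pts"))
        if previous is not None and pts < previous:
            regressions.append((match.group("word"), pts, previous))
        previous = pts
    return regressions


# Three lines taken from the log in this report (prefix trimmed).
sample = [
    "Word Timestamps: word='evening.' frame.pts=100323192417 self._initial_word_timestamp=96828192417 timestamp=3495000000",
    "Word Timestamps: word='Hi!' frame.pts=97269192417 self._initial_word_timestamp=96828192417 timestamp=441000000",
    "Word Timestamps: word='I'm' frame.pts=97420192417 self._initial_word_timestamp=96828192417 timestamp=592000000",
]
regressions = find_pts_regressions(sample)
print(regressions)
```

On these lines the scan flags only 'Hi!', the first word logged after the cumulative time was reset.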
Expected Behavior
- The transcript should be stored in the correct order, with word timestamps that keep increasing across interruptions.
Steps to Reproduce
- Initiate a conversation with the voice bot.
- Interrupt the bot multiple times in quick succession.
- Observe the stored transcript and compare it with the expected order.
Additional Context
- This issue is occurring specifically when handling multiple user interruptions.
- The problem is observed with ElevenLabs TTS processing.
Please find the detailed logs and transcript attached:
test_3.log
Recording link -link
transcript.txt
Sample code gist -link