fix: send remaining sentences immediately when end_input() is called #4440
Summary
Fixes a bug in `StreamPacerWrapper` where calling `end_input()` did not immediately send remaining buffered sentences to TTS, causing multi-second delays in agent responses.
The Bug
When `end_input()` was called (indicating the user has finished speaking), the pacer continued to wait based on the `remaining_audio` timer calculation instead of immediately sending all remaining text:
- `end_input()` only woke the send task conditionally: it called `_wakeup_event.set()` only when the audio emitter's destination channel was closed, not when it was still open
Example of the Problem
With `min_remaining_audio = 5.0s`:
1. `end_input()` is called while the audio emitter is still open
2. `_input_ended = True`, but no wakeup occurs
3. The send task keeps waiting for `remaining_audio - min_remaining_audio = 10 - 5 = 5s`
Result: ~5 second delay after user finishes speaking before remaining sentences are synthesized.
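The timing difference can be reproduced with a toy model. This is a minimal sketch, not the actual `StreamPacerWrapper` code: `ToyPacer`, `flush_delay`, `always_wake`, `wait_s`, and `_sink_closed` are hypothetical names, and `wait_s` stands in for the `remaining_audio - min_remaining_audio` timer (shortened so the script runs quickly).

```python
import asyncio
import time


class ToyPacer:
    """Toy stand-in for the pacer (hypothetical, not the real StreamPacerWrapper)."""

    def __init__(self, wait_s: float, always_wake: bool) -> None:
        self._wait_s = wait_s            # models remaining_audio - min_remaining_audio
        self._always_wake = always_wake  # False: conditional wakeup; True: unconditional
        self._sink_closed = False        # models "destination channel closed" (still open here)
        self._input_ended = False
        self._sentences: list[str] = []
        self._wakeup_event = asyncio.Event()
        self.sent: list[str] = []

    def push(self, sentence: str) -> None:
        self._sentences.append(sentence)

    def end_input(self) -> None:
        self._input_ended = True
        # Conditional wakeup only fires when the sink is closed;
        # unconditional wakeup fires every time.
        if self._always_wake or self._sink_closed:
            self._wakeup_event.set()

    async def send_task(self) -> None:
        try:
            # Sleep until woken up or until the pacing timer expires.
            await asyncio.wait_for(self._wakeup_event.wait(), timeout=self._wait_s)
        except asyncio.TimeoutError:
            pass  # timer expired without a wakeup
        self.sent.extend(self._sentences)
        self._sentences.clear()


async def flush_delay(always_wake: bool, wait_s: float = 0.5) -> float:
    """Return the seconds between end_input() and the buffered text being sent."""
    pacer = ToyPacer(wait_s, always_wake)
    pacer.push("Last sentence.")
    task = asyncio.create_task(pacer.send_task())
    await asyncio.sleep(0)  # give the send task a chance to start waiting
    start = time.perf_counter()
    pacer.end_input()
    await task
    return time.perf_counter() - start


if __name__ == "__main__":
    print("conditional wakeup:  ", asyncio.run(flush_delay(always_wake=False)))
    print("unconditional wakeup:", asyncio.run(flush_delay(always_wake=True)))
```

In this model, the conditional variant should report a delay close to `wait_s`, while the unconditional variant should return almost immediately.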
Changes
- `end_input()`: moved `_wakeup_event.set()` outside the conditional
- `(self._input_ended and self._sentences)` triggers immediate sending
Why this is correct
The purpose of pacing is to:
- `max_text_length` batching
Once input has ended, we know exactly what text needs to be synthesized and there's no benefit to delaying. The `max_text_length` batching is still respected, so we're not bypassing quality optimizations - just the waiting.
Test plan
- When `end_input()` is called with pending sentences, they are sent immediately (within ~1 event loop iteration)
- `max_text_length` batching is still respected when input ends
🤖 Generated with Claude Code
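The batching check in the test plan could be sketched as follows. This is an illustration only: `take_batch` and `flush_all` are hypothetical helpers, and the greedy "join sentences with a space until the next one would exceed `max_text_length`" rule is an assumption, not necessarily how the pacer batches text.

```python
def take_batch(sentences: list[str], max_text_length: int) -> str:
    """Pop sentences from the front until the next one would exceed max_text_length.

    Always takes at least one sentence, so an over-long sentence cannot stall the queue.
    (Assumed greedy rule; the real pacer may batch differently.)
    """
    batch: list[str] = []
    length = 0
    while sentences:
        added = len(sentences[0]) + (1 if batch else 0)  # +1 for the joining space
        if batch and length + added > max_text_length:
            break
        batch.append(sentences.pop(0))
        length += added
    return " ".join(batch)


def flush_all(sentences: list[str], max_text_length: int) -> list[str]:
    """Drain every pending sentence back-to-back, with no waiting between batches."""
    batches: list[str] = []
    while sentences:
        batches.append(take_batch(sentences, max_text_length))
    return batches


if __name__ == "__main__":
    pending = ["aaaa", "bbbb", "cccc"]
    print(flush_all(pending, max_text_length=9))
```

Under these assumptions, flushing at end of input drains the queue in one pass while each batch still stays within `max_text_length`, except for a single sentence that is itself longer than the limit.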