Description
I have observed that the current implementation of the AI voice agent (OpenAI + Deepgram + Twilio) has a delay of 4-5 seconds before responding when a call begins, despite using the `stream: true` option. It appears the response is held back until the stream has completed.
In the current implementation, there is a loop that buffers the audio (the dangling `else` suggests it sits inside an `if` that checks the chunk's index):

```javascript
if (this.expectedAudioIndex === index) {
  this.sendAudio(audio);
  this.expectedAudioIndex++;
  while (Object.prototype.hasOwnProperty.call(this.audioBuffer, this.expectedAudioIndex)) {
    const bufferedAudio = this.audioBuffer[this.expectedAudioIndex];
    this.sendAudio(bufferedAudio);
    this.expectedAudioIndex++;
  }
} else {
  this.audioBuffer[index] = audio;
}
```
I believe this delay can be reduced by sending audio over the WebSocket chunk by chunk as it arrives in this loop, rather than waiting for the entire stream to complete. Streaming each chunk immediately would let the AI voice agent respond more promptly and significantly improve the user experience.
Please let me know if this makes sense, or whether there is a reason the stream is handled this way.