-
The way KoboldCpp currently processes audio is not suitable for constant real-time transcription. Besides having a cache, you would need to maintain context between audio inputs for generation, use an overlapping window, and ideally have the audio streamed instead of processed in chunks. You could set up a system that constantly records, looks for pauses, and sends each chunk to KoboldCpp while still recording, but you would still have to wait for each transcription to finish before processing the next chunk, so there would always be a considerable lag between the two. If people kept talking the whole time, the transcript would be significantly delayed by the end of any decently long session. If you just want a transcription of the meeting, you could record it first and then transcribe it, but KoboldCpp as it is right now is not meant for live transcription with different people talking at the same time. Perhaps this is a feature that could be added, but as it stands it is not ideal for that use.
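As a rough illustration of the "record constantly and split on pauses" idea, a minimal sketch is below. It assumes the `sounddevice` and `numpy` packages are available; the silence threshold, block size, and overlap are illustrative values, not something KoboldCpp provides.

```python
# Minimal sketch: record continuously, emit a chunk whenever a pause is heard.
# Assumes sounddevice and numpy; thresholds below are illustrative only.
import queue
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16000          # Whisper expects 16 kHz mono
BLOCK_SECONDS = 0.25         # granularity of the silence check
SILENCE_RMS = 0.01           # energy below this counts as silence
SILENCE_BLOCKS = 4           # ~1 s of silence ends a phrase
OVERLAP_BLOCKS = 2           # carry a little audio into the next chunk

blocks = queue.Queue()

def on_audio(indata, frames, time_info, status):
    # Called by the audio driver; just hand the block to the main loop.
    blocks.put(indata.copy())

def chunks_on_pause():
    """Yield speech chunks whenever a pause is detected, without ever
    stopping the recording stream."""
    current, silent_run, heard_speech = [], 0, False
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1,
                        blocksize=int(SAMPLE_RATE * BLOCK_SECONDS),
                        callback=on_audio):
        while True:
            block = blocks.get()
            current.append(block)
            if np.sqrt(np.mean(block ** 2)) < SILENCE_RMS:
                silent_run += 1
            else:
                silent_run, heard_speech = 0, True
            if heard_speech and silent_run >= SILENCE_BLOCKS:
                chunk = np.concatenate(current)
                # Keep an overlapping tail so words at the cut are not lost.
                current = current[-OVERLAP_BLOCKS:]
                silent_run, heard_speech = 0, False
                yield chunk
```

Each yielded chunk could then be sent off for transcription while the stream keeps filling the queue, but as noted above the transcriptions themselves still run one at a time, so the delay accumulates.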
-
Yeah, interrupting the AI is not supported either right now, so you need to wait for the previous request to complete first. You could potentially write a third-party UI that handles the buffering, but it would not work that well.
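A sketch of what such a third-party buffering client could look like: the recorder never stops, finished chunks pile up in a queue, and a single worker sends them to KoboldCpp one request at a time. The endpoint path and payload (`/api/extra/transcribe` with base64 `audio_data`) are assumptions here; check them against the API docs of the KoboldCpp version you run.

```python
# Sketch of a buffering client: recording never stops, a single worker drains
# the queue and talks to KoboldCpp one request at a time.
# The endpoint path and payload are assumptions -- verify against your
# KoboldCpp version's API docs before relying on them.
import base64, io, queue, threading, wave
import numpy as np
import requests

KOBOLD_URL = "http://localhost:5001"   # default KoboldCpp port
pending = queue.Queue()                # chunks waiting to be transcribed

def to_wav_b64(chunk, sample_rate=16000):
    # Pack a float32 numpy chunk into a base64-encoded 16-bit mono WAV.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes((chunk * 32767).astype(np.int16).tobytes())
    return base64.b64encode(buf.getvalue()).decode()

def transcriber_worker():
    # Serial by design: KoboldCpp handles one request at a time, so the
    # queue is what absorbs speech that arrives while a request is busy.
    while True:
        chunk = pending.get()
        resp = requests.post(f"{KOBOLD_URL}/api/extra/transcribe",
                             json={"audio_data": to_wav_b64(chunk)})
        print(resp.json().get("text", ""))

threading.Thread(target=transcriber_worker, daemon=True).start()
# elsewhere: for chunk in chunks_on_pause(): pending.put(chunk)
```

The queue is what keeps speech from being lost while a request is in flight, but nothing here removes the underlying lag: chunks still wait their turn.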
-
Is it possible to add a buffer for microphone recording in voice-detection mode, so that recording continues constantly, even while voice recognition is running or the LLM is "thinking" (the button is marked BUSY or spins)? I connected a quantized Whisper-3-large model and Gemma9b. It understands speech very well, but the moment it decides a phrase has ended, voice recording is interrupted for some time. I wanted to try using this combination as a simultaneous interpreter at meetings with several languages. Everything seems to work well, phrases are recognized and automatically translated by the LLM, but parts of the speech are lost while the LLM writes the translation or the previous phrase is being recognized. As I understand it, this is not strictly necessary, because the recording could run in parallel with the models rather than sequentially, right? And when the models are free, the already recorded phrase could be handed to them from the buffer.
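For the translation half of this pipeline, a worker like the one sketched above could feed each transcript into the LLM through KoboldCpp's standard `/api/v1/generate` text endpoint; the prompt wording and sampler settings below are only illustrative, and speech is still only preserved because the recorder keeps writing into the buffer while this runs.

```python
# Sketch of the translation step: each transcribed phrase is handed to the
# LLM while the recorder keeps filling the buffer. /api/v1/generate is
# KoboldCpp's standard text endpoint; the prompt template and settings here
# are only illustrative.
import requests

KOBOLD_URL = "http://localhost:5001"

def translate(text, target_language="English"):
    prompt = (f"Translate the following speech into {target_language}.\n"
              f"Speech: {text}\nTranslation:")
    resp = requests.post(f"{KOBOLD_URL}/api/v1/generate",
                         json={"prompt": prompt,
                               "max_length": 200,
                               "temperature": 0.2})
    return resp.json()["results"][0]["text"].strip()

# inside the worker loop, after transcription:
#   print(translate(transcript, "English"))
```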