-
The way KoboldCpp currently processes audio is not suitable for constant real-time transcription. Besides having a cache, you would need to maintain context between audio inputs for generation, use an overlapping window, and ideally have the audio streamed instead of processed in chunks. You could set up a system that constantly records, looks for pauses, and sends each chunk to KoboldCpp while still recording, but you would still have to wait for each transcription to finish before processing the next chunk, so there would always be a considerable lag between the two. If people kept talking the whole time, the transcript would be significantly delayed by the end of any decently long session. If you just want a transcription of the meeting, you could record it first and then transcribe it, but KoboldCpp as it is right now is not meant for live transcription with different people talking at the same time. Perhaps this is a feature that could be added, but as it stands it is not ideal for that use.
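As a rough illustration of the "record constantly and split on pauses" idea, a minimal sketch is below. It assumes the `sounddevice` and `numpy` packages are available; the silence threshold, block size, and overlap are illustrative values, not something KoboldCpp provides.

```python
# Minimal sketch: record continuously, emit a chunk whenever a pause is heard.
# Assumes sounddevice and numpy; thresholds below are illustrative only.
import queue
import numpy as np
import sounddevice as sd

SAMPLE_RATE = 16000          # Whisper expects 16 kHz mono
BLOCK_SECONDS = 0.25         # granularity of the silence check
SILENCE_RMS = 0.01           # energy below this counts as silence
SILENCE_BLOCKS = 4           # ~1 s of silence ends a phrase
OVERLAP_BLOCKS = 2           # carry a little audio into the next chunk

blocks = queue.Queue()

def on_audio(indata, frames, time_info, status):
    # Called by the audio driver; just hand the block to the main loop.
    blocks.put(indata.copy())

def chunks_on_pause():
    """Yield speech chunks whenever a pause is detected, without ever
    stopping the recording stream."""
    current, silent_run, heard_speech = [], 0, False
    with sd.InputStream(samplerate=SAMPLE_RATE, channels=1,
                        blocksize=int(SAMPLE_RATE * BLOCK_SECONDS),
                        callback=on_audio):
        while True:
            block = blocks.get()
            current.append(block)
            if np.sqrt(np.mean(block ** 2)) < SILENCE_RMS:
                silent_run += 1
            else:
                silent_run, heard_speech = 0, True
            if heard_speech and silent_run >= SILENCE_BLOCKS:
                chunk = np.concatenate(current)
                # Keep an overlapping tail so words at the cut are not lost.
                current = current[-OVERLAP_BLOCKS:]
                silent_run, heard_speech = 0, False
                yield chunk
```

Each yielded chunk could then be sent off for transcription while the stream keeps filling the queue, but as noted above the transcriptions themselves still run one at a time, so the delay accumulates.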
-
Yeah, interrupting the AI is not supported either right now, so you need to wait for the previous request to complete first. You could potentially write a third-party UI that handles the buffering, but it would not work that well.
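A sketch of what such a third-party buffering client could look like: the recorder never stops, finished chunks pile up in a queue, and a single worker sends them to KoboldCpp one request at a time. The endpoint path and payload (`/api/extra/transcribe` with base64 `audio_data`) are assumptions here; check them against the API docs of the KoboldCpp version you run.

```python
# Sketch of a buffering client: recording never stops, a single worker drains
# the queue and talks to KoboldCpp one request at a time.
# The endpoint path and payload are assumptions -- verify against your
# KoboldCpp version's API docs before relying on them.
import base64, io, queue, threading, wave
import numpy as np
import requests

KOBOLD_URL = "http://localhost:5001"   # default KoboldCpp port
pending = queue.Queue()                # chunks waiting to be transcribed

def to_wav_b64(chunk, sample_rate=16000):
    # Pack a float32 numpy chunk into a base64-encoded 16-bit mono WAV.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(sample_rate)
        w.writeframes((chunk * 32767).astype(np.int16).tobytes())
    return base64.b64encode(buf.getvalue()).decode()

def transcriber_worker():
    # Serial by design: KoboldCpp handles one request at a time, so the
    # queue is what absorbs speech that arrives while a request is busy.
    while True:
        chunk = pending.get()
        resp = requests.post(f"{KOBOLD_URL}/api/extra/transcribe",
                             json={"audio_data": to_wav_b64(chunk)})
        print(resp.json().get("text", ""))

threading.Thread(target=transcriber_worker, daemon=True).start()
# elsewhere: for chunk in chunks_on_pause(): pending.put(chunk)
```

The queue is what keeps speech from being lost while a request is in flight, but nothing here removes the underlying lag: chunks still wait their turn.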
-
Is it possible to add a buffer for microphone recording in voice-detection mode, so that recording continues constantly, even while voice recognition is running or the LLM is "thinking" (the button is marked BUSY or spins)? I connected a quantized Whisper-3-large model and Gemma9b. It understands speech very well, but the moment it decides a phrase has ended, voice recording is interrupted for some time. I wanted to try using this combination as a simultaneous interpreter at meetings with several languages. Everything seems to work well, phrases are recognized and automatically translated by the LLM, but parts of the speech are lost while the LLM writes the translation or the previous phrase is being recognized. As I understand it, this is not strictly necessary, because the recording could run in parallel with the models rather than sequentially, right? And when the models are free, the already recorded phrase could be handed to them from the buffer.
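For the translation half of this pipeline, a worker like the one sketched above could feed each transcript into the LLM through KoboldCpp's standard `/api/v1/generate` text endpoint; the prompt wording and sampler settings below are only illustrative, and speech is still only preserved because the recorder keeps writing into the buffer while this runs.

```python
# Sketch of the translation step: each transcribed phrase is handed to the
# LLM while the recorder keeps filling the buffer. /api/v1/generate is
# KoboldCpp's standard text endpoint; the prompt template and settings here
# are only illustrative.
import requests

KOBOLD_URL = "http://localhost:5001"

def translate(text, target_language="English"):
    prompt = (f"Translate the following speech into {target_language}.\n"
              f"Speech: {text}\nTranslation:")
    resp = requests.post(f"{KOBOLD_URL}/api/v1/generate",
                         json={"prompt": prompt,
                               "max_length": 200,
                               "temperature": 0.2})
    return resp.json()["results"][0]["text"].strip()

# inside the worker loop, after transcription:
#   print(translate(transcript, "English"))
```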