From what I can see, whisper.cpp runs several steps one at a time, each finishing before the next begins: conversion with ffmpeg, VAD (voice activity detection), then transcription, and probably others. While one step runs, the machine's other resources, such as the GPU or storage I/O, sit idle. In addition, during conversion RAM usage can be at least as high as the size of the converted audio file.

Would it be possible to run these steps concurrently, with data streaming from one step to the next as a pipeline? Audio would flow from ffmpeg to the VAD model, which would chunk it and pass each chunk to the GPU as soon as it is created, so the GPU and other resources would sit idle far less often. I assume this would also reduce peak RAM consumption considerably when the audio is several hours long, allowing more simultaneous whisper.cpp processes on a single machine and improving machine utilization.
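To make the idea concrete, here is a minimal sketch in Python of the kind of pipeline I have in mind. It is not whisper.cpp code and uses none of its APIs: `decode`, `vad`, `transcribe` and `run_pipeline` are hypothetical stand-ins, the "VAD" is a naive energy threshold, and the "transcription" is a stub. The point is only the shape: each stage runs in its own thread, and the bounded queues between stages mean that only a few frames or segments are held in memory at once, instead of the whole converted file.

```python
import queue
import threading

SENTINEL = object()  # marks the end of a stream


def decode(samples, frame_size):
    """Stand-in for reading decoded audio from ffmpeg in small frames."""
    for i in range(0, len(samples), frame_size):
        yield samples[i:i + frame_size]


def vad(frames, threshold):
    """Naive energy-based VAD (an assumption, not a real VAD model).

    Consecutive loud frames are merged into one speech segment, which is
    emitted as soon as a quiet frame (or the end of the stream) is seen.
    """
    segment = []
    for frame in frames:
        energy = sum(abs(s) for s in frame) / len(frame)
        if energy > threshold:
            segment.extend(frame)
        elif segment:
            yield segment
            segment = []
    if segment:
        yield segment


def transcribe(segment):
    """Stub for the GPU transcription step."""
    return f"<{len(segment)} samples>"


def pump(source, outbox):
    """Push every item of an iterator into a bounded queue, then signal the end."""
    for item in source:
        outbox.put(item)  # blocks while the queue is full (backpressure)
    outbox.put(SENTINEL)


def drain(inbox):
    """Yield items from a queue until the end-of-stream marker arrives."""
    while True:
        item = inbox.get()
        if item is SENTINEL:
            return
        yield item


def run_pipeline(samples, frame_size=4, threshold=0.1, depth=2):
    frames_q = queue.Queue(maxsize=depth)    # decode -> VAD
    segments_q = queue.Queue(maxsize=depth)  # VAD -> transcription
    threading.Thread(
        target=pump, args=(decode(samples, frame_size), frames_q), daemon=True
    ).start()
    threading.Thread(
        target=pump, args=(vad(drain(frames_q), threshold), segments_q), daemon=True
    ).start()
    # Transcription consumes segments while the earlier stages keep working.
    return [transcribe(seg) for seg in drain(segments_q)]


if __name__ == "__main__":
    audio = [0.0] * 4 + [0.5] * 8 + [0.0] * 4 + [0.9] * 4
    print(run_pipeline(audio))
```

Because every `put` blocks when its queue is full, a fast decoder cannot run ahead of a slow transcriber, so peak memory in this sketch is bounded by `depth` items per queue plus the segment currently being assembled, regardless of how long the input is. Whether the same structure fits whisper.cpp's internals is something I cannot judge from the outside.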