Add streaming (a data pipeline that runs the transcription steps concurrently) to speed up processing

From what I can see, whisper.cpp carries out several steps one at a time, finishing each before starting the next: conversion with ffmpeg, voice activity detection (VAD), then transcription, and probably other things. While one step runs, the machine's other resources, such as the GPU or storage I/O, sit idle. Also, during conversion, RAM usage can be at least as high as the size of the converted audio file.

Would it be possible to carry out these steps simultaneously, with data streaming from one step to the next as a data pipeline? Data would flow from ffmpeg to the VAD model, the VAD model would chunk the audio, and each chunk would be passed to the GPU as soon as it is created, so that the GPU and the other resources spend less time idle. I assume this would also considerably reduce peak RAM consumption when the audio is several hours long, which would allow more simultaneous whisper.cpp processes on a single machine and improve machine utilization.
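To make the idea concrete, here is a minimal sketch of the kind of pipeline I mean, written in Python only for illustration. It is not whisper.cpp code and uses none of its APIs: `decode_stage`, `vad_stage`, `transcribe_stage` and `run_pipeline` are hypothetical names, the "VAD" is a toy amplitude threshold, and the "transcription" is a placeholder string. The point is only that each stage runs in its own thread and hands data to the next through a bounded queue, so a fast stage blocks instead of buffering the whole file.

```python
import queue
import threading

SENTINEL = None  # marks the end of the stream between stages


def decode_stage(frames, out_q):
    """Stand-in for ffmpeg: emits small blocks of samples as they are decoded."""
    for block in frames:
        out_q.put(block)  # blocks when the queue is full (backpressure)
    out_q.put(SENTINEL)


def vad_stage(in_q, out_q, threshold=0.1):
    """Toy VAD (assumption): a block counts as speech if any sample is loud.

    Consecutive speech blocks are merged into one chunk, which is emitted
    as soon as the following silence is seen.
    """
    current = []
    while True:
        block = in_q.get()
        if block is SENTINEL:
            break
        if any(abs(s) >= threshold for s in block):
            current.extend(block)
        elif current:
            out_q.put(current)
            current = []
    if current:  # flush a chunk that runs to the end of the audio
        out_q.put(current)
    out_q.put(SENTINEL)


def transcribe_stage(in_q, results):
    """Stand-in for the model: records a placeholder per speech chunk."""
    while True:
        chunk = in_q.get()
        if chunk is SENTINEL:
            break
        results.append(f"<{len(chunk)} samples>")


def run_pipeline(frames, max_buffered=4):
    """Run all three stages concurrently, connected by bounded queues."""
    pcm_q = queue.Queue(maxsize=max_buffered)
    chunk_q = queue.Queue(maxsize=max_buffered)
    results = []
    threads = [
        threading.Thread(target=decode_stage, args=(frames, pcm_q)),
        threading.Thread(target=vad_stage, args=(pcm_q, chunk_q)),
        threading.Thread(target=transcribe_stage, args=(chunk_q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    audio = [[0.0, 0.0], [0.5, 0.4], [0.3, 0.2], [0.0, 0.0], [0.9, 0.9]]
    print(run_pipeline(audio))
```

With queues bounded like this, the memory held between stages is limited by `max_buffered` blocks plus the speech chunk currently being assembled, rather than by the length of the whole recording.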


adding streaming/data pipeline/simultaneous processing of several transcription steps to speed up processing #3429
