Adding streaming/data pipeline/simultaneous processing of several transcription steps to speed up processing #3429

@alosslessdev

From what I can see, whisper.cpp carries out several steps one at a time, each finishing before the next begins: conversion with ffmpeg, VAD (voice activity detection), then transcription, and probably others. While one step runs, the machine's other resources, such as the GPU or storage I/O, sit idle. During conversion, RAM usage can also be at least as high as the size of the converted audio file.

Would it be possible to run these steps simultaneously, with data streaming from one step to the next as a data pipeline? Data would flow from ffmpeg to the VAD model, which would chunk the audio and pass each chunk to the GPU as soon as it is created, so the GPU and other components would be idle less often (though they might still be idle at times). I assume this would also considerably reduce peak RAM consumption when the audio is several hours long, allowing more simultaneous whisper.cpp processes on a single machine and improving machine utilization.
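To make the idea concrete, here is a minimal sketch of such a pipeline in Python. It is not whisper.cpp code and says nothing about how whisper.cpp is structured internally. The three stages are stand-ins: `decode_stage` reads raw 16-bit PCM from any file-like object (an ffmpeg stdout pipe is assumed as the real source), `vad_stage` is a toy energy threshold rather than a real VAD model, and `transcribe` is a hypothetical callback where the model call would go. All names, the frame size, and the threshold are assumptions made for illustration.

```python
import io
import queue
import threading
from array import array

FRAME_SAMPLES = 160   # assumed frame size: 10 ms of 16 kHz mono audio
_DONE = object()      # sentinel that marks the end of the stream


def decode_stage(stream, out_q):
    """Read 16-bit PCM one frame at a time (stand-in for reading an ffmpeg pipe)."""
    while True:
        raw = stream.read(FRAME_SAMPLES * 2)
        if not raw:
            break
        raw = raw[: len(raw) - len(raw) % 2]  # drop a trailing odd byte, if any
        if raw:
            out_q.put(array("h", raw))        # blocks while the queue is full
    out_q.put(_DONE)


def vad_stage(in_q, out_q, threshold=500):
    """Toy energy VAD: merge consecutive loud frames into one speech segment."""
    segment = array("h")
    while (frame := in_q.get()) is not _DONE:
        if max(abs(s) for s in frame) >= threshold:
            segment.extend(frame)
        elif segment:
            out_q.put(segment)                # emit the segment as soon as speech ends
            segment = array("h")
    if segment:
        out_q.put(segment)
    out_q.put(_DONE)


def transcribe_stage(in_q, results, transcribe):
    """Hand each segment to the (hypothetical) transcribe callback as it arrives."""
    while (segment := in_q.get()) is not _DONE:
        results.append(transcribe(segment))


def run_pipeline(stream, transcribe, max_pending=8):
    # Bounded queues give backpressure: a fast stage waits for a slow one,
    # so only a few frames and segments are held in memory at once.
    frames = queue.Queue(maxsize=max_pending)
    segments = queue.Queue(maxsize=max_pending)
    results = []
    workers = [
        threading.Thread(target=decode_stage, args=(stream, frames)),
        threading.Thread(target=vad_stage, args=(frames, segments)),
        threading.Thread(target=transcribe_stage, args=(segments, results, transcribe)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results
```

Calling `run_pipeline(io.BytesIO(pcm_bytes), transcribe=len)` runs all three stages concurrently and returns one result per detected segment, in order. The bounded queues are what would keep peak memory low: the decoder cannot run ahead of the VAD by more than `max_pending` frames. The sketch leaves out error handling (a stage that raises would leave the others waiting) and does not cap segment length, so continuous speech would still accumulate in one segment.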
