Add "burn-in phase" to runner startup #109

@JulianFP

Description

Some parameters that we currently expose to the user through the advanced transcription settings should instead be determined automatically during runner startup, possibly taking into account a preference set by the admin hosting the runner. One example is the beam size parameter: a higher beam size significantly increases hardware utilization but can also improve transcription quality.
There are also other optimizations that could make job processing more efficient. Proposal:

Add a burn-in phase that runs during runner startup and determines the following (either through trial and error or through some estimation):

  • What is the optimal set of transcription parameters (i.e. beam size, ...) for the given hardware? How large can we set the beam size without exhausting the available VRAM? Ideally we want to maximize the beam size to improve transcription quality, but the admin could also be allowed to adjust this through the runner config file so that they can balance hardware utilization against transcription quality themselves.
  • How many models fit in VRAM at once? Could we keep the whisper model, the diarization model, and the most commonly used alignment models in VRAM at all times, to avoid the overhead of constantly loading and unloading models as we do currently?
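A rough sketch of how both questions could be approached, not a proposal for the actual implementation. All names here (`find_max_beam_size`, `plan_resident_models`, the `fits` probe, the `upper` cap, the model names and sizes) are hypothetical and not taken from the runner code. The sketch assumes that if a beam size fits in VRAM, every smaller one fits too, and that the admin's preference can be expressed as an upper cap on the beam size. In a real runner, `fits` would presumably transcribe a short sample with the given beam size and report whether it ran out of memory.

```python
from typing import Callable, Dict, List


def find_max_beam_size(fits: Callable[[int], bool], upper: int = 16) -> int:
    """Return the largest beam size in [1, upper] for which fits(beam) is True.

    Trial and error via binary search. Assumes monotonicity: if a beam size
    fits, all smaller ones fit as well. `upper` stands in for a cap the admin
    could set in the runner config (hypothetical option).
    Returns 0 if not even beam size 1 fits.
    """
    lo, hi = 0, upper  # lo is known to fit (or is 0); nothing above hi fits
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if fits(mid):
            lo = mid
        else:
            hi = mid - 1
    return lo


def plan_resident_models(
    sizes_mb: Dict[str, int], usage: Dict[str, int], budget_mb: int
) -> List[str]:
    """Greedily pick models to keep in VRAM, most frequently used first.

    A model is kept if it still fits into the remaining budget; otherwise it
    is skipped and would be loaded on demand as today.
    """
    resident: List[str] = []
    remaining = budget_mb
    by_usage = sorted(sizes_mb, key=lambda name: usage.get(name, 0), reverse=True)
    for name in by_usage:
        if sizes_mb[name] <= remaining:
            resident.append(name)
            remaining -= sizes_mb[name]
    return resident


if __name__ == "__main__":
    # Fake probe standing in for a real test transcription (made-up limit).
    beam = find_max_beam_size(lambda b: b <= 5, upper=16)
    # Made-up model sizes and usage counts, for illustration only.
    sizes = {"whisper": 3000, "diarization": 500, "align_en": 400, "align_de": 400}
    usage = {"whisper": 100, "diarization": 80, "align_en": 60, "align_de": 5}
    print(beam, plan_resident_models(sizes, usage, budget_mb=4000))
```

Binary search keeps the number of probe runs small (about log2 of the cap), which matters if each probe is a real transcription. The greedy plan is only one possible policy; an estimation-based approach, as mentioned above, could replace either function.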

Metadata

Labels

  • medium priority: will be done at some point
  • runner: Touches runner functionality
