Some parameters that we currently expose to the user through the advanced transcription settings should instead be determined automatically during runner startup, possibly weighted by a preference from the admin hosting the runner. The beam size parameter is one example: a higher beam size significantly increases hardware utilization but can also improve transcription quality.
There are also some other optimizations that could make job processing more efficient. Proposal:
Add a burn-in phase that runs during runner startup and determines the following (either through trial and error or through some estimation):
- What is the optimal set of transcription parameters (e.g. beam size) for the given hardware? How large can we set the beam size without exhausting available VRAM? Ideally we want to maximize beam size to improve transcription quality, but the runner config file could also let the admin cap it, so they can balance hardware utilization against transcription quality themselves.
- How many models fit in VRAM at once? Could we keep the whisper model, the diarization model, and the most commonly used alignment models resident in VRAM at all times, avoiding the overhead of constantly loading and unloading models as we do currently?
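The trial-and-error side of the first point could be as simple as doubling the beam size on a short probe transcription until the hardware runs out of memory, then keeping the last size that worked. A minimal sketch, where `run_trial` is a hypothetical callback that performs a short transcription at the given beam size and raises `MemoryError` when VRAM is exhausted, and `admin_cap` stands in for the proposed admin-configured upper bound:

```python
def find_max_beam_size(run_trial, admin_cap=16, start=1):
    """Probe for the largest usable beam size by doubling until failure.

    run_trial(beam_size) is a hypothetical callback: it runs a short
    transcription at that beam size and raises MemoryError on OOM.
    admin_cap models the admin-configured ceiling from the runner config.
    Returns the largest beam size that succeeded, or None if even the
    starting size fails.
    """
    best = None
    size = start
    while size <= admin_cap:
        try:
            run_trial(size)
            best = size     # this size fit in VRAM
            size *= 2       # try a more ambitious setting
        except MemoryError:
            break           # exceeded available VRAM; keep the last good size
    return best
```

A refinement could binary-search between the last success and the first failure instead of stopping at a power of two, trading a few extra probe runs for a tighter result.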
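For the second point, the "keep the common models resident" idea could be an LRU cache with pinned entries: the whisper and diarization models are pinned and never evicted, while alignment models share the remaining slots. A sketch, assuming a hypothetical `load_fn(name)` that loads a model into VRAM, with `capacity` derived from the burn-in measurement of how many models fit:

```python
from collections import OrderedDict

class ModelCache:
    """Bounded model cache; pinned models stay loaded, others are LRU-evicted.

    load_fn(name) is a hypothetical loader returning a ready-to-use model;
    capacity would come from the burn-in phase's VRAM measurement.
    """

    def __init__(self, load_fn, capacity=4, pinned=()):
        self.load_fn = load_fn
        self.capacity = capacity
        self.pinned = set(pinned)
        self._cache = OrderedDict()
        for name in pinned:           # eagerly load pinned models at startup
            self._cache[name] = load_fn(name)

    def get(self, name):
        if name in self._cache:
            self._cache.move_to_end(name)   # mark as most recently used
            return self._cache[name]
        model = self.load_fn(name)
        self._cache[name] = model
        # Evict least-recently-used non-pinned entries while over capacity.
        while len(self._cache) > self.capacity:
            for key in self._cache:
                if key not in self.pinned:
                    del self._cache[key]
                    break
            else:
                break   # only pinned models remain; nothing to evict
        return model
```

With this shape, a job would call `cache.get("whisper")` and `cache.get(alignment_model_for(language))`, and only rarely used alignment models would pay the load/unload cost.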