Given audio files with variable duration, is it better to stick with one big model? #1884

T145 · 2023-12-08T18:19:42Z

I currently have a collection of audio clips ranging from a couple of seconds to half a minute to a few minutes. To process them, I get each file's duration and use it to pick what I think is an appropriate model: for example, "tiny.en" for clips that are a couple of seconds long and "large" for ones that are a few minutes long. If it's better to just use the "upper bound" model for everything, I'm happy to stick with it. I'm also using a prompt to fix grammar and proper nouns throughout the transcripts.
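For readers who want to picture the setup being asked about, here is a minimal sketch of duration-based model selection. The function name `pick_model`, the tier list, and the 10-second cutoff are all hypothetical; the post only says that "tiny.en" goes to very short clips and "large" to clips of a few minutes.

```python
def pick_model(duration_s: float, tiers=((10.0, "tiny.en"),), fallback: str = "large") -> str:
    """Return a Whisper model name for a clip of the given duration in seconds.

    `tiers` is a sequence of (upper_bound_seconds, model_name) pairs, checked in
    order. The default cutoff of 10 seconds is an assumption made for this
    sketch, not a value taken from the original post.
    """
    for upper_bound, model_name in tiers:
        if duration_s < upper_bound:
            return model_name
    # Anything longer than every tier uses the "upper bound" model.
    return fallback
```

Calling `pick_model` with `tiers=()` always returns the fallback, which is the single-model approach the question asks about.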

Answered by glangford

glangford · 2023-12-08T19:01:18Z

If you have the horsepower, I wouldn't consider using anything below a medium model regardless of audio length.

You can compare tiny and large directly on the same file(s). In my experience, the quality difference has been substantial.
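One way to run that comparison is sketched below, assuming the `openai-whisper` Python package is installed. The helper names (`transcribe_with`, `word_similarity`, `word_diff`) and the file name `clip.wav` are made up for illustration, and a word-level diff is only a rough proxy for transcript quality.

```python
import difflib


def transcribe_with(model_name: str, audio_path: str) -> str:
    """Transcribe one file with the named Whisper model and return the text."""
    import whisper  # imported lazily so the diff helpers work without it

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path)["text"]


def word_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two transcripts, compared word by word."""
    return difflib.SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def word_diff(a: str, b: str) -> list:
    """Words removed from `a` ("- word") or added in `b` ("+ word")."""
    return [d for d in difflib.ndiff(a.split(), b.split()) if d[0] in "+-"]


# Usage (hypothetical file name):
# small = transcribe_with("tiny.en", "clip.wav")
# big = transcribe_with("large-v2", "clip.wav")
# print(word_similarity(small, big))
# print(word_diff(small, big))
```

Reading the diff by eye against the audio is still the real test; the similarity score only tells you how much the two models disagree, not which one is right.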

2 replies

T145 Dec 8, 2023
Author

I'll stick with large then! Are there any configurations you'd recommend applying? I know it's not configured specifically for EN, so that's the only thing that gives me pause about using it exclusively.

glangford Dec 8, 2023

It's worth testing for yourself against the specific content you have. Initially I would recommend using large-v2. Although large-v3 was just released, it has some hallucination and punctuation issues (anecdotal), which you can read about elsewhere.

The other benefit of using a single model is that you can load it once in Python and then use it across multiple files without reloading (the load and setup time is long relative to transcribing just one short clip).
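A minimal sketch of the load-once pattern, assuming the `openai-whisper` package. `transcribe_all` is a hypothetical helper, and the directory, prompt text, and `language="en"` setting in the usage comment are placeholders rather than recommendations from this thread.

```python
from pathlib import Path


def transcribe_all(model, audio_paths, **options):
    """Transcribe every path with one already-loaded model.

    `model` is any object with a Whisper-style `transcribe(path, **options)`
    method that returns a dict containing a "text" key.
    Returns a dict mapping each path (as a string) to its transcript.
    """
    results = {}
    for path in audio_paths:
        result = model.transcribe(str(path), **options)
        results[str(path)] = result["text"].strip()
    return results


# Usage (requires openai-whisper; paths and prompt are placeholders):
# import whisper
# model = whisper.load_model("large-v2")  # the slow step, done once
# texts = transcribe_all(
#     model,
#     sorted(Path("clips").glob("*.wav")),
#     language="en",
#     initial_prompt="Glossary: Acme Corp, Jane Doe.",
# )
```

Passing `language="en"` skips language detection on the multilingual model, which may help with the concern about large not being English-specific; `initial_prompt` is where a proper-noun prompt like the one mentioned in the question would go.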

If you are creating video subtitles, setting word_timestamps to True works really well in combination with output_format srt (or json).
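To show what word-level timestamps make possible, here is a rough sketch that turns a transcription result into SRT cues. It assumes the result dict has the shape produced by `model.transcribe(path, word_timestamps=True)`: a "segments" list whose entries carry a "words" list of dicts with "word", "start", and "end" keys, where each word string includes its leading space. `srt_time`, `words_to_srt`, and the default of 7 words per cue are hypothetical choices, not Whisper's own SRT writer.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(result: dict, max_words: int = 7) -> str:
    """Group word-level timestamps into SRT cues of at most `max_words` words."""
    # Flatten the per-segment word lists into one sequence.
    words = [w for seg in result["segments"] for w in seg.get("words", [])]
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        # Word strings are assumed to carry their own leading spaces.
        text = "".join(w["word"] for w in chunk).strip()
        start, end = srt_time(chunk[0]["start"]), srt_time(chunk[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Because each cue starts and ends on actual word boundaries, the subtitles track speech more tightly than cues based on segment boundaries alone.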

Other than that, there is a rich history of past discussions you can refer to if you run into an issue, or if you have questions.
