Fine-tune training audio quality #2074

marceltud · 2024-03-09T14:41:03Z


Hi,

I have transcribed some movies using Whisper, and the transcriptions contain many mistakes.
I would like to fine-tune the model for my language. My idea is to correct the wrong transcriptions and then train the model on the corrected text.
Since the transcriptions come from movies, do I need to cut out the noisy parts of the audio?
I would prefer not to, since it would be a lot of work: I would have to cut those segments out of the audio and then remove the matching lines from the subtitles with the transcribed text.

As for the fine-tuning process itself, I tried it once and, after a lot of work, got it working, so I should be able to repeat it. This question is not about how to do the fine-tuning, but about the quality of the audio files.

I've heard others say that the training audio must be pure voice, as if recorded in a studio. I suspect this is not really necessary, since the pre-trained models were presumably also trained on some noisy data, but I'm not sure.

Very rarely there is also some background music. Is there a risk that the model would learn something bad from it or get confused?

Thank you

Replies: 0 comments
