Hi,
I have transcribed some movies using Whisper, and there are many mistakes.
I would like to fine-tune the model for my language. My idea is to correct the wrong transcriptions and then train the model on these corrections.
Since the transcriptions are from movies, do I need to isolate the noisy parts of the audio and remove them?
I would prefer not to remove them, since that would require a lot of work: cutting the noisy parts out of the audio and then doing the same for the subtitles containing the transcribed text.
As for the fine-tuning process, I tried it once and, after a lot of work, managed to make it work, so I should be able to repeat it. This question is not about how to do the fine-tuning, but about the quality of the audio files.
I've heard others say that the audio must be pure voice, as if recorded in a studio. I suspect this is not really necessary, since the pre-trained models were presumably also trained on some noisy data, but I'm not sure.
Very rarely, there is also some background music. Is there a risk that the model would learn something bad from it or get confused?
Thank you