Train Whisper on New Language #2190
Replies: 5 comments 14 replies
-
Whisper's ability to support a new language is mediocre when the tokenizer does not cover your language well. You may want to try wav2vec2 instead: https://huggingface.co/blog/fine-tune-w2v2-bert
-
That applies if the language has very little training data. I have now built a dataset that is around 400-420 hours long. Will it be sufficient to train a Whisper model? Also, since I have the data in two scripts, will I be able to train two models with two different language codes?
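Splitting the corpus into the two script-specific training sets can be sketched with a simple Unicode-range check; `detect_script` is a helper name invented here, not an existing API.

```python
# Sketch: partition transcripts into a Devanagari set and a Roman set
# before building the two training corpora. Any character in the
# Devanagari Unicode block (U+0900-U+097F) marks the line as Devanagari.

def detect_script(text: str) -> str:
    if any("\u0900" <= ch <= "\u097f" for ch in text):
        return "devanagari"
    return "roman"

samples = ["नमस्कार", "namaskar"]
print([detect_script(s) for s in samples])  # ['devanagari', 'roman']
```

Since the audio is shared, only the transcript column differs between the two corpora; the same recording can appear in both.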
-
If I send my data to OpenAI, could they train my model and keep it closed until my PhD is done?
-
Right now, 50 hours of the data have transcriptions while the rest do not. How can I do unsupervised training with Whisper?
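Whisper fine-tuning itself is supervised, so a common workaround (self-training / pseudo-labeling, not an official Whisper feature) is: fine-tune on the 50 labeled hours, transcribe the unlabeled audio with that model, keep only confident hypotheses, and fine-tune again on the enlarged set. A minimal sketch of the filtering step, assuming you have already produced `(audio_path, transcript, avg_logprob)` tuples from your model (e.g. via `model.generate` with scores):

```python
# Sketch: keep only pseudo-labels whose average log-probability clears a
# threshold. The threshold value and the input tuples are placeholders.

def filter_pseudo_labels(hypotheses, threshold=-0.5):
    """Keep (audio_path, transcript) pairs with avg_logprob >= threshold."""
    return [(path, text) for path, text, avg_logprob in hypotheses
            if avg_logprob >= threshold]

# Example with made-up scores:
hyps = [("a.wav", "text a", -0.2), ("b.wav", "text b", -1.3)]
print(filter_pseudo_labels(hyps))  # [('a.wav', 'text a')]
```

The threshold is a knob to tune: too loose and you train on errors, too strict and you discard most of the 350+ unlabeled hours.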
-
I want to train Whisper on Konkani speech. The transcriptions are available in both Devanagari and Roman script, and I want to make two separate models, one per script. The audio recordings are the same for each sentence/recording.
I want to train the model with Hugging Face (preferably), but other methods are also possible.
Can someone outline the general script for the task?
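A hedged outline of the usual Hugging Face fine-tuning recipe (feature extraction, tokenized labels, `Seq2SeqTrainer`) follows. The dataset path, column name, and hyperparameters are placeholders, and since Konkani is not in Whisper's language list, reusing a related language token such as Marathi (`"mr"`) is a common community workaround, not an official recipe.

```python
# Sketch of fine-tuning Whisper on one script's corpus; run it twice,
# once per transcript set, to get the two models.
import torch

def pad_labels(label_lists, pad_id=-100):
    """Pad tokenized transcripts with -100 so the loss ignores padding."""
    max_len = max(len(labels) for labels in label_lists)
    return torch.tensor(
        [labels + [pad_id] * (max_len - len(labels)) for labels in label_lists]
    )

def main():
    from datasets import Audio, load_dataset
    from transformers import (
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
        WhisperForConditionalGeneration,
        WhisperProcessor,
    )

    # Konkani has no Whisper language token; "marathi" is a stand-in.
    processor = WhisperProcessor.from_pretrained(
        "openai/whisper-small", language="marathi", task="transcribe"
    )
    model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")

    ds = load_dataset("audiofolder", data_dir="my_konkani_dataset")  # placeholder
    ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

    def prepare(batch):
        audio = batch["audio"]
        batch["input_features"] = processor.feature_extractor(
            audio["array"], sampling_rate=audio["sampling_rate"]
        ).input_features[0]
        # "transcription" is a placeholder column name from metadata.csv
        batch["labels"] = processor.tokenizer(batch["transcription"]).input_ids
        return batch

    ds = ds.map(prepare)

    def collate(features):
        input_features = torch.tensor([f["input_features"] for f in features])
        labels = pad_labels([f["labels"] for f in features])
        return {"input_features": input_features, "labels": labels}

    args = Seq2SeqTrainingArguments(
        output_dir="whisper-konkani-devanagari",  # placeholder
        per_device_train_batch_size=8,
        learning_rate=1e-5,
        max_steps=4000,
        fp16=torch.cuda.is_available(),
    )
    trainer = Seq2SeqTrainer(
        model=model, args=args, train_dataset=ds["train"], data_collator=collate
    )
    trainer.train()

# Calling main() launches training; it needs the dataset on disk and
# network access to download the checkpoint.
```

For the Roman-script model, repeat with the Roman transcriptions and a different `output_dir`; the audio preprocessing is identical since the recordings are shared.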