Conceptually, would it make sense to fine-tune just the decoder? #144
Replies: 1 comment
This worked very well for me when fine-tuning one of the English checkpoints on an English ASR dataset: you assume that the encoder is sufficiently trained during pre-training, and simply train the decoder to match your target text formatting (cf. https://arxiv.org/abs/2210.13352). This works especially well with smaller datasets, where you also get faster convergence. For larger datasets, you get better results by training for longer and not freezing the encoder. It also worked less well for multilingual ASR, where I found training the encoder of the multilingual checkpoint to be necessary for reasonable performance.

Check out this blog post for fine-tuning Whisper for ASR with Hugging Face Transformers: https://huggingface.co/blog/fine-tune-whisper. It provides a step-by-step guide, right from data preparation to evaluation 🤗 There's a Google Colab so you can also run it as a notebook 😉
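If it helps, here's a minimal sketch of the decoder-only setup with Hugging Face Transformers: it freezes the encoder parameters so only the decoder (and LM head) receives gradient updates. The checkpoint name is a placeholder, not a recommendation from this thread, and the rest of the training loop is assumed to follow the linked blog post.

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Placeholder checkpoint; swap in whichever (English) checkpoint you want to adapt.
checkpoint = "openai/whisper-small.en"

processor = WhisperProcessor.from_pretrained(checkpoint)
model = WhisperForConditionalGeneration.from_pretrained(checkpoint)

# Freeze the encoder: its parameters get no gradient updates,
# so only the decoder is fine-tuned on the target transcripts.
for param in model.model.encoder.parameters():
    param.requires_grad = False

# Sanity check: how many parameters are actually trainable now.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable params: {trainable:,} / {total:,}")
```

From here the frozen model can be dropped into the same Seq2SeqTrainer setup described in the blog post above, with no other changes to the data preparation or evaluation steps.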
I'd like to fine-tune one of the medium models on a specific, rare class of language -- one for which there may not be much paired audio/transcript data available. I'm trying to figure out whether there's a way that could work.