Conceptually, would it make sense to fine-tune just the decoder? #144
Replies: 1 comment
This worked very well for me when fine-tuning one of the English checkpoints on an English ASR dataset: you assume that the encoder is sufficiently trained during pre-training, and simply train the decoder to match your target text formatting (cf. https://arxiv.org/abs/2210.13352). This works especially well with smaller datasets, where you also get faster convergence. For larger datasets, you get better results by training for longer and not freezing the encoder. It also worked less well for multilingual ASR, where I found training the encoder of the multilingual checkpoint to be necessary for reasonable performance.

Check out this blog post for fine-tuning Whisper for ASR with Hugging Face Transformers: https://huggingface.co/blog/fine-tune-whisper. It provides a step-by-step guide, right from data preparation to evaluation 🤗 There's a Google Colab so you can also run it as a notebook 😉
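If it helps, here's a minimal sketch of the decoder-only setup with Hugging Face Transformers: it freezes the encoder parameters so only the decoder (and LM head) receives gradient updates. The checkpoint name is a placeholder, not a recommendation from this thread, and the rest of the training loop is assumed to follow the linked blog post.

```python
from transformers import WhisperForConditionalGeneration, WhisperProcessor

# Placeholder checkpoint; swap in whichever (English) checkpoint you want to adapt.
checkpoint = "openai/whisper-small.en"

processor = WhisperProcessor.from_pretrained(checkpoint)
model = WhisperForConditionalGeneration.from_pretrained(checkpoint)

# Freeze the encoder: its parameters get no gradient updates,
# so only the decoder is fine-tuned on the target transcripts.
for param in model.model.encoder.parameters():
    param.requires_grad = False

# Sanity check: how many parameters are actually trainable now.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable params: {trainable:,} / {total:,}")
```

From here the frozen model can be dropped into the same Seq2SeqTrainer setup described in the blog post above, with no other changes to the data preparation or evaluation steps.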
I'd like to fine-tune one of the medium models on a specific, rare class of language -- one for which there may not be much paired audio/transcript data available. I'm trying to figure out whether there's a way that could work.