Does fine-tuning Whisper without timestamps affect long-form transcription performance? #1703
Unanswered
jeewooyoon-raondata asked this question in Q&A
Replies: 0 comments
Hi @jongwook
Thank you for sharing this great open-source project :)
I successfully fine-tuned the Whisper medium model on Korean speech data (10-15 s clips, zero-padded to 30 s).
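The padding step above can be sketched as follows. This is a minimal NumPy version of the fixed-window padding Whisper uses for its 30-second training inputs (the 16 kHz sample rate is Whisper's standard; the clip length here is illustrative):

```python
import numpy as np

SAMPLE_RATE = 16000          # Whisper's expected sample rate
CHUNK_LENGTH = 30            # seconds per training window
N_SAMPLES = SAMPLE_RATE * CHUNK_LENGTH

def pad_or_trim(audio: np.ndarray, length: int = N_SAMPLES) -> np.ndarray:
    """Zero-pad (or truncate) a waveform to exactly `length` samples,
    mirroring the fixed 30 s windows used during fine-tuning."""
    if len(audio) >= length:
        return audio[:length]
    return np.pad(audio, (0, length - len(audio)))

clip = np.random.randn(12 * SAMPLE_RATE).astype(np.float32)  # a 12 s clip
padded = pad_or_trim(clip)
print(padded.shape)  # (480000,)
```

The openai-whisper package ships an equivalent `whisper.pad_or_trim` helper that does the same thing on NumPy arrays or torch tensors.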
I found that the fine-tuned model performs better (in CER/WER terms) than the Whisper large-v2 model on short-form (under 30 s) speech: 20% CER for the fine-tuned model vs. 28% CER for large-v2.
However, when I run long-form transcription with the transcribe function, the model produces unstable outputs. Adjusting the decoding option parameters improves performance slightly, but the output is still unstable.
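For reference, these are the decoding-related arguments to `transcribe()` that are most often adjusted to stabilize long-form output; the parameter names are real `whisper.transcribe` arguments, but the values below are only illustrative defaults, not tuned for this model:

```python
# A sketch of decoding options commonly tweaked for long-form stability.
# All keys are real keyword arguments of whisper's transcribe();
# the values shown are illustrative, not recommendations.
stabilizing_options = dict(
    condition_on_previous_text=False,   # stop errors propagating across 30 s windows
    temperature=(0.0, 0.2, 0.4, 0.6, 0.8, 1.0),  # fallback schedule on decode failure
    compression_ratio_threshold=2.4,    # retry if output looks repetitive
    logprob_threshold=-1.0,             # retry if average log-probability is too low
    no_speech_threshold=0.6,            # skip windows classified as silence
)
# result = model.transcribe("long_audio.wav", **stabilizing_options)
```

Disabling `condition_on_previous_text` in particular often helps fine-tuned models, since it prevents a bad window from contaminating the prompt for every subsequent window.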
So, my question is: what could cause this instability?
Maybe because I fine-tuned the model without timestamps?
Or because most of the fine-tuning data is shorter than the 30-second segments used in the original training?
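One detail worth noting for the first hypothesis: Whisper's `transcribe()` relies on the model's predicted timestamp tokens to decide where to advance the 30 s sliding window, so a model fine-tuned entirely without timestamps can mis-predict them and derail the windowing. A workaround is to do the chunking yourself and decode each chunk as short-form audio. A minimal fixed-window chunker (a real pipeline would cut on silence rather than at hard 30 s boundaries) might look like:

```python
import numpy as np

SAMPLE_RATE = 16000
WINDOW = 30 * SAMPLE_RATE   # 30 s of samples per window

def chunk_audio(audio: np.ndarray, window: int = WINDOW):
    """Split a long waveform into fixed 30 s windows (the last one may be
    shorter), so each chunk matches the length regime seen in fine-tuning.
    Cutting at hard boundaries can split words; a production pipeline
    would use a VAD or silence detection to pick the cut points."""
    return [audio[i:i + window] for i in range(0, len(audio), window)]

audio = np.zeros(75 * SAMPLE_RATE, dtype=np.float32)  # 75 s of audio
chunks = chunk_audio(audio)
print([len(c) // SAMPLE_RATE for c in chunks])  # [30, 30, 15]
```

Each chunk can then be decoded like a short clip (e.g. with `whisper.DecodingOptions(without_timestamps=True)`), sidestepping the model's timestamp predictions entirely.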