-
I’ve created a code-switched language dataset for fine-tuning Whisper, including audio data along with CSV and Parquet files, which I’ve stored on Hugging Face. After preparing the dataset, I fine-tuned the model for translation. You can explore the entire end-to-end project in my repo: https://github.com/pr0mila/MediBeng-Whisper-Tiny
For timestamps, you can use faster-whisper inference: https://github.com/SYSTRAN/faster-whisper
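In case it helps with the timestamp part, here is a minimal sketch of getting word-level timestamps out of faster-whisper; the model size, device/compute settings, and the audio path are placeholder assumptions, not values from the project above:

```python
from faster_whisper import WhisperModel

# "tiny" on CPU with int8 is illustrative; swap in your own model/device.
model = WhisperModel("tiny", device="cpu", compute_type="int8")

# word_timestamps=True makes each segment carry per-word start/end times.
segments, info = model.transcribe("audio.wav", word_timestamps=True)

for seg in segments:
    for word in seg.words:
        print(f"[{word.start:.2f}s -> {word.end:.2f}s]{word.word}")
```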
-
Hi,
I have used Whisper to transcribe audio from a few movies. I'm planning to correct the subtitles and then use them to fine-tune the Whisper model.
Since the fine-tuning setup doesn't use timestamps, I was wondering: what is the risk of Whisper learning a gap (misalignment) between the audio and the subtitles?
I know that behind the scenes the Python code breaks the longer audio into 30-second chunks, but how would it know where to cut the untimed subtitles at each 30-second boundary, given that Whisper cannot be trained on timed subtitles?
Thank you
P.S. I've seen that this project no longer works, which is a pity... https://github.com/jumon/whisper-finetuning
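One possible way around the untimed-subtitle problem described above is to let Whisper's own inference output supply the timing: `transcribe()` returns segment-level start/end times, which can be used to group the (corrected) text into chunks of at most 30 seconds. A minimal sketch, assuming the openai-whisper package, a placeholder `movie.wav`, and the 30-second limit of Whisper's input window:

```python
import whisper

model = whisper.load_model("tiny")
result = model.transcribe("movie.wav")  # each segment has start/end times

chunks, cur_text, cur_start, prev_end = [], [], 0.0, 0.0
for seg in result["segments"]:
    # close the current chunk once adding this segment would exceed 30 s
    if seg["end"] - cur_start > 30.0 and cur_text:
        chunks.append({"start": cur_start, "end": prev_end,
                       "text": " ".join(cur_text)})
        cur_text, cur_start = [], seg["start"]
    cur_text.append(seg["text"].strip())
    prev_end = seg["end"]
if cur_text:
    chunks.append({"start": cur_start, "end": prev_end,
                   "text": " ".join(cur_text)})
```

Each resulting chunk's `start`/`end` can then be used to slice the matching audio window, and the paired audio/text chunks fed to fine-tuning.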