Hi all,

I'm currently fine-tuning `openai/whisper-large-v3-turbo` and wanted to share some thoughts (and ask for feedback) on using `whisper.pad_or_trim()` vs. dynamic padding when preparing the training dataset.
**Context:**

The official Whisper preprocessing pipeline uses `whisper.pad_or_trim(audio)` to force input audio to exactly 30 seconds (480,000 samples).

I experimented with both approaches: `pad_or_trim()` for fixed-length input, and dynamic padding (to the max length per batch) with a proper `attention_mask`.
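To make the comparison concrete, here is a minimal sketch of the two preprocessing paths (file names are placeholders; I'm assuming 16 kHz mono audio, the 128-mel configuration used by the large-v3 family, and the standard `WhisperFeatureExtractor` options `padding="longest"` and `return_attention_mask=True`):

```python
import whisper
from transformers import WhisperFeatureExtractor

# --- Fixed-length path (official Whisper preprocessing) ---
audio = whisper.load_audio("sample.wav")              # 16 kHz float32 waveform
audio = whisper.pad_or_trim(audio)                    # pad/trim to 480,000 samples (30 s)
mel = whisper.log_mel_spectrogram(audio, n_mels=128)  # (128, 3000) for the large-v3 family

# --- Dynamic-padding path (pad only to the longest clip in the batch) ---
feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-large-v3-turbo")
batch_audio = [whisper.load_audio("clip_a.wav"), whisper.load_audio("clip_b.wav")]
features = feature_extractor(
    batch_audio,
    sampling_rate=16000,
    padding="longest",            # per-batch padding instead of a fixed 30 s
    return_attention_mask=True,   # frame-level mask over real vs. padded positions
    return_tensors="pt",
)
# features.input_features -> (batch, 128, frames_of_longest_clip)
# features.attention_mask  -> 1 for real frames, 0 for padding
# Note: the stock Hugging Face Whisper encoder expects exactly 3000 frames,
# so shorter feature sequences require adapting its positional embeddings.
```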
**My observations:**

| Criteria | `whisper.pad_or_trim()` | Dynamic padding |
| --- | --- | --- |
| GPU memory usage | High and fixed | More efficient |
| Training speed | Slower due to zero-padding | Faster |
| Loss stability | Slightly more stable | Occasionally fluctuates |
| Alignment / hallucination | Slightly better | Acceptable, but may need filtering |
| Match to pretraining setup | ✅ Fully aligned | ❌ Slightly deviates |
| Suitability for short audios | ❌ Over-padding | ✅ Efficient |
**Conclusion so far:**

- From a training-data preparation perspective, `pad_or_trim()` seems inefficient, especially when most audio samples are under 30 seconds.
- Dynamic padding performs better in terms of speed and memory usage, and works well as long as the `attention_mask` and positional embeddings are handled correctly (see the collator sketch below).
- For inference, I might still use `pad_or_trim()` to match pretraining behavior and minimize alignment issues.
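To illustrate what I mean by handling the `attention_mask`, here is a minimal collator sketch (a simplified illustration, not my exact training code). It assumes each dataset example holds a raw 16 kHz waveform under `"audio"` and tokenized transcript ids under `"labels"`, and that the encoder has been adapted to accept feature sequences shorter than 3000 frames:

```python
from dataclasses import dataclass
from typing import Any, Dict, List

import torch
from transformers import WhisperFeatureExtractor, WhisperTokenizer


@dataclass
class DynamicPaddingCollator:
    """Pads audio to the longest clip in each batch instead of a fixed 30 s."""

    feature_extractor: WhisperFeatureExtractor
    tokenizer: WhisperTokenizer

    def __call__(self, examples: List[Dict[str, Any]]) -> Dict[str, torch.Tensor]:
        # Per-batch ("longest") padding of the log-mel features, plus a
        # frame-level attention mask over the padded positions.
        features = self.feature_extractor(
            [ex["audio"] for ex in examples],
            sampling_rate=16000,
            padding="longest",
            return_attention_mask=True,
            return_tensors="pt",
        )

        # Pad the label ids to the longest transcript in the batch and replace
        # padded positions with -100 so they are ignored by the loss.
        label_batch = self.tokenizer.pad(
            {"input_ids": [ex["labels"] for ex in examples]},
            padding=True,
            return_tensors="pt",
        )
        labels = label_batch["input_ids"].masked_fill(
            label_batch["attention_mask"].ne(1), -100
        )

        return {
            "input_features": features["input_features"],
            "attention_mask": features["attention_mask"],
            "labels": labels,
        }
```

For the fixed-length setup, the same collator would simply drop `padding="longest"`; the feature extractor then pads everything to 30 seconds by default, which is the behavior that matches pretraining.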
**❓ Questions for the community:**

- Has anyone else compared training outcomes using both methods?
- Are there any known issues with dynamic padding + LoRA in Whisper models?
- For models like `whisper-large-v3-turbo`, is there any strong reason to stick with `pad_or_trim()` for stability?
Thanks in advance for your thoughts!