Parakeet v2, v3 max_duration difference #15209

mingewang · 2025-12-18T21:50:03Z

Hi,

I noticed a difference in model_config.yaml between Parakeet v2 and Parakeet v3 models:

Parakeet v2:
train_ds:
  max_duration: 40
validation_ds:
  max_duration: 40

Parakeet v3:
train_ds:
  max_duration: 10
validation_ds:
  max_duration: 30

Does this mean that Parakeet v3 was trained with training samples truncated to 10 seconds, while Parakeet v2 was trained with up to 40-second samples?
If so, what was the motivation for reducing the training max_duration in v3?

When I try to increase train_ds.max_duration for Parakeet v3 to 30 seconds, I easily run into CUDA OOM errors on an A100 (80GB).

Are there recommended settings (batch size, gradient accumulation, bucketing strategy, etc.) to safely train v3 with longer utterances?

Any guidance on best practices for handling longer audio when fine-tuning Parakeet v3 would be appreciated.

Thanks!

jeremy110 · 2025-12-22T08:32:20Z

@mingewang
It depends on your dataset, but max_duration is typically set to 30 to 40 seconds and used together with the lhotse dataloader. If you're interested, you can refer to the settings I use here: https://github.com/jeremy110/Finetune_Nemo_ASR
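
To illustrate why a duration-based dataloader helps with the OOM issue, here is a minimal sketch in plain Python. It is not lhotse's actual sampler and not taken from the linked repo. The function `batch_by_duration` and the `batch_duration` budget of 60 seconds are made up for illustration (NeMo's lhotse config has a similarly named `batch_duration` option, as far as I know, but check the docs for the exact behavior). The idea: utterances longer than max_duration are dropped, the rest are grouped by similar length, and each batch is capped by total padded seconds rather than by a fixed number of samples, so long utterances automatically end up in smaller batches.

```python
from typing import List


def batch_by_duration(
    durations: List[float], max_duration: float, batch_duration: float
) -> List[List[float]]:
    """Greedy duration-budget batching (illustrative only, not lhotse's code).

    Utterances longer than max_duration are dropped. The remaining ones are
    sorted by length so that similar lengths share a batch, and a batch is
    closed once its padded size (longest utterance * batch size) would
    exceed batch_duration.
    """
    kept = sorted(d for d in durations if d <= max_duration)
    batches: List[List[float]] = []
    current: List[float] = []
    for d in kept:
        # The list is sorted ascending, so d would be the longest item in
        # the batch; the padded cost after adding it is d * (len(current) + 1).
        if current and d * (len(current) + 1) > batch_duration:
            batches.append(current)
            current = []
        current.append(d)
    if current:
        batches.append(current)
    return batches


def padded_seconds(batch: List[float]) -> float:
    """Seconds of audio the batch occupies once padded to its longest item."""
    return max(batch) * len(batch)


# Hypothetical utterance lengths in seconds; the 45 s clip exceeds max_duration.
durations = [2.0, 3.0, 4.0, 5.0, 8.0, 10.0, 12.0, 20.0, 28.0, 30.0, 45.0]
batches = batch_by_duration(durations, max_duration=30.0, batch_duration=60.0)
for b in batches:
    print(len(b), "utterances,", padded_seconds(b), "padded seconds")
```

With a fixed batch size, raising max_duration from 10 to 30 roughly triples the worst-case padded audio per batch (and attention cost grows faster than that), which is consistent with the OOM you saw. With a duration budget, the 28 s and 30 s clips end up in a batch of two while the short clips are batched six at a time, so peak memory stays bounded.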
