fix(parakeet): clamp multilingual v3 chunk windows for non-English input by nachoal · Pull Request #589 · Blaizzy/mlx-audio

nachoal · 2026-03-20T22:05:46Z

Context

Multilingual Parakeet v3 decoding becomes unstable when chunk windows are too large. In practice, non-English audio can drift into English or mixed Spanish-English ("Spanglish") output when chunk_duration is large.

This PR keeps decoder behavior unchanged and fixes the issue at the chunk scheduler level.

Description

This change adds a small scheduling guard for multilingual Parakeet v3:

- Detect multilingual Parakeet v3 from the model vocabulary.
- Accept an optional language hint in the Parakeet generate/decode path.
- When the model is multilingual v3 and the caller provides a non-English language hint, clamp oversized windows to:
  - chunk_duration = 5.0
  - overlap_duration = 1.0
- Leave English / auto behavior unchanged.

The goal is to preserve language consistency on non-English input without introducing unsupported decoder prompting hacks.
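A minimal sketch of the clamping rule described above, in plain Python. The function names (`is_non_english_hint`, `resolve_multilingual_chunking`), the set of hint strings treated as English or auto, and the convention that `chunk_duration=None` means "no chunking" are assumptions for illustration; they are not taken from the PR's diff.

```python
from typing import Optional, Tuple

# Clamp targets quoted in the PR description.
MULTILINGUAL_CHUNK_DURATION = 5.0
MULTILINGUAL_OVERLAP_DURATION = 1.0

# Assumption: these hint values mean "English" or "no explicit language".
_ENGLISH_OR_AUTO = {"", "auto", "en", "english"}


def is_non_english_hint(language: Optional[str]) -> bool:
    """Return True only for an explicit, non-English language hint."""
    if language is None:
        return False
    # Reduce regional tags such as "en-US" or "es_MX" to their base language.
    base = language.strip().lower().replace("_", "-").split("-")[0]
    return base not in _ENGLISH_OR_AUTO


def resolve_multilingual_chunking(
    is_multilingual_v3: bool,
    language: Optional[str],
    chunk_duration: Optional[float],
    overlap_duration: float,
) -> Tuple[Optional[float], float]:
    """Hypothetical scheduler guard: shrink oversized windows for non-English v3 input."""
    if not (is_multilingual_v3 and is_non_english_hint(language)):
        # English, auto, or a non-multilingual model: keep the caller's settings.
        return chunk_duration, overlap_duration
    # Assumption: None means the whole file is one window, i.e. the largest possible.
    oversized = chunk_duration is None or chunk_duration > MULTILINGUAL_CHUNK_DURATION
    if oversized:
        return MULTILINGUAL_CHUNK_DURATION, MULTILINGUAL_OVERLAP_DURATION
    # Windows already at or below the clamp are left alone.
    return chunk_duration, overlap_duration


print(resolve_multilingual_chunking(True, "es", 120.0, 15.0))     # (5.0, 1.0)
print(resolve_multilingual_chunking(True, "en-US", 120.0, 15.0))  # (120.0, 15.0)
```

Only the scheduling parameters change; the decoder receives the same kind of input it always did, which matches the stated goal of leaving decoder behavior untouched.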

Changes in the codebase

- Add a multilingual Parakeet v3 detection helper.
- Add a non-English language-hint helper.
- Add _resolve_multilingual_chunking(...) in the Parakeet base model.
- Thread the optional language hint through the Parakeet model methods so that it is used only for chunk scheduling.
- Add STT model tests for:
  - v3 multilingual detection
  - clamping on non-English hints
  - preserving baseline behavior for English / auto
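The PR detects multilingual v3 "from the model vocabulary" but does not say which signal it uses. One plausible heuristic, sketched here purely as an assumption, is to look for accented Latin letters that an English-only subword vocabulary is unlikely to contain. The marker set, the `looks_multilingual` name, and the `min_distinct` threshold are hypothetical, and this check only answers "multilingual or not"; telling v3 apart from other multilingual checkpoints would need an additional signal.

```python
from typing import Iterable

# Hypothetical marker set: letters common in European languages other than English.
_NON_ENGLISH_MARKERS = frozenset("ñáéíóúüäößàèìòùçãõøåæœ")


def looks_multilingual(vocabulary: Iterable[str], min_distinct: int = 5) -> bool:
    """Return True once the vocabulary contains `min_distinct` different marker letters."""
    seen = set()
    for token in vocabulary:
        seen.update(ch for ch in token.lower() if ch in _NON_ENGLISH_MARKERS)
        if len(seen) >= min_distinct:
            return True
    return False


english_vocab = ["▁the", "▁and", "ing", "▁of", "'s"]
multilingual_vocab = english_vocab + ["▁año", "▁café", "▁über", "▁straße", "▁garçon", "▁são"]
print(looks_multilingual(english_vocab))       # False
print(looks_multilingual(multilingual_vocab))  # True
```

Requiring several distinct letters, rather than a single hit, keeps an English vocabulary that happens to contain a loanword such as "café" from being misclassified.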

Changes outside the codebase

No infrastructure or service changes.

Additional information

Targeted repo tests:

python3 -m pytest -q mlx_audio/stt/tests/test_models.py -k 'multilingual_chunking or multilingual_v3_detection'

Result:

3 passed

End-to-end benchmark summary on a local STT wrapper using this branch:

- English short: unchanged (1123 words)
- English long: unchanged (1241 words)
- Spanish short: fixed (583 -> 1123 words, compared with the previous faulty MLX output)
- Spanish long: quality preserved, with some throughput cost from the shorter windows
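The throughput cost follows from simple windowing arithmetic. The sketch below lays out fixed-length windows that overlap by `overlap_s` seconds; it is a generic illustration that assumes evenly stepped windows, not mlx-audio's actual scheduler, and `chunk_windows` is a hypothetical name. With 5.0 s windows and 1.0 s overlap the step is 4.0 s, so one minute of audio needs 15 windows, versus 3 windows for a 30 s window with 5 s overlap (an arbitrary comparison point).

```python
from typing import List, Tuple


def chunk_windows(total_s: float, chunk_s: float, overlap_s: float) -> List[Tuple[float, float]]:
    """Split [0, total_s] into windows of chunk_s seconds sharing overlap_s seconds."""
    if chunk_s <= 0 or not 0 <= overlap_s < chunk_s:
        raise ValueError("need chunk_s > 0 and 0 <= overlap_s < chunk_s")
    step = chunk_s - overlap_s
    windows = []
    start = 0.0
    while True:
        end = min(start + chunk_s, total_s)
        windows.append((start, end))
        if end >= total_s:  # this window reaches the end of the audio
            break
        start += step
    return windows


for chunk_s, overlap_s in [(5.0, 1.0), (30.0, 5.0)]:
    windows = chunk_windows(60.0, chunk_s, overlap_s)
    decoded = sum(end - start for start, end in windows)
    print(f"chunk={chunk_s}s overlap={overlap_s}s -> {len(windows)} windows, {decoded:.0f}s decoded")
```

Each extra window adds fixed per-call overhead and re-decodes its overlap, which is consistent with the slowdown reported for long Spanish audio.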

Subtitle-backed Spanish validation (WER):

- es_short: 0.0758
- es_long: 0.1119
- es_long_motivar_humor: 0.0768
- es_long_liderazgo_inspira: 0.0708
- es_long_productividad: 0.0528
- es_long_atencion_inteligencia: 0.0612
- es_long_cerebro_atento: 0.0847
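WER here is word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The PR does not say how the subtitle references were normalized, so this minimal sketch applies only lowercasing and whitespace tokenization; punctuation handling is deliberately left out, and `word_error_rate` is an illustrative name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Levenshtein distance over words, divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edits needed to turn the first i-1 reference words into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            # deletion, insertion, or substitution/match
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, substitution)
        prev = curr
    return prev[-1] / len(ref)


# One substitution (está -> esta) and one deletion (la) over 6 reference words.
print(word_error_rate("El gato está en la casa", "el gato esta en casa"))
```

On this scale, the es_short value of 0.0758 means roughly 7.6 word errors per 100 reference words.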

The tradeoff is that non-English Parakeet v3 transcription is slower than on the large-window path, but the previous behavior could silently return text in the wrong language.

Checklist

Tests added/updated
Documentation updated
Issue referenced (no existing repo issue for this specific fix)

nachoal · 2026-03-20T22:14:07Z

Follow-up validation summary.

Targeted repo tests:

python3 -m pytest -q mlx_audio/stt/tests/test_models.py -k 'multilingual_chunking or multilingual_v3_detection'

Result:

3 passed

End-to-end benchmark on the original sample set:

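| Sample | Backend | Time (s) | RTF (real-time factor) | Words |
|---|---|---|---|---|
| en_short | mlx-parakeet | 6.65 | 0.0137 | 1123 |
| en_long | mlx-parakeet | 6.80 | 0.0122 | 1241 |
| es_short | mlx-parakeet | 7.55 | 0.0187 | 1123 |
| es_long | mlx-parakeet | 15.57 | 0.0155 | 2060 |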
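Assuming RTF is the usual real-time factor (processing time divided by audio duration; the PR does not define it), the benchmark table also implies how much audio each sample contains. This sketch derives those durations and word throughput from the reported numbers; because RTF is rounded to four decimals, the derived durations are approximate, and `implied_audio_seconds` is an illustrative name.

```python
# (wall-clock seconds, RTF, words) as reported in the benchmark table.
RESULTS = {
    "en_short": (6.65, 0.0137, 1123),
    "en_long": (6.80, 0.0122, 1241),
    "es_short": (7.55, 0.0187, 1123),
    "es_long": (15.57, 0.0155, 2060),
}


def implied_audio_seconds(wall_time_s: float, rtf: float) -> float:
    """Audio length implied by RTF = processing time / audio duration."""
    return wall_time_s / rtf


for name, (wall_s, rtf, words) in RESULTS.items():
    audio_s = implied_audio_seconds(wall_s, rtf)
    print(f"{name}: ~{audio_s / 60:.1f} min of audio, {words / wall_s:.0f} words per wall-clock second")
```

Under that assumption, es_long works out to roughly 17 minutes of audio at RTF 0.0155 versus about 9 minutes for en_long at 0.0122. Because RTF is already normalized by duration, the higher Spanish value is consistent with the PR's note that the shorter windows cost throughput.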

Reference-backed Spanish validation (subtitle/text reference WER):

| Sample | WER |
|---|---|
| es_short | 0.0758 |
| es_long | 0.1119 |
| es_long_motivar_humor | 0.0768 |
| es_long_liderazgo_inspira | 0.0708 |
| es_long_productividad | 0.0528 |
| es_long_atencion_inteligencia | 0.0612 |
| es_long_cerebro_atento | 0.0847 |

Interpretation:

- English stays at baseline quality.
- The known bad short Spanish case is fixed.
- The broader long-form Spanish validation set stays within a usable WER range (0.0528 to 0.1119) while avoiding the previous language-drift failure mode.
- Tradeoff: non-English Parakeet v3 is slower because it now uses smaller windows whenever a non-English language hint is provided.

lucasnewman · 2026-03-22T15:18:51Z

@nachoal Can you retest with the fixes from #592? It should be more faithful to the original implementation now.

Commit a50107a: fix(parakeet): clamp multilingual v3 chunk windows for non-English input

nachoal force-pushed the parakeet-v3-multilang-subchunking branch from b5c152c to a50107a on March 20, 2026 at 22:11.

nachoal wants to merge 1 commit into Blaizzy:main from nachoal:parakeet-v3-multilang-subchunking

nachoal commented Mar 20, 2026

Uh oh!

nachoal commented Mar 20, 2026

Uh oh!

lucasnewman commented Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

nachoal commented Mar 20, 2026

Context

Description

Changes in the codebase

Changes outside the codebase

Additional information

Checklist

Uh oh!

nachoal commented Mar 20, 2026

Uh oh!

lucasnewman commented Mar 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants