
fix(parakeet): clamp multilingual v3 chunk windows for non-English input#589

Open
nachoal wants to merge 1 commit into Blaizzy:main from nachoal:parakeet-v3-multilang-subchunking

nachoal commented Mar 20, 2026

Context

Parakeet v3 multilingual decoding becomes unstable on oversized windows. In practice, non-English audio can drift into English/Spanglish when chunk_duration is large.

This PR keeps decoder behavior unchanged and fixes the issue at the chunk scheduler level.

Description

This change adds a small scheduling guard for multilingual Parakeet v3:

  • Detect multilingual Parakeet v3 from the model vocabulary.
  • Accept an optional language hint in the Parakeet generate/decode path.
  • When the model is multilingual v3 and the caller provides a non-English language hint, clamp oversized windows to:
    • chunk_duration = 5.0
    • overlap_duration = 1.0
  • Leave English / auto behavior unchanged.
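The scheduling guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `resolve_chunking`, `is_non_english_hint`, and their signatures are hypothetical names, the vocabulary-based v3 detection is replaced by a plain boolean, and "oversized" is assumed to mean any window longer than the 5.0 s clamp target.

```python
from typing import Optional, Tuple

# Clamp targets taken from the PR description.
CLAMPED_CHUNK_DURATION = 5.0
CLAMPED_OVERLAP_DURATION = 1.0

# Hints that should leave scheduling untouched (assumed spellings).
ENGLISH_OR_AUTO = {"", "auto", "en", "english"}


def is_non_english_hint(language: Optional[str]) -> bool:
    """Return True only for an explicit hint that is neither English nor auto."""
    if language is None:
        return False
    code = language.strip().lower().replace("_", "-")
    # Treat regional variants such as "en-US" as English.
    return code not in ENGLISH_OR_AUTO and not code.startswith("en-")


def resolve_chunking(
    is_multilingual_v3: bool,
    language: Optional[str],
    chunk_duration: float,
    overlap_duration: float,
) -> Tuple[float, float]:
    """Hypothetical stand-in for the scheduling guard; decoding is not touched."""
    if not (is_multilingual_v3 and is_non_english_hint(language)):
        # English, auto, or a non-v3 model: keep the caller's settings.
        return chunk_duration, overlap_duration
    if chunk_duration <= CLAMPED_CHUNK_DURATION:
        # Assumption: windows that are already small are left alone.
        return chunk_duration, overlap_duration
    return CLAMPED_CHUNK_DURATION, CLAMPED_OVERLAP_DURATION
```

With these assumptions, a Spanish hint on a 120 s window resolves to (5.0, 1.0), while the same call with "en", "auto", or no hint returns the caller's values unchanged.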

The goal is to preserve language consistency on non-English input without introducing unsupported decoder prompting hacks.

Changes in the codebase

  • Add multilingual Parakeet v3 detection helper.
  • Add non-English language-hint helper.
  • Add _resolve_multilingual_chunking(...) in the Parakeet base model.
  • Thread the optional language hint through the Parakeet model methods so it can be used strictly for scheduling.
  • Add STT model tests for:
    • v3 multilingual detection
    • clamping on non-English hints
    • preserving baseline behavior for English / auto

Changes outside the codebase

No infrastructure or service changes.

Additional information

Targeted repo tests:

python3 -m pytest -q mlx_audio/stt/tests/test_models.py -k 'multilingual_chunking or multilingual_v3_detection'

Result:

  • 3 passed

End-to-end benchmark summary on a local stt wrapper using this branch:

  • English short: unchanged (1123 words)
  • English long: unchanged (1241 words)
  • Spanish short: fixed (583 -> 1123 words vs prior bad MLX behavior)
  • Spanish long: preserved quality, with some throughput cost due to shorter windows

Subtitle-backed Spanish validation (WER):

  • es_short: 0.0758
  • es_long: 0.1119
  • es_long_motivar_humor: 0.0768
  • es_long_liderazgo_inspira: 0.0708
  • es_long_productividad: 0.0528
  • es_long_atencion_inteligencia: 0.0612
  • es_long_cerebro_atento: 0.0847
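For readers unfamiliar with the metric, word error rate is the word-level edit distance between a reference transcript and the model output, divided by the reference length. The sketch below is a generic illustration; the normalization (lowercasing, whitespace splitting) is an assumption and may differ from the scoring used for the numbers above.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

On this definition, a WER of 0.0758 means roughly 7 to 8 word-level edits per 100 reference words.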

The tradeoff is that non-English Parakeet v3 gets slower than the large-window path, but the prior behavior could silently return the wrong language.

Checklist

  • Tests added/updated
  • Documentation updated
  • Issue referenced (no existing repo issue for this specific fix)

nachoal force-pushed the parakeet-v3-multilang-subchunking branch from b5c152c to a50107a on March 20, 2026 22:11
nachoal commented Mar 20, 2026

Follow-up validation summary.

Targeted repo tests:

python3 -m pytest -q mlx_audio/stt/tests/test_models.py -k 'multilingual_chunking or multilingual_v3_detection'

Result:

  • 3 passed

End-to-end benchmark on the original sample set:

| Sample | Backend | Time (s) | RTF | Words |
| --- | --- | --- | --- | --- |
| en_short | mlx-parakeet | 6.65 | 0.0137 | 1123 |
| en_long | mlx-parakeet | 6.80 | 0.0122 | 1241 |
| es_short | mlx-parakeet | 7.55 | 0.0187 | 1123 |
| es_long | mlx-parakeet | 15.57 | 0.0155 | 2060 |
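The RTF column can be related to the timing column with a short calculation. This sketch assumes RTF is defined in the usual way, as processing time divided by audio duration; the comment does not state its definition, so the implied durations are estimates only.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """RTF under the assumed convention: processing time / audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


def implied_audio_seconds(processing_seconds: float, rtf: float) -> float:
    """Invert the assumed RTF definition to estimate the audio length."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return processing_seconds / rtf


# en_short row from the table: 6.65 s of processing at RTF 0.0137.
en_short_audio = implied_audio_seconds(6.65, 0.0137)
```

Under that assumption, the en_short row corresponds to roughly 485 s of audio, and lower RTF values mean faster-than-real-time transcription.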

Reference-backed Spanish validation (subtitle/text reference WER):

| Sample | WER |
| --- | --- |
| es_short | 0.0758 |
| es_long | 0.1119 |
| es_long_motivar_humor | 0.0768 |
| es_long_liderazgo_inspira | 0.0708 |
| es_long_productividad | 0.0528 |
| es_long_atencion_inteligencia | 0.0612 |
| es_long_cerebro_atento | 0.0847 |

Interpretation:

  • English stays at baseline quality.
  • The known bad short Spanish case is fixed.
  • The broader long-Spanish validation set stays in a usable WER band (roughly 0.05 to 0.11) while avoiding the previous language drift failure mode.
  • Tradeoff: non-English Parakeet v3 gets slower because it now uses smaller windows when a non-English language hint is provided.
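To give a feel for the throughput tradeoff, the sketch below counts how many overlapping windows a generic sliding-window scheduler would need for a given audio length. It is not the mlx-audio scheduler: `count_windows` is a hypothetical helper, and the 120 s / 15 s "large window" settings are illustrative values, not defaults taken from this PR.

```python
import math


def count_windows(audio_seconds: float, chunk_duration: float, overlap_duration: float) -> int:
    """Number of windows when each window starts (chunk - overlap) after the last."""
    if chunk_duration <= overlap_duration:
        raise ValueError("chunk_duration must exceed overlap_duration")
    if audio_seconds <= chunk_duration:
        return 1
    step = chunk_duration - overlap_duration
    # One initial window, plus enough steps to cover the remaining audio.
    return 1 + math.ceil((audio_seconds - chunk_duration) / step)


# Ten minutes of audio under the clamped and the illustrative large settings.
clamped_windows = count_windows(600.0, 5.0, 1.0)
large_windows = count_windows(600.0, 120.0, 15.0)
```

For ten minutes of audio this yields 150 windows at 5.0 s / 1.0 s versus 6 at the illustrative 120 s / 15 s setting, which is why the clamped path makes many more decoder calls.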

lucasnewman (Collaborator) commented

@nachoal Can you retest with the fixes from #592? It should be more faithful to the original implementation now.
