Suggestion: Add optional language parameter to Whisper (override automatic language detection) #2694

mark-reijerkerk · 2025-11-21T21:13:00Z


The Whisper model card shows that the decoding pipeline already includes a LANGUAGE TAG stage before transcription. This means Whisper is already internally capable of transcribing in a specific language when the tag is known.
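
For reference, here is a minimal sketch of the decoder prefix that the LANGUAGE TAG stage refers to. The helper build_sot_sequence is hypothetical, and the token strings are assumed to follow Whisper's <|...|> special-token format. When the language is unknown, the sketch returns only the start token, because the language tag is then what the model has to predict next.

```python
from typing import List, Optional

# Assumed special-token strings in Whisper's "<|...|>" style.
SOT = "<|startoftranscript|>"
TASKS = {"transcribe", "translate"}


def build_sot_sequence(language: Optional[str], task: str = "transcribe") -> List[str]:
    """Hypothetical helper: the prefix the decoder is conditioned on."""
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    if language is None:
        # Language unknown: the decoder sees only the start token, and its next
        # prediction (restricted to language tags) is the detected language.
        return [SOT]
    # Language known: the LANGUAGE TAG is part of the prompt, followed by the task.
    return [SOT, f"<|{language}|>", f"<|{task}|>"]


prefix_nl = build_sot_sequence("nl")
prefix_unknown = build_sot_sequence(None)
```

With the language given, the tag is supplied rather than predicted, which is the property the proposal relies on.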

Right now, the model always performs automatic language identification, but this can be unreliable for multilingual speakers, mixed-accent audio, or short/noisy recordings.

Proposal:
Add an optional user-provided language parameter that overrides Whisper’s automatic language detection.
When this parameter is not provided, Whisper continues to detect the language automatically as it does today.

This would give developers more control, improve transcription accuracy, and avoid unintended translation or language switching — without requiring any change to the underlying Whisper architecture.
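
A minimal sketch of the proposed behaviour, assuming automatic detection reduces to picking the highest-scoring language tag (a simplification of what Whisper actually does). The function choose_language and the score dictionary are hypothetical; the scores stand in for the decoder's logits over language tokens.

```python
from typing import Dict, Optional


def choose_language(language_scores: Dict[str, float], language: Optional[str] = None) -> str:
    """Hypothetical: use the caller's language if given, else detect it."""
    if language is not None:
        # Override: the user-provided language wins; detection is skipped.
        if language not in language_scores:
            raise ValueError(f"unsupported language: {language!r}")
        return language
    # Fallback: automatic detection as the highest-scoring language tag.
    return max(language_scores, key=language_scores.__getitem__)


# Hypothetical scores for a short clip from a Dutch speaker with an English accent.
scores = {"en": 2.1, "nl": 1.9, "de": -0.5}
detected = choose_language(scores)       # no parameter: falls back to detection
forced = choose_language(scores, "nl")   # parameter given: detection is overridden
```

When no parameter is passed, the behaviour is unchanged; the override only takes effect when the caller opts in.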

MarktHart · 2025-11-21T21:26:55Z


Try the --language arg

2 replies

mark-reijerkerk Nov 24, 2025
Author

Thanks — I’m aware of the --language arg. The issue is that it doesn’t function as a strict constraint; Whisper still switches languages under uncertainty. My proposal is aimed at giving the decoder an actual hard language lock, not a soft preference.
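
To make the difference between a soft preference and a hard lock concrete, here is a toy illustration over hypothetical language-tag scores. Neither function is part of Whisper, and this is not a claim about how --language is implemented; it only shows why a boosted score can still lose while a masked one cannot.

```python
import math
from typing import Dict

Scores = Dict[str, float]


def soft_preference(scores: Scores, preferred: str, bonus: float = 2.0) -> Scores:
    """Boost one language; the others keep their scores and can still win."""
    return {lang: (s + bonus) if lang == preferred else s for lang, s in scores.items()}


def hard_lock(scores: Scores, locked: str) -> Scores:
    """Make every language except the locked one impossible (-inf)."""
    if locked not in scores:
        raise ValueError(f"unknown language: {locked!r}")
    return {lang: s if lang == locked else -math.inf for lang, s in scores.items()}


def best(scores: Scores) -> str:
    return max(scores, key=scores.__getitem__)


# Hypothetical scores where the model strongly favours English.
scores = {"en": 5.0, "nl": 1.0, "de": 0.5}
soft_winner = best(soft_preference(scores, "nl"))  # 1.0 + 2.0 is still below 5.0
hard_winner = best(hard_lock(scores, "nl"))        # all other tags are -inf
```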

MarktHart Nov 24, 2025

Could you share an audio file that actually goes wrong?

Here the language arg is passed to the tokenizer; here it is added to the tokenizer's sot_sequence, which is then used here and here.

Retrying the decode (the temperature fallback) does not appear to overwrite the language token; instead, it reruns the whole sequence at a different temperature.
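
The fallback behaviour can be sketched as a loop that reuses the same prefix and only changes the temperature. decode and is_acceptable are hypothetical stand-ins: Whisper's real acceptance checks use things like compression ratio and average log-probability, and the real decoder may change search strategy between attempts, none of which is modelled here. The temperature schedule is assumed to match Whisper's default.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

R = TypeVar("R")

# Assumed to mirror Whisper's default fallback schedule.
DEFAULT_TEMPERATURES = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)


def decode_with_fallback(
    prefix: List[str],
    decode: Callable[[List[str], float], R],
    is_acceptable: Callable[[R], bool],
    temperatures: Sequence[float] = DEFAULT_TEMPERATURES,
) -> Optional[R]:
    """Retry decoding at increasing temperatures; the prefix is never modified."""
    result: Optional[R] = None
    for temperature in temperatures:
        # Same prefix (including the language tag) on every attempt;
        # only the temperature differs.
        result = decode(prefix, temperature)
        if is_acceptable(result):
            break
    # Assumption: the last attempt is returned even if none passed the check.
    return result


# Usage with a fake decoder that records what it was given.
attempts = []


def fake_decode(prefix: List[str], temperature: float) -> float:
    attempts.append((tuple(prefix), temperature))
    return temperature  # pretend output quality improves with temperature


prefix = ["<|startoftranscript|>", "<|nl|>", "<|transcribe|>"]
result = decode_with_fallback(prefix, fake_decode, lambda r: r >= 0.4)
```

Every recorded attempt carries the identical prefix, so the language token stays fixed across retries.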

In short: the language token does seem to be locked in when the language flag is used. What you are seeing may be the model hallucinating output in a different language, or the model producing English when the task is translation; neither can be fixed in code.
