Summary
I'd like to request the addition of a carry_initial_prompt parameter to
transcribe(). I'm happy to submit a PR if the maintainers are open to it.
Problem
Currently, initial_prompt is only applied to the first 30-second chunk.
For longer audio files (e.g. meeting recordings, lectures), the vocabulary
and style hints provided via initial_prompt have no effect on subsequent
chunks. This makes it difficult to reliably guide transcription of
domain-specific terms, proper nouns, or formatting style throughout an
entire file.
Prior Art
This has already been addressed in the two other major Whisper implementations:
- openai/whisper: merged in PR #2343 (~1 year ago)
- whisper.cpp: requested in Issue #2564, implemented in PR #3395, and released in v1.8.1
mlx-whisper is currently the only major implementation without this feature.
Proposed Change
Add carry_initial_prompt: bool = False to the transcribe() signature:

    # mlx_whisper/transcribe.py
    def transcribe(
        audio,
        *,
        ...
        initial_prompt: Optional[str] = None,
        carry_initial_prompt: bool = False,  # new
        ...
    )

When carry_initial_prompt=True, prepend initial_prompt_tokens to the
prompt of every chunk in the transcription loop, left-slicing to fit within
the 223-token limit. This mirrors the upstream implementation:
https://github.com/openai/whisper/blob/main/whisper/transcribe.py
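For illustration, here is a rough sketch of the per-chunk prompt selection described above. It is not the actual mlx-whisper or upstream code. The helper name build_chunk_prompt and its parameters are hypothetical, and it assumes that all_tokens starts with a copy of the initial prompt tokens and that prompt_reset_since marks where the previous-text context was last reset.

```python
from typing import List


def build_chunk_prompt(
    initial_prompt_tokens: List[int],
    all_tokens: List[int],
    prompt_reset_since: int,
    carry_initial_prompt: bool = False,
    max_prompt_len: int = 223,  # limit quoted in this proposal
) -> List[int]:
    """Hypothetical helper: pick the prompt tokens for the next chunk."""
    if not carry_initial_prompt:
        # Existing behavior: everything since the last prompt reset.
        return all_tokens[prompt_reset_since:]

    # Skip the copy of the initial prompt assumed to sit at the start
    # of all_tokens, so it is not included twice.
    nignored = max(len(initial_prompt_tokens), prompt_reset_since)

    # Room left for previous-text context after the carried prompt.
    budget = max(max_prompt_len - len(initial_prompt_tokens), 0)

    # Left-slice: keep only the most recent `budget` tokens.
    # (Guard against budget == 0, since lst[-0:] returns the whole list.)
    remaining = all_tokens[nignored:][-budget:] if budget > 0 else []
    return initial_prompt_tokens + remaining


# Example: a 3-token initial prompt followed by 300 decoded tokens.
initial = [1, 2, 3]
history = initial + list(range(100, 400))
prompt = build_chunk_prompt(initial, history, 0, carry_initial_prompt=True)
print(len(prompt), prompt[:4])
```

With the flag off, the helper returns the same slice as before, which is what keeps the default backward compatible. The sketch does not handle an initial prompt that is itself longer than the limit.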
Backward Compatibility
The default value is False, so existing behavior is fully preserved.
Note
As mentioned in the original openai/whisper PR, setting
carry_initial_prompt=True may increase the risk of looping and could
reduce the effectiveness of condition_on_previous_text. Users should be
aware of this trade-off.
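One way to picture the trade-off, assuming the carried prompt and the previous-text context share the same 223-token budget (448 // 2 - 1 for the standard text context size): every token spent on initial_prompt is a token no longer available for previous text. The function name below is hypothetical and only does the arithmetic.

```python
def previous_text_budget(initial_prompt_len: int, n_text_ctx: int = 448) -> int:
    """Hypothetical helper: tokens left for previous-text context per chunk."""
    prompt_limit = n_text_ctx // 2 - 1  # 223 when n_text_ctx is 448
    return max(prompt_limit - initial_prompt_len, 0)


# Longer initial prompts leave less room for previous text.
for n in (0, 50, 150, 223):
    print(n, previous_text_budget(n))
# prints: 0 223, 50 173, 150 73, 223 0 (one pair per line)
```

Under this assumption, a short initial prompt costs little, while one approaching the limit leaves almost no previous-text context for condition_on_previous_text to work with.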