[whisper] Feature Request: Add carry_initial_prompt parameter to transcribe() #1410

@malleroid

Summary

I'd like to request the addition of a carry_initial_prompt parameter to
transcribe(). I'm happy to submit a PR if the maintainers are open to it.

Problem

Currently, initial_prompt is only applied to the first 30-second chunk.
For longer audio files (e.g. meeting recordings, lectures), the vocabulary
and style hints provided via initial_prompt have no effect on subsequent
chunks. This makes it difficult to reliably guide transcription of
domain-specific terms, proper nouns, or formatting style throughout an
entire file.

Prior Art

This has already been addressed in the two other major Whisper implementations:

  • openai/whisper — merged in PR #2343 (~1 year ago)
  • whisper.cpp — requested in Issue #2564,
    implemented in PR #3395,
    and released in v1.8.1

Of these three implementations, mlx-whisper is currently the only one without this feature.

Proposed Change

Add carry_initial_prompt: bool = False to the transcribe() signature:

# mlx_whisper/transcribe.py
def transcribe(
    audio,
    *,
    ...
    initial_prompt: Optional[str] = None,
    carry_initial_prompt: bool = False,  # new
    ...
)

When carry_initial_prompt=True, prepend initial_prompt_tokens to the
prompt of every chunk in the transcription loop, and keep only the most
recent previous-text tokens that still fit within the 223-token prompt
limit, mirroring the upstream implementation:

https://github.com/openai/whisper/blob/main/whisper/transcribe.py
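
To make the intended behavior concrete, here is a minimal sketch of the per-chunk prompt assembly on plain token-ID lists. It is only an illustration of the idea, not the upstream or mlx-whisper code: the helper name build_chunk_prompt and the max_prompt_tokens parameter are hypothetical, the 223 budget is assumed to come from a 448-token text context (448 // 2 - 1), and the handling of an initial prompt that is longer than the budget is my own assumption.

```python
from typing import List

# Assumption: 448-token text context, so the prompt budget is 448 // 2 - 1 = 223.
MAX_PROMPT_TOKENS = 448 // 2 - 1


def build_chunk_prompt(
    initial_prompt_tokens: List[int],
    all_tokens: List[int],
    prompt_reset_since: int,
    carry_initial_prompt: bool,
    max_prompt_tokens: int = MAX_PROMPT_TOKENS,
) -> List[int]:
    """Hypothetical helper: build the prompt tokens for one chunk.

    all_tokens is assumed to start with a copy of initial_prompt_tokens,
    followed by the tokens decoded so far.
    """
    if not carry_initial_prompt:
        # Existing behavior: everything since the last prompt reset.
        return all_tokens[prompt_reset_since:]

    # Skip the copy of the initial prompt stored at the start of all_tokens
    # so that it is not included twice.
    start = max(len(initial_prompt_tokens), prompt_reset_since)
    remaining = max_prompt_tokens - len(initial_prompt_tokens)
    if remaining <= 0:
        # Assumption: if the initial prompt alone fills the budget,
        # keep only its tail and drop the previous-text context.
        return initial_prompt_tokens[-max_prompt_tokens:]

    # Left-slice the history: keep only the most recent tokens that fit.
    history = all_tokens[start:][-remaining:]
    return initial_prompt_tokens + history
```

With a 3-token initial prompt and 300 previously decoded tokens, this returns the 3 prompt tokens followed by the 220 most recent history tokens, so the prompt stays at 223 tokens for every chunk.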

Backward Compatibility

The default value is False, so existing behavior is fully preserved.

Note

As mentioned in the original openai/whisper PR, setting
carry_initial_prompt=True may increase the risk of looping and could
reduce the effectiveness of condition_on_previous_text. Users should be
aware of this trade-off.
