Need for a verbatim / raw transcription mode for forced alignment #2712

bvegliantestudioeco · 2026-01-08T11:02:17Z


Whisper produces transcriptions optimized for readability, which works very well
for most use cases.

However, for professional workflows that rely on forced alignment (e.g. Aeneas,
subtitle timing, e-learning pipelines), this becomes a limitation: the text no
longer matches, word for word, what was actually spoken.

What seems to be missing is a verbatim / raw transcription mode that:

- preserves fillers, hesitations and repetitions
- preserves exact word order
- avoids merging or rewriting spoken content
- prioritizes alignment correctness over readability

This would not require changing the default behavior.
A separate flag or decoding mode (e.g. --verbatim) could explicitly separate
transcription from normalization.
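
To illustrate the proposed separation, here is a minimal sketch (not Whisper's code) of a pipeline that keeps the raw, verbatim token sequence for alignment and derives the readable text from it in a separate normalization pass. The `FILLERS` set and the `tokenize` / `normalize` helpers are hypothetical and only cover a few English fillers; a real readability pass would do much more.

```python
import re

# Hypothetical filler inventory; a real system would need a per-language list.
FILLERS = {"um", "uh", "erm", "hmm"}

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens, dropping punctuation."""
    return re.findall(r"[a-z']+", text.lower())

def normalize(tokens: list[str]) -> list[str]:
    """Readability pass: drop fillers and collapse immediate repetitions."""
    out: list[str] = []
    for tok in tokens:
        if tok in FILLERS:
            continue  # hesitation removed for display only
        if out and out[-1] == tok:
            continue  # "the the" -> "the"
        out.append(tok)
    return out

# The verbatim tokens go to the aligner; the normalized ones go to the reader.
verbatim = tokenize("So um the the model uh works")
readable = normalize(verbatim)
print(verbatim)  # ['so', 'um', 'the', 'the', 'model', 'uh', 'works']
print(readable)  # ['so', 'the', 'model', 'works']
```

Because normalization only ever reads the verbatim tokens, both outputs stay available, and the default (readable) behavior does not have to change.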

Today users are forced to choose between:

- readable text that drifts during alignment
- verbatim text with recognition errors
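
One rough way to quantify the drift is to diff a verbatim word list against the readable transcript and list what the readable version dropped. This sketch uses Python's standard `difflib.SequenceMatcher`; the `dropped_words` helper and the sample tokens are illustrative assumptions, not part of any existing tool.

```python
import difflib

def dropped_words(verbatim: list[str], readable: list[str]) -> list[str]:
    """Return words spoken in `verbatim` that are missing or rewritten in `readable`."""
    matcher = difflib.SequenceMatcher(a=verbatim, b=readable, autojunk=False)
    missing: list[str] = []
    for tag, i1, i2, _j1, _j2 in matcher.get_opcodes():
        # "delete" = words removed; "replace" = words rewritten into something else.
        if tag in ("delete", "replace"):
            missing.extend(verbatim[i1:i2])
    return missing

spoken = ["so", "um", "the", "the", "model", "uh", "works"]
readable = ["so", "the", "model", "works"]
print(dropped_words(spoken, readable))  # ['um', 'the', 'uh']
```

Each dropped word is audio that a forced aligner has to stretch a neighboring word over, which is where the timing drift comes from.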

Curious to hear if this is something the team has considered,
or if there are recommended approaches for alignment-critical workflows.
