Commit 2f2841b

jkarthic and ggerganov authored
whisper : add single-timestamp logic (#2629)
* Fix hallucinations during silence

  When the predicted tokens end with a single timestamp, the entire 30-second segment should be considered done, to avoid hallucinations in the remaining part of the segment. This behaviour matches OpenAI's whisper. Refer to the logic related to `single_timestamp_ending` in https://github.com/openai/whisper/blob/main/whisper/transcribe.py

* Accept review comments related to formatting.

  Co-authored-by: Georgi Gerganov <[email protected]>

---------

Co-authored-by: Georgi Gerganov <[email protected]>
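The check described in the commit message can be sketched in isolation as follows. This is a minimal illustration, not the code from whisper.cpp: `Token`, `ends_with_single_timestamp`, and the `timestamp_begin` parameter are hypothetical names, with `timestamp_begin` standing in for the value returned by `whisper_token_beg(ctx)`. The strict `>` on the last token mirrors the comparison used in this commit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for whisper.cpp's token record; only the id matters here.
struct Token {
    int32_t id;
};

// Returns true when the sequence ends with exactly one timestamp token:
// the last token is a timestamp and the token before it is a text token.
// `timestamp_begin` is an assumed parameter playing the role of
// whisper_token_beg(ctx).
bool ends_with_single_timestamp(const std::vector<Token> & tokens, int32_t timestamp_begin) {
    // Token ids are non-negative, so a non-positive boundary is a caller bug.
    assert(timestamp_begin > 0);

    if (tokens.size() < 2) {
        return false; // need at least one text token followed by one timestamp
    }

    const int32_t prev = tokens[tokens.size() - 2].id;
    const int32_t last = tokens[tokens.size() - 1].id;

    return prev < timestamp_begin && last > timestamp_begin;
}
```

A sequence that ends with two consecutive timestamps does not match, since that pattern marks an ordinary segment boundary rather than a chunk the decoder has finished.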
1 parent 09a1b61 commit 2f2841b

1 file changed: +10 -1 lines changed

src/whisper.cpp

Lines changed: 10 additions & 1 deletion
@@ -6060,7 +6060,7 @@ int whisper_full_with_state(
     {
         const auto & best_decoder = state->decoders[best_decoder_id];
 
-        const auto seek_delta = best_decoder.seek_delta;
+        auto seek_delta = best_decoder.seek_delta;
         const auto result_len = best_decoder.sequence.result_len;
 
         const auto & tokens_cur = best_decoder.sequence.tokens;
@@ -6201,6 +6201,15 @@ int whisper_full_with_state(
             }
         }
 
+        // ref: https://github.com/ggerganov/whisper.cpp/pull/2629
+        const bool single_timestamp_ending = tokens_cur.size() > 1 &&
+            tokens_cur[tokens_cur.size() - 2].id < whisper_token_beg(ctx) &&
+            tokens_cur[tokens_cur.size() - 1].id > whisper_token_beg(ctx);
+        if (single_timestamp_ending) {
+            WHISPER_LOG_DEBUG("single timestamp ending - skip entire chunk\n");
+            seek_delta = std::min(seek_end - seek, WHISPER_CHUNK_SIZE * 100);
+        }
+
         // update audio window
         seek += seek_delta;
 
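The seek arithmetic in the second hunk can be sketched on its own as follows. This is only an illustration under stated assumptions: `next_seek_delta`, `kChunkSeconds`, and `kUnitsPerSecond` are hypothetical names that do not appear in whisper.cpp. The sketch assumes seek positions are counted in 10 ms units, so a full 30-second chunk spans 3000 units, which is what `WHISPER_CHUNK_SIZE * 100` evaluates to in the diff.

```cpp
#include <algorithm>
#include <cassert>

// Assumed constants for this sketch: seek positions are counted in 10 ms
// units, so a 30-second chunk spans 30 * 100 = 3000 units.
constexpr int kChunkSeconds   = 30;  // stands in for WHISPER_CHUNK_SIZE
constexpr int kUnitsPerSecond = 100; // 10 ms per seek unit

// Hypothetical helper: decides how far to advance the audio window.
// When the decoded tokens end with a single timestamp, the whole chunk is
// treated as consumed, clamped so the window never moves past seek_end.
// Otherwise the delta reported by the best decoder is used unchanged.
int next_seek_delta(int seek, int seek_end, int decoder_seek_delta, bool single_timestamp_ending) {
    assert(seek_end >= seek); // the window must not already be past the end

    if (single_timestamp_ending) {
        return std::min(seek_end - seek, kChunkSeconds * kUnitsPerSecond);
    }
    return decoder_seek_delta;
}
```

For example, with 10000 units of audio and the window at 9000, a single-timestamp ending advances by the remaining 1000 units instead of a full 3000, which is why the diff takes the minimum of the two values.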
