
feengg commented Oct 9, 2025

token: Add interfaces for getting the start and end time of a specified token, similar to the segment ones

@ggerganov
Member

This is already available through the whisper_full_get_token_data API
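For reference, here is a minimal sketch (in Python, to keep it short) of how per-token times from whisper_full_get_token_data(ctx, i_segment, i_token) become the HH:MM:SS.mmm values in the token table further down. It assumes that the t0 and t1 fields of whisper_token_data are in units of 10 ms, which is what whisper.cpp's CLI timestamp formatting also assumes. format_token_time is a hypothetical helper name and is not part of the whisper.cpp API.

```python
def format_token_time(t: int) -> str:
    """Format a token t0/t1 value as HH:MM:SS.mmm.

    Assumption: t is in 10 ms units (centiseconds), as whisper.cpp uses
    for whisper_token_data.t0 / t1.
    """
    msec = t * 10  # 10 ms per unit
    hours, msec = divmod(msec, 3_600_000)
    minutes, msec = divmod(msec, 60_000)
    seconds, msec = divmod(msec, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{msec:03d}"
```

For example, t0 = 75 and t1 = 125 print as 00:00:00.750 and 00:00:01.250, which matches the " fellow" row.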

@feengg
Author

feengg commented Oct 10, 2025

> This is already available through the whisper_full_get_token_data API

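After enabling VAD, it looks like whisper_full_get_token_data only returns timestamps on the VAD-processed timeline, not on the original audio's timeline. In the output below, the token times end at 8.00 s, while the segment line is mapped back to the original audio (00:00:00.450 --> 00:00:10.620).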

whisper_vad: total duration of speech segments: 8.21 seconds
whisper_vad: vad_segment_info: orig_start: 0.45, orig_end: 2.30, vad_start: 0.00, vad_end: 1.95
whisper_vad: vad_segment_info: orig_start: 3.39, orig_end: 4.48, vad_start: 2.05, vad_end: 3.24
whisper_vad: vad_segment_info: orig_start: 5.54, orig_end: 7.71, vad_start: 3.34, vad_end: 5.61
whisper_vad: vad_segment_info: orig_start: 8.32, orig_end: 11.14, vad_start: 5.71, vad_end: 8.51
whisper_vad: Created time mapping table with 43 points
whisper_vad: Reduced audio from 177984 to 136224 samples (23.5% reduction)

Token Timestamp
Begin           End             Token-id   Token
00:00:00.000    00:00:00.000    50364      [_BEG_]
00:00:00.000    00:00:00.180    400         And
00:00:00.340    00:00:00.420    370         so
00:00:00.420    00:00:00.580    11         ,
00:00:00.580    00:00:00.750    452         my
00:00:00.750    00:00:01.250    7177        fellow
00:00:01.250    00:00:01.760    6280        Americans
00:00:02.050    00:00:02.410    1029        ask
00:00:02.780    00:00:03.040    406         not
00:00:03.330    00:00:03.440    437         what
00:00:03.440    00:00:03.740    428         your
00:00:03.870    00:00:04.320    1941        country
00:00:04.410    00:00:04.620    393         can
00:00:04.620    00:00:04.800    360         do
00:00:04.920    00:00:05.200    337         for
00:00:05.200    00:00:05.400    291         you
00:00:05.700    00:00:05.770    11         ,
00:00:05.770    00:00:05.960    1029        ask
00:00:06.130    00:00:06.240    437         what
00:00:06.400    00:00:06.640    291         you
00:00:06.640    00:00:06.880    393         can
00:00:06.940    00:00:07.000    360         do
00:00:07.000    00:00:07.280    337         for
00:00:07.280    00:00:07.490    428         your
00:00:07.530    00:00:07.860    1941        country
00:00:07.980    00:00:08.000    13         .
00:00:08.000    00:00:08.000    50764      [_TT_400]
[00:00:00.450 --> 00:00:10.620]   And so, my fellow Americans ask not what your country can do for you, ask what you can do for your country.
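The vad_segment_info lines above pair each speech span in the original audio with its position in the VAD-compacted audio. The segment line appears to be mapped back: it starts at 0.450, the first orig_start. The token times are not mapped. Below is a minimal sketch of mapping a token time back, in Python. It assumes the segment list is sorted and uses plain linear interpolation inside each segment. That is a stand-in and not necessarily what whisper.cpp's 43-point time mapping table does. VadSegment and vad_to_original are hypothetical names.

```python
from bisect import bisect_left
from typing import NamedTuple, Sequence


class VadSegment(NamedTuple):
    """One vad_segment_info line, all values in seconds."""
    orig_start: float  # span start in the original audio
    orig_end: float    # span end in the original audio
    vad_start: float   # same span's start in the VAD-processed audio
    vad_end: float     # same span's end in the VAD-processed audio


def vad_to_original(t: float, segments: Sequence[VadSegment]) -> float:
    """Map a time on the VAD-processed timeline back to the original audio.

    Assumption: simple per-segment linear interpolation; whisper.cpp's own
    mapping may differ. Times in the gap between two VAD segments snap to
    the next segment's original start; times past the last segment clamp
    to its original end. `segments` must be sorted by vad_start.
    """
    if not segments:
        return t
    # First segment whose vad_end is >= t.
    i = bisect_left([s.vad_end for s in segments], t)
    if i == len(segments):
        return segments[-1].orig_end
    seg = segments[i]
    if t <= seg.vad_start:
        return seg.orig_start
    scale = (seg.orig_end - seg.orig_start) / (seg.vad_end - seg.vad_start)
    return seg.orig_start + (t - seg.vad_start) * scale
```

With the four segments from the log, the first " ask" token at 2.05 s maps to 3.39 s, where the second speech span starts in the original audio. The final 8.00 s lands at about 10.63 s, close to the 00:00:10.620 end on the segment line. Snapping gap times to the next segment's start is an arbitrary choice; clamping to the previous segment's end would be equally defensible.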
