The term "segment" is unfortunately ambiguous in the source code, so I'll use "window" to refer to the 30-second sliding window, and "segment" to refer to a chunk of the transcript bounded by timestamps.

From memory, this is how I recall it working. Let's say you have something like this:

|           window           |           window           |
|segment|-----segment---|--segment--|

Whisper can see that by the end of the first window, the 3rd segment is incomplete because we can't see its end timestamp. So it can rewind to the end timestamp of the 2nd segment, thereby shortening the first window, and then start the second window from that exact point:

|           window    …
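
To make the rewinding idea concrete, here is a minimal sketch in Python. It is not Whisper's actual code: the function name `next_window_start`, the constant `WINDOW_SECONDS`, and the use of `None` to mark a segment whose end timestamp is not visible are all assumptions made for illustration. The fallback of advancing by a full window when there is nothing to rewind to is also an assumption.

```python
WINDOW_SECONDS = 30.0  # length of the sliding window described above


def next_window_start(window_start, segment_ends):
    """Return the time (in seconds) at which the next window should start.

    window_start: absolute start time of the current window.
    segment_ends: end timestamps of the segments decoded in this window,
        relative to window_start, in order. None marks a final segment
        whose end timestamp is not visible (hypothetical convention).
    """
    complete = [end for end in segment_ends if end is not None]
    last_is_incomplete = bool(segment_ends) and segment_ends[-1] is None

    if last_is_incomplete and complete:
        # Rewind: the next window starts where the last complete segment
        # ended, which effectively shortens the current window.
        return window_start + complete[-1]

    # Assumed fallback: nothing to rewind to, so advance by a full window.
    return window_start + WINDOW_SECONDS


# Three segments in the first window; the third has no visible end.
print(next_window_start(0.0, [8.0, 24.0, None]))
```

With the example values, the 2nd segment ends at 24 seconds, so the second window would start at 24 seconds rather than 30, and the 3rd segment gets decoded again from its beginning.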

Answer selected by Zeiny96