Your observation is correct: Whisper is not explicitly trained to predict word-level timestamps. The current word-level outputs are produced by an inference-time trick, so their timing is not perfectly accurate, especially around pauses.
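
To illustrate what such an inference-time alignment can look like: in the openai-whisper package, word timings are derived by aligning text tokens to audio frames using cross-attention weights from selected decoder heads and dynamic time warping (DTW). The sketch below shows only the DTW step, applied to a toy, hand-written attention matrix. The matrix values, the helper names `dtw_path` and `token_times`, and the 20 ms-per-frame constant are assumptions for illustration; this is not Whisper's actual code.

```python
from typing import Dict, List, Tuple

# Assumption for this sketch: each encoder frame covers ~20 ms of audio.
FRAME_SECONDS = 0.02


def dtw_path(cost: List[List[float]]) -> List[Tuple[int, int]]:
    """Lowest-cost monotonic path through a (tokens x frames) cost matrix."""
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            acc[i][j] = cost[i - 1][j - 1] + min(
                acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1]
            )
    # Backtrack from the last (token, frame) cell, always stepping to the
    # cheapest predecessor, to recover the aligned pairs.
    i, j = n, m
    path: List[Tuple[int, int]] = []
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min(
            (acc[i - 1][j - 1], i - 1, j - 1),
            (acc[i - 1][j], i - 1, j),
            (acc[i][j - 1], i, j - 1),
        )
    path.reverse()
    return path


def token_times(attention: List[List[float]]) -> List[Tuple[float, float]]:
    """Map a (tokens x frames) attention-like matrix to (start, end) seconds per token."""
    # High attention should mean low cost, so align on the negated weights.
    cost = [[-w for w in row] for row in attention]
    spans: Dict[int, Tuple[int, int]] = {}
    for tok, frame in dtw_path(cost):
        first, last = spans.get(tok, (frame, frame))
        spans[tok] = (min(first, frame), max(last, frame))
    return [
        (spans[t][0] * FRAME_SECONDS, (spans[t][1] + 1) * FRAME_SECONDS)
        for t in range(len(attention))
    ]


# Two "words": the first attends to frames 0-1, the second to frames 4-5,
# and frames 2-3 are a low-attention pause.
attention = [
    [0.9, 0.8, 0.1, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.1, 0.9, 0.8],
]
print(token_times(attention))
```

Because the monotonic path must assign every frame to some token, the pause frames in the middle are absorbed into the end of the first word or the start of the second instead of being reported as silence. This is one intuitive reason why boundaries around pauses come out imprecise.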

Answer selected by ZayneHuang
Category
Q&A