Replies: 2 comments 2 replies
-
any update on this?
-
Hi guys, I've encountered some issues with obtaining timestamps. I'm trying to get sentence-level timestamps from a fine-tuned Whisper large-v2 model through transformers.pipeline, but the results were unsatisfactory despite setting "return_timestamps" to True or to "word". I suspect this is because the model, having been fine-tuned on about 30,000 hours of audio data, has lost its ability to predict the special timestamp tokens needed for sentence- or word-level timestamps. Checking the output tokens reinforced this suspicion.

Consequently, I've come up with an alternative approach for obtaining token-level timestamps. Since the encoder's input has a fixed, ordered shape (30 s chunks, with a 25 ms time window and a 10 ms stride per frame), and given how transformer networks work, I believe the decoder's output (before post-processing) should also be fixed and ordered. This would imply that each output token can be mapped back to a position in the input, and from that mapping we could derive a timestamp for each output token.

However, I'm not very familiar with the transformer architecture or the Whisper pipeline. Could you help me assess whether this idea is feasible? Thanks!
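For anyone comparing notes, here is a minimal sketch of the two steps described above: requesting segment- and word-level timestamps through transformers.pipeline, and inspecting the raw generated IDs to see whether the fine-tuned model still produces the special timestamp tokens. The model path and audio inputs are placeholders; as far as I know, return_timestamps="word" is computed from cross-attention alignment (not from the timestamp tokens) and needs alignment_heads to be set in the model's generation config, which a fine-tuned checkpoint may or may not carry over.

```python
import numpy as np
from transformers import pipeline, WhisperProcessor, WhisperForConditionalGeneration

model_path = "path/to/finetuned-whisper-large-v2"  # placeholder for the fine-tuned checkpoint

# 1) Ask the ASR pipeline for timestamps.
pipe = pipeline(
    "automatic-speech-recognition",
    model=model_path,
    chunk_length_s=30,
)
segment_out = pipe("audio.wav", return_timestamps=True)   # segment-level, relies on the <|x.xx|> tokens
word_out = pipe("audio.wav", return_timestamps="word")    # word-level, uses cross-attention alignment
print(segment_out["chunks"])
print(word_out["chunks"])

# 2) Check whether the fine-tuned model still emits timestamp tokens at all.
processor = WhisperProcessor.from_pretrained(model_path)
model = WhisperForConditionalGeneration.from_pretrained(model_path)

# Stand-in for a real 16 kHz waveform (e.g. loaded with librosa or soundfile).
audio = np.zeros(16000 * 10, dtype=np.float32)
inputs = processor(audio, sampling_rate=16000, return_tensors="pt")
generated_ids = model.generate(inputs.input_features, return_timestamps=True)

# Timestamp tokens (<|0.00|> ... <|30.00|>) occupy the IDs right after <|notimestamps|>.
timestamp_begin = processor.tokenizer.convert_tokens_to_ids("<|notimestamps|>") + 1
print("emits timestamp tokens:", bool((generated_ids >= timestamp_begin).any()))
```

If that is accurate, the word-level path is conceptually close to the idea above: the decoder tokens are aligned back to the encoder frames (roughly 20 ms each after the convolutional downsampling) through their cross-attention weights, rather than by output position.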