Hey,
I have been using the excellent Parakeet v3 model with timestamps, by enabling them in the RNNTDecodingConfig.
The Parakeet timestamps are given as integers (e.g. 13 or 961). These values are far off from the actual seconds. I figured a timestep might be 0.04 s, or something similar, but the math doesn't work.
For example, I get the following two timestamps in the same transcript. I had to find the actual occurrences in the voice file by listening to it.
"13" in Parakeet -> 1.12 seconds in the voice file
"961" in Parakeet -> 76.88 seconds in the voice file
I feel like I'm missing something. How do I get from the "13" in Parakeet to the 1.12 seconds, and then with the same formula from the "961" in Parakeet to the 76.88 seconds?
I've been approximating with division by 9, but there has to be a better way.
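As a sanity check on the two data points above, the seconds-per-step implied by each pair can be computed directly. This is only arithmetic on the numbers quoted in this post, not a NeMo API call; the helper names and the 0.08 s candidate stride are assumptions for illustration (0.08 s would correspond to a 10 ms feature stride with 8x subsampling, which is not confirmed anywhere in this post).

```python
# The two (Parakeet offset, seconds found by ear) pairs quoted above.
observations = [(13, 1.12), (961, 76.88)]


def implied_stride(offset: int, seconds: float) -> float:
    """Seconds per timestep implied by a single observation."""
    return seconds / offset


def offset_to_seconds(offset: int, stride: float = 0.08) -> float:
    """Convert an integer offset to seconds under an ASSUMED fixed stride."""
    return offset * stride


for offset, seconds in observations:
    print(
        f"offset={offset:4d}  heard={seconds:6.2f}s  "
        f"implied stride={implied_stride(offset, seconds):.4f}s  "
        f"offset*0.08={offset_to_seconds(offset):6.2f}s"
    )
```

Under that assumed stride, 961 maps to 76.88 s exactly, while 13 maps to 1.04 s, one step short of the 1.12 s I measured by ear. That gap could come from the manual measurement or from a start versus end offset, but that is a guess.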
Below is the code I'm using to transcribe and get the timestamps.