I don't know about the OpenAI API service, but with the open-source Whisper you can get word-level timestamps by specifying json as your output type with |
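For anyone going the Python route instead of the command line, here is a rough sketch of the same idea. It assumes the `openai-whisper` package is installed and that `transcribe` is called with `word_timestamps=True`, in which case each segment of the result carries a `words` list. The helper names `flatten_words` and `transcribe_words` are made up for illustration and are not part of the library.

```python
from typing import Any


def flatten_words(result: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect the per-word entries from every segment of a Whisper result.

    Assumes the result came from transcribe(..., word_timestamps=True), so that
    segments may contain a "words" list of {"word", "start", "end"} entries.
    Segments without a "words" key are skipped.
    """
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            words.append(
                {"word": w["word"].strip(), "start": w["start"], "end": w["end"]}
            )
    return words


def transcribe_words(audio_path: str, model_name: str = "base") -> list[dict[str, Any]]:
    """Hypothetical wrapper: transcribe a file and return its word timings."""
    import whisper  # imported lazily so flatten_words works without the package

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path, word_timestamps=True)
    return flatten_words(result)
```

Calling `transcribe_words("audio.mp3")` (the file name is a placeholder) should give a flat list of words with start and end times in seconds, which can then be regrouped however you like.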
By using `response_format='srt'` inside the transcribe request, I'm able to retrieve captions along with their timestamps, as shown in the preview below. However, the time window for each caption is too large for my use case.
Best case scenario would be to generate exact timestamps for each word.
Good enough would be to have 3-4 words grouped per timestamp. Is there any way to achieve this?
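To make the "3-4 words per timestamp" case concrete, here is a minimal sketch of the regrouping step. It assumes you already have word-level timings as a list of dicts with `word`, `start`, and `end` keys (times in seconds); that input shape and the function names `format_srt_time` and `words_to_srt` are assumptions for illustration, not something the transcription API returns in this form by itself.

```python
def format_srt_time(seconds: float) -> str:
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], words_per_cue: int = 3) -> str:
    """Group word timings into SRT cues of at most `words_per_cue` words each.

    Each cue starts at its first word's start time and ends at its last
    word's end time.
    """
    if words_per_cue < 1:
        raise ValueError("words_per_cue must be at least 1")
    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i:i + words_per_cue]
        start = format_srt_time(chunk[0]["start"])
        end = format_srt_time(chunk[-1]["end"])
        text = " ".join(w["word"].strip() for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line, as SRT expects.
    return "\n".join(cues)
```

Setting `words_per_cue=1` gives one cue per word, which covers the best-case scenario as well. Splitting purely by word count ignores pauses and punctuation, so a cue can straddle a sentence boundary; that trade-off keeps the sketch short.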
Code:
Output: