Subtitles with a much smaller caption time window #1557

doncezart · 2023-07-28T23:05:47Z


By passing response_format='srt' to the transcribe request, I can retrieve captions along with their timestamps, as shown in the output below.

However, the window of time for the captions is too large for my use case.

Ideally, I'd get an exact timestamp for each word.
Failing that, 3-4 words grouped per timestamp would be good enough. Is there any way to achieve this?

Code:

    import openai

    # Legacy (openai<1.0) interface, as used at the time of this post
    with open("D:/Scripts/whisper/output.mp4", "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file, response_format="srt")
    print(transcript)

Output:

1
00:00:00,000 --> 00:00:03,000
And then there's a reward. You get a little stress relief.

2
00:00:03,000 --> 00:00:08,000
The only way to break a habit, you guys, is not to deal with the triggers.

3
00:00:08,000 --> 00:00:11,000
You're never going to get rid of the stress in your life.

4
00:00:11,000 --> 00:00:16,000
But you can 100% change your pattern of avoiding work.

glangford · 2023-07-28T23:36:30Z


I don't know about the OpenAI API service, but with the open-source Whisper you can get word-level timestamps by choosing JSON as your output format (--output_format json) together with --word_timestamps True.
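In Python, the equivalent of that flag is the `word_timestamps=True` argument to `model.transcribe()`; each segment in the returned dict then carries a `words` list. A minimal sketch for reading them out, assuming that structure (the `iter_words` helper and the sample dict are illustrative, not part of Whisper):

```python
def iter_words(result):
    """Yield (word, start, end) for every word in a Whisper transcribe() result.

    Assumes result came from model.transcribe(..., word_timestamps=True),
    so each segment has a "words" list of dicts with "word", "start", "end".
    """
    for segment in result["segments"]:
        for w in segment.get("words", []):
            # Whisper words usually carry a leading space, so strip it
            yield w["word"].strip(), w["start"], w["end"]


# Illustrative input shaped like Whisper's output (not real model output)
sample = {
    "segments": [
        {"words": [
            {"word": " And", "start": 0.0, "end": 0.46},
            {"word": " then", "start": 0.46, "end": 0.56},
        ]},
    ]
}

for word, start, end in iter_words(sample):
    print(f"{start:.2f}-{end:.2f} {word}")
```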


doncezart Jul 29, 2023
Author

I just discovered the open-source version, although I'm not familiar with passing these parameters in Python instead of on the command line. Any help?

EDIT

I figured out how to do it, and also how to export to .srt, for anyone wondering:

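    import whisper
    from whisper.utils import get_writer

    audio_path = "D:/Scripts/whisper/output.mp4"

    model = whisper.load_model("base")
    # word_timestamps=True adds a "words" list (word, start, end) to each segment
    result = model.transcribe(audio_path, verbose=True, language="en", word_timestamps=True)

    # The writer takes the audio path and names the output after its basename,
    # so this writes D:/Scripts/whisper/output.srt
    writer = get_writer("srt", "D:/Scripts/whisper")
    writer(result, audio_path)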

However, there are three small issues (the third is the most important; the others can be ignored):

1. An error in VS Code: Argument of type "Literal['D:/Scripts/whisper/output.srt']" cannot be assigned to parameter of type "TextIO"; "Literal['D:/Scripts/whisper/output.srt']" is incompatible with "TextIO".
2. Whenever I run the program, I get a warning in the terminal: timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
3. The subtitles are written with the current word highlighted instead of as one word per timestamp.

How it is right now:

1
00:00:00,000 --> 00:00:00,460
<u>And</u> then there's a reward you get a little stress relief.

2
00:00:00,460 --> 00:00:00,560
And<u> then</u> there's a reward you get a little stress relief.

3
00:00:00,560 --> 00:00:00,860
And then<u> there's</u> a reward you get a little stress relief.

4
00:00:00,860 --> 00:00:00,940
And then there's<u> a</u> reward you get a little stress relief.

How I want it to be:

1
00:00:00,000 --> 00:00:00,460
And

2
00:00:00,460 --> 00:00:00,560
then

3
00:00:00,560 --> 00:00:00,860
there's

4
00:00:00,860 --> 00:00:00,940
a
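One way to get the output above is to skip the built-in SRT writer and build the cues yourself from the word timestamps. A minimal sketch, assuming `result` comes from `model.transcribe(..., word_timestamps=True)` so that every segment has a `words` list of `word`/`start`/`end` entries; `srt_timestamp` and `words_to_srt` are hypothetical helper names, not part of Whisper:

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(result: dict) -> str:
    """Build an SRT string with one cue per word from a Whisper result.

    Assumes each entry in result["segments"] has a "words" list of
    {"word", "start", "end"} dicts (i.e. word_timestamps=True was used).
    """
    cues = []
    index = 1
    for segment in result["segments"]:
        for w in segment.get("words", []):
            cues.append(
                f"{index}\n"
                f"{srt_timestamp(w['start'])} --> {srt_timestamp(w['end'])}\n"
                f"{w['word'].strip()}\n"
            )
            index += 1
    # A blank line separates consecutive cues
    return "\n".join(cues)


# Usage with a real transcription result:
# result = model.transcribe(audio_path, word_timestamps=True)
# with open("D:/Scripts/whisper/output.srt", "w", encoding="utf-8") as f:
#     f.write(words_to_srt(result))
```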

glangford Jul 29, 2023

For example:

    result = model.transcribe(vfile, language='en', task='transcribe', word_timestamps=True, verbose=False)

and then, if you are using the 20230314 release:

    writer = get_writer("json", ".")
    writer(result, vfile)  # writes ./<audio basename>.json

You can then parse the JSON and write an SRT file word by word, if you choose.
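A sketch of that last step, assuming the JSON file written by the `json` writer contains the full `result` dict (segments with their `words` lists). It also covers the "3-4 words per timestamp" case from the original question through a `words_per_cue` parameter; `json_to_grouped_srt` and `srt_timestamp` are illustrative names, not Whisper APIs:

```python
import json


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def json_to_grouped_srt(json_path: str, words_per_cue: int = 3) -> str:
    """Read a Whisper JSON result and emit SRT cues of words_per_cue words each."""
    if words_per_cue < 1:
        raise ValueError("words_per_cue must be at least 1")
    with open(json_path, encoding="utf-8") as f:
        result = json.load(f)
    # Flatten all word entries across segments, in order
    words = [w for seg in result["segments"] for w in seg.get("words", [])]
    cues = []
    for i in range(0, len(words), words_per_cue):
        group = words[i:i + words_per_cue]
        text = " ".join(w["word"].strip() for w in group)
        # A cue spans from its first word's start to its last word's end
        start, end = group[0]["start"], group[-1]["end"]
        cues.append(
            f"{len(cues) + 1}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text}\n"
        )
    return "\n".join(cues)


# Usage: words_per_cue=1 gives one word per cue, 3 or 4 gives short phrases
# srt_text = json_to_grouped_srt("output.json", words_per_cue=3)
```

Groups here run across segment boundaries; restarting the grouping at each segment would keep cues from straddling two sentences, at the cost of some shorter cues.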
