30s Segmentation #490

catalwaysright · 2022-11-07T19:29:13Z

The Whisper model segments the audio into 30 s chunks and then transcribes each chunk. I am curious how the model does this segmentation without cutting a word in the middle.

JunZhan2000 · 2023-02-16T16:11:01Z

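You can read the transcribe function in transcribe.py. Because Whisper predicts timestamps, the model outputs the end timestamp of the last complete segment (typically a full sentence) in each window. transcribe then cuts the audio at that timestamp and starts the next decoding window there, so any segment left unfinished at the 30 s edge is decoded again in the next window instead of being split. For example, if the last complete segment ends at 25 s, the next window covers 25 s to 55 s rather than 30 s to 60 s.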
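The seeking step described above can be sketched in plain Python. This is a simplified model under stated assumptions, not the code in transcribe.py: the Segment class and next_window_start helper are hypothetical, and each segment is represented as an already-decoded (start, end) pair in seconds rather than as raw timestamp tokens.

```python
from dataclasses import dataclass
from typing import List, Optional

WINDOW_S = 30.0  # Whisper processes audio in fixed 30-second windows


@dataclass
class Segment:
    start: float          # seconds, relative to the start of the window
    end: Optional[float]  # None if the window ended before the segment closed


def next_window_start(window_start: float, segments: List[Segment]) -> float:
    """Hypothetical simplification of the seek step in Whisper's transcribe.py."""
    closed = [s for s in segments if s.end is not None]
    if not closed or len(closed) == len(segments):
        # Every segment ended cleanly (or there were no timestamps at all):
        # advance by a full window.
        return window_start + WINDOW_S
    # The last segment was cut off by the window edge: drop it and resume
    # decoding at the end timestamp of the last complete segment.
    return window_start + closed[-1].end


# Window 0-30 s: the segment starting at 25.0 s is still open at the 30 s edge.
segs = [Segment(0.0, 12.4), Segment(12.4, 25.0), Segment(25.0, None)]
print(next_window_start(0.0, segs))  # 25.0, so the next window covers 25-55 s
```

The unfinished segment is discarded rather than kept, which is why a word straddling the 30 s boundary ends up transcribed whole in the following window.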