Following.
Hi
I want to do speaker diarization on Whisper's output.
I know Whisper returns a list of segments for each audio file, in this form:
'segments': [{'seek': 0.0, 'start': 0.46, 'end': 1.98, 'text': ' Hi, how are you', ......
My plan is to extract an embedding for each segment and then use a diarization model to label the speakers.
I would like to know whether the segments Whisper produces are based on speaker change detection or something similar.
In other words, is there only one speaker in each segment?
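
To make the per-segment plan concrete, here is a minimal sketch of the slicing step, assuming the audio is already loaded as a flat sequence of samples at a known sample rate. The names `slice_segments`, `embed_segments` and `toy_embedding` are hypothetical and not part of Whisper; `toy_embedding` is only a stand-in for a real speaker-embedding model and carries no speaker information. The sketch does not assume that a segment contains a single speaker.

```python
from typing import Callable, List, Sequence


def slice_segments(samples: Sequence[float], sample_rate: int,
                   segments: List[dict]) -> List[Sequence[float]]:
    """Cut the waveform into one chunk per segment, using 'start'/'end' in seconds."""
    chunks = []
    for seg in segments:
        first = max(0, int(round(seg["start"] * sample_rate)))
        last = min(len(samples), int(round(seg["end"] * sample_rate)))
        chunks.append(samples[first:last])
    return chunks


def toy_embedding(chunk: Sequence[float]) -> List[float]:
    """Placeholder, NOT a speaker embedding: [mean |amplitude|, zero-crossing rate]."""
    if len(chunk) < 2:
        return [0.0, 0.0]
    energy = sum(abs(x) for x in chunk) / len(chunk)
    crossings = sum(1 for a, b in zip(chunk, chunk[1:]) if (a < 0) != (b < 0))
    return [energy, crossings / (len(chunk) - 1)]


def embed_segments(samples: Sequence[float], sample_rate: int, segments: List[dict],
                   embed: Callable[[Sequence[float]], List[float]] = toy_embedding
                   ) -> List[List[float]]:
    """Return one embedding per segment; swap `embed` for a real model."""
    return [embed(chunk) for chunk in slice_segments(samples, sample_rate, segments)]


if __name__ == "__main__":
    # Synthetic 3-second signal at 100 Hz, plus one segment shaped like the output above.
    demo_samples = [0.5, -0.5] * 150
    demo_segments = [{"seek": 0.0, "start": 0.46, "end": 1.98, "text": " Hi, how are you"}]
    print(embed_segments(demo_samples, 100, demo_segments))
```

The resulting vectors would then go to whatever clustering or diarization model does the labeling. If a segment can span a speaker change, a single embedding per segment would mix both voices.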