Built-in speaker diarization? #1552
martinmueller4voice
started this conversation in
General
Reply: Likely a training data problem, similar to #928.
Hi community!
I had whisper (most recent version from GitHub, commit b91c907) transcribe a German test recording using this call:
!whisper --language de --model large --suppress_tokens 0,11,13,30 audio.wav
(--suppress_tokens is there to suppress the automatic punctuation) and was surprised to see the transcript begin with:
(Rednerwechsel) Sehr geehrte Damen
("Rednerwechsel" means change of speaker)
This change of speaker seems to be token 50364 (if I interpret the JSON file correctly), in case that helps.
So far, I've always transcribed single speaker recordings and the one above is no exception.
In addition, this (Rednerwechsel) appears at the very beginning of the transcription, so there was no other speaker talking before.
Other recordings from the same speaker don't show this output.
I always thought Whisper doesn't have speaker diarization out of the box, so where does this token come from?
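
For anyone who wants to check which token IDs belong to the text and which are special tokens, here is a minimal sketch that reads the per-segment "tokens" lists from the JSON output. It assumes the multilingual vocabulary, where IDs from 50257 upward are reserved for special tokens (end of text, language tags, timestamps); under that assumption 50364 would most likely be a timestamp marker and not the "(Rednerwechsel)" text itself. The sample data and the IDs 1001 and 1002 are made-up placeholders, not values from my file.

```python
import json

# Assumption: in the multilingual vocabulary, IDs at or above this value
# are special tokens (end of text, language tags, timestamps), not text.
SPECIAL_TOKEN_START = 50257


def split_tokens(segment):
    """Split one segment's token IDs into (text_ids, special_ids)."""
    tokens = segment.get("tokens", [])
    text_ids = [t for t in tokens if t < SPECIAL_TOKEN_START]
    special_ids = [t for t in tokens if t >= SPECIAL_TOKEN_START]
    return text_ids, special_ids


def summarize(result):
    """Return one row per segment with its text and the two ID groups."""
    rows = []
    for seg in result.get("segments", []):
        text_ids, special_ids = split_tokens(seg)
        rows.append(
            {
                "text": seg.get("text", ""),
                "text_ids": text_ids,
                "special_ids": special_ids,
            }
        )
    return rows


# Hypothetical sample shaped like the JSON output; 1001 and 1002 are
# placeholder IDs, not real vocabulary entries.
sample = json.loads(
    '{"segments": [{"text": " (Rednerwechsel) Sehr geehrte Damen",'
    ' "tokens": [50364, 1001, 1002, 50464]}]}'
)

for row in summarize(sample):
    print(row["text"], row["text_ids"], row["special_ids"])
```

To run it on a real transcript, replace the sample with json.load() on the output file; the text_ids list is then the place to look for whatever produced "(Rednerwechsel)".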