Built-in speaker diarization? #1552

martinmueller4voice · 2023-07-26T06:44:30Z

Hi community!
I had Whisper (the most recent version from GitHub, commit b91c907) transcribe a German test recording using this call:
!whisper --language de --model large --suppress_tokens 0,11,13,30 audio.wav
(--suppress_tokens is there to suppress auto-punctuation in the output) and was surprised to see the transcript begin with:
(Rednerwechsel) Sehr geehrte Damen
("Rednerwechsel" means "change of speaker".)
If I interpret the JSON file correctly, this change of speaker seems to be token 50364, in case that is of any help.

So far, I've always transcribed single speaker recordings and the one above is no exception.
In addition, this (Rednerwechsel) appears at the very beginning of the transcription, so there was no other speaker talking before.
Other recordings from the same speaker don't show this output.

I always thought Whisper doesn't have speaker diarization out of the box, so where does this token come from?

phineas-pta · 2023-07-26T08:54:52Z

Likely a training-data problem, like #928.

1 reply

martinmueller4voice Jul 26, 2023
Author

I just tried large-v1, and there the "(Rednerwechsel)" did not appear.
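
A possible way to check the "token 50364" reading from the first post is to separate timestamp tokens from text tokens in a segment's token list. The sketch below assumes the multilingual vocabulary used by the large models at the time of this thread, where IDs from 50364 upward are timestamp tokens in 0.02-second steps; under that assumption, 50364 would be the 0.00 s timestamp that opens a segment, not the source of "(Rednerwechsel)". The constants, the function name describe_tokens, and the sample token IDs are illustrative and not taken from the thread; other vocabularies use a different boundary, so it is worth confirming the value against the tokenizer of the installed version.

```python
# Assumption: multilingual Whisper vocabulary as of mid-2023, in which
# timestamp tokens start at ID 50364 and advance in 0.02-second steps.
TIMESTAMP_BEGIN = 50364
SECONDS_PER_STEP = 0.02


def describe_tokens(token_ids):
    """Label each token ID as a timestamp (with its time in seconds) or as text."""
    labelled = []
    for token_id in token_ids:
        if token_id >= TIMESTAMP_BEGIN:
            seconds = round((token_id - TIMESTAMP_BEGIN) * SECONDS_PER_STEP, 2)
            labelled.append((token_id, "timestamp", seconds))
        else:
            labelled.append((token_id, "text", None))
    return labelled


# Hypothetical token list shaped like a segment's "tokens" entry in the JSON output.
example_tokens = [50364, 1234, 5678, 50464]
for entry in describe_tokens(example_tokens):
    print(entry)
```

If the first token of the segment is labelled as a timestamp, the "(Rednerwechsel)" text must come from the ordinary text tokens that follow it, which would fit the training-data explanation given above.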
