Only two speakers in an audio file, but pyannote assigns a new speaker to each segment #1591
Hieroglyph17 started this conversation in General
-
Looks like you are using a VoiceActivityDetection pipeline, while what you are looking for is a SpeakerDiarization pipeline.
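Why the output looks like "one speaker per segment": `Annotation.itertracks(yield_label=True)` yields `(segment, track, label)` triples. In the output posted in the question, `'A'` through `'L'` are therefore track names, and the actual label, `'SPEECH'`, is identical for every segment. A single `'SPEECH'` label is what a voice activity detection pipeline produces; a diarization pipeline puts speaker labels in that last position instead. The plain-Python sketch below copies the posted triples (segments simplified to `(start, end)` tuples in seconds) and counts distinct track names versus distinct labels. `summarize` is a hypothetical helper for illustration, not a pyannote function.

```python
# Triples copied from the posted output, in the (segment, track, label) order
# yielded by Annotation.itertracks(yield_label=True). Segments are simplified
# here to (start, end) tuples in seconds.
posted_tracks = [
    ((0.605802, 12.9096), "A", "SPEECH"),
    ((13.0973, 25.1109), "B", "SPEECH"),
    ((25.2645, 27.8413), "C", "SPEECH"),
    ((28.0802, 51.971), "D", "SPEECH"),
    ((53.3362, 59.9061), "E", "SPEECH"),
    ((61.0495, 69.0529), "F", "SPEECH"),
    ((70.4181, 77.7389), "G", "SPEECH"),
    ((78.4386, 86.5785), "H", "SPEECH"),
    ((87.1587, 89.2065), "I", "SPEECH"),
    ((89.3942, 98.9846), "J", "SPEECH"),
    ((99.4283, 102.21), "K", "SPEECH"),
    ((103.951, 126.766), "L", "SPEECH"),
]


def summarize(tracks):
    """Return the sorted distinct track names and sorted distinct labels."""
    track_names = sorted({track for _segment, track, _label in tracks})
    labels = sorted({label for _segment, _track, label in tracks})
    return track_names, labels


track_names, labels = summarize(posted_tracks)
# One new track name per segment, but only one label shared by all of them.
print(f"{len(track_names)} track names: {track_names}")
print(f"{len(labels)} distinct label(s): {labels}")
```

On the real `dz` object, pyannote.core's `Annotation.labels()` should return the distinct labels directly, which for this output would presumably be just `['SPEECH']`.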
-
Hi Hervé, I presume you mean this one: https://huggingface.co/pyannote/speaker-diarization-3.1
Many thanks, Christoph
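Assuming that is the intended pipeline, here is a minimal sketch of loading it and passing the known speaker count, since the interview has exactly two speakers. The model card for this pipeline documents the `num_speakers` call option and the `use_auth_token` keyword for pyannote.audio 3.1; newer releases may use a different keyword, so check the documentation of the installed version. The token and file path are placeholders, and `load_diarization_pipeline`, `diarize` and `speaking_time` are hypothetical helper names, not part of pyannote. The pyannote import is done lazily inside the loader so the pure-Python helpers can be used and tested without it.

```python
from collections import defaultdict


def load_diarization_pipeline(hf_token):
    """Load pyannote/speaker-diarization-3.1 from the Hugging Face Hub.

    Assumes pyannote.audio 3.1.x (keyword ``use_auth_token``) and that the
    model's user conditions were accepted with this token's account.
    """
    # Lazy import: the helpers below do not need pyannote installed.
    from pyannote.audio import Pipeline

    return Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token=hf_token
    )


def diarize(pipeline, audio_path, num_speakers=2):
    """Run diarization with a known speaker count.

    Returns a list of (start_seconds, end_seconds, speaker_label) turns.
    """
    diarization = pipeline(audio_path, num_speakers=num_speakers)
    return [
        (turn.start, turn.end, speaker)
        for turn, _track, speaker in diarization.itertracks(yield_label=True)
    ]


def speaking_time(turns):
    """Sum speech duration in seconds per speaker label."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)


# Example usage (token and path are placeholders; needs network access):
# pipeline = load_diarization_pipeline("YOUR_HF_TOKEN")
# turns = diarize(pipeline, "interview.wav", num_speakers=2)
# print(speaking_time(turns))  # expect two speaker labels for a two-person interview
```

Fixing `num_speakers=2` constrains the clustering step to two speakers; if the count were uncertain, the same model card also describes `min_speakers`/`max_speakers` bounds.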
-
I got a bunch of warnings when running the pipeline, but I presume I can ignore them. However, the output assigns a new speaker to every segment, even though the audio file is a professional-quality interview with only two speakers.
Can you help?
Christoph
Code:
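```python
from pyannote.audio import Pipeline

pipeline = Pipeline.from_pretrained("config.yaml")  # pipeline defined by a local config file
DEMO_FILE = {'uri': 'blabal', 'audio': '/Users/christophschnelle/Documents/Larry Sinclair Obama_02.wav'}
dz = pipeline(DEMO_FILE)
# Save the whole annotation as text
with open("diarization.txt", "w") as text_file:
    text_file.write(str(dz))
# Print the first 100 (segment, track, label) triples
print(*list(dz.itertracks(yield_label=True))[:100], sep="\n")
```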
Output:
```text
(...)
torchvision is not available - cannot save figures
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.1.2. To apply the upgrade to your files permanently, run
python -m pytorch_lightning.utilities.upgrade_checkpoint pytorch_model.bin
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.1.2. Bad things might happen unless you revert torch to 1.x.
(<Segment(0.605802, 12.9096)>, 'A', 'SPEECH')
(<Segment(13.0973, 25.1109)>, 'B', 'SPEECH')
(<Segment(25.2645, 27.8413)>, 'C', 'SPEECH')
(<Segment(28.0802, 51.971)>, 'D', 'SPEECH')
(<Segment(53.3362, 59.9061)>, 'E', 'SPEECH')
(<Segment(61.0495, 69.0529)>, 'F', 'SPEECH')
(<Segment(70.4181, 77.7389)>, 'G', 'SPEECH')
(<Segment(78.4386, 86.5785)>, 'H', 'SPEECH')
(<Segment(87.1587, 89.2065)>, 'I', 'SPEECH')
(<Segment(89.3942, 98.9846)>, 'J', 'SPEECH')
(<Segment(99.4283, 102.21)>, 'K', 'SPEECH')
(<Segment(103.951, 126.766)>, 'L', 'SPEECH')
```