Speaker diarization requires a Hugging Face access token to download the gated pyannote models.
Get a Hugging Face token:
- Go to: https://huggingface.co/settings/tokens
- Create a new token (read access is sufficient)
- Copy the token
Accept the pyannote model license:
- Go to: https://huggingface.co/pyannote/speaker-diarization-3.1
- Click "Agree and access repository"
- Also visit: https://huggingface.co/pyannote/segmentation-3.0
- Click "Agree and access repository"
Set the token:
export HF_TOKEN="your_token_here"
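Before re-running, it can help to confirm that the variable is actually visible in the current shell, since an export made in a different terminal window will not carry over. The following is a minimal sketch, assuming the script reads the token from HF_TOKEN as set above; the helper name check_hf_token is hypothetical and not part of the project.

```shell
# Hypothetical helper: report whether HF_TOKEN is set in this shell.
# It prints only the token's length, never the token itself.
check_hf_token() {
  if [ -z "${HF_TOKEN:-}" ]; then
    echo "HF_TOKEN is not set"
    return 1
  fi
  echo "HF_TOKEN is set (${#HF_TOKEN} characters)"
}

# Run the check; print a hint instead of failing if the token is missing.
check_hf_token || echo "Run the export command above, then try again"
```

Printing the length rather than the value keeps the token out of terminal scrollback and logs.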
Re-run the transcription:
source .venv/bin/activate
python examples/transcribe_only.py ~/Movies/"2025-11-04 14-36-25.mp4" \
  --model tiny \
  --output-dir output
The transcription will now include speaker labels like SPEAKER_00, SPEAKER_01, etc.
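One quick way to confirm that diarization ran is to list the distinct speaker labels in the transcript. This is only a sketch: it assumes the transcript is a plain-text file containing labels in the SPEAKER_NN form mentioned above, and both the helper name list_speakers and the example path are hypothetical, so adjust them to whatever the script actually writes.

```shell
# Hypothetical helper: print each distinct speaker label found in a file.
# Assumes labels look like SPEAKER_00, SPEAKER_01, and so on.
list_speakers() {
  grep -o 'SPEAKER_[0-9][0-9]*' "$1" | sort -u
}

# Example call with an assumed output path (uncomment and adjust):
# list_speakers output/transcript.txt
```

If the command prints nothing, the transcript contains no speaker labels, which usually points back to the token or license steps.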