feat: add speaker diarization service using pyannote.audio #349

Open
kimwwk wants to merge 1 commit into Zackriya-Solutions:main from kimwwk:feat/speaker-diarization-pr

kimwwk commented Feb 11, 2026

Summary

  • Add a new service in backend/diarization_service/ that wraps whisper.cpp transcription with speaker identification using pyannote.audio
  • Cross-chunk speaker tracking via voice embeddings keeps speaker IDs consistent throughout a meeting session
  • Add speaker field to backend transcript storage (db.py) and API model (main.py)
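One way the cross-chunk tracking could work is sketched below. It assumes each chunk-local pyannote speaker has already been reduced to a single embedding vector, and matches that vector against running per-session centroids by cosine similarity. `SessionSpeakerTracker`, `assign`, and the 0.7 threshold are hypothetical names and values, not taken from speaker_tracker.py.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class SessionSpeakerTracker:
    """Hypothetical: maps chunk-local speakers to session-wide speaker IDs."""

    threshold: float = 0.7  # assumed cutoff; would need tuning on real embeddings
    centroids: dict[str, list[float]] = field(default_factory=dict)
    counts: dict[str, int] = field(default_factory=dict)

    def assign(self, embedding: list[float]) -> str:
        # Find the closest known session speaker.
        best_id: Optional[str] = None
        best_sim = -1.0
        for speaker_id, centroid in self.centroids.items():
            sim = cosine_similarity(embedding, centroid)
            if sim > best_sim:
                best_id, best_sim = speaker_id, sim

        if best_id is not None and best_sim >= self.threshold:
            # Known speaker: fold the new embedding into the running mean.
            n = self.counts[best_id]
            self.centroids[best_id] = [
                (c * n + e) / (n + 1)
                for c, e in zip(self.centroids[best_id], embedding)
            ]
            self.counts[best_id] = n + 1
            return best_id

        # No match above the threshold: register a new session speaker.
        new_id = f"SPEAKER_{len(self.centroids):02d}"
        self.centroids[new_id] = list(embedding)
        self.counts[new_id] = 1
        return new_id
```

A fuller version would also stop two local speakers from the same chunk from collapsing onto one session ID, and would likely use numpy for the vector math.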

How it works

The diarization service runs as a sidecar to the existing whisper.cpp server:

Audio file → Diarization Service (:8179)
                ├─ Forward to Whisper (:8178) → timestamped segments
                ├─ Run pyannote diarization → speaker turns
                ├─ Match speakers via embedding similarity (cross-chunk)
                └─ Merge → segments with speaker labels
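The final Merge step can be sketched as interval overlap: each whisper segment takes the speaker whose diarization turns overlap it for the most seconds, and a segment with no overlapping turn stays UNKNOWN. The `Segment`/`Turn` shapes and the `merge` function are illustrative assumptions, not the PR's processor.py.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds, from whisper.cpp
    end: float
    text: str
    speaker: str = "UNKNOWN"


@dataclass
class Turn:
    start: float  # seconds, from pyannote
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def merge(segments: list[Segment], turns: list[Turn]) -> list[Segment]:
    """Label each segment in place with its most-overlapping speaker; return the list."""
    for seg in segments:
        totals: dict[str, float] = {}
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + ov
        seg.speaker = max(totals, key=totals.get) if totals else "UNKNOWN"
    return segments
```

Summing overlap per speaker, rather than taking the single longest turn, keeps the choice stable when one speaker's turns are fragmented across a segment.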

Endpoints

Endpoint                Method  Description
/inference              POST    Transcribe with diarization (compatible with the whisper.cpp API)
/transcribe             POST    Simplified transcription endpoint
/health                 GET     Service health with per-component status
/session/{id}/speakers  GET     Speaker summary for a session
/session/{id}           DELETE  Clear session speaker data
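Because /inference mirrors the whisper.cpp server, a client can send the same multipart upload, with the audio in a `file` form field. The stdlib-only sketch below builds such a request; `build_inference_request` is a hypothetical helper, and the `session_id` field in the usage example is an assumed name for however the service keys speaker sessions.

```python
import urllib.request
import uuid
from typing import Optional


def build_inference_request(
    url: str,
    audio: bytes,
    filename: str = "chunk.wav",
    fields: Optional[dict[str, str]] = None,
) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the audio as the `file` field."""
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields, e.g. response_format.
    for name, value in (fields or {}).items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"'
            f"\r\n\r\n{value}\r\n".encode()
        )
    # The audio file part.
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\nContent-Type: audio/wav\r\n\r\n'
        ).encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        url,
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
```

For example, `req = build_inference_request("http://localhost:8179/inference", wav_bytes, fields={"response_format": "json", "session_id": "meeting-42"})` followed by `urllib.request.urlopen(req)` would send one chunk; reusing the same session value across chunks is what would let the tracker keep IDs consistent.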

Configuration (environment variables)

Variable               Default                           Description
WHISPER_SERVER_URL     http://localhost:8178             whisper.cpp server URL
HF_AUTH_TOKEN          (required)                        Hugging Face token for the gated pyannote models
DIARIZATION_PIPELINE   pyannote/speaker-diarization-3.1  pyannote diarization pipeline to load
SPEAKER_EMBEDDING_DIR  (unset)                           Directory for persisting speaker embeddings; persistence is disabled when unset
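A minimal config.py following the table above might read the variables once into a frozen dataclass. The `Settings` and `load_settings` names are assumptions, and failing fast on a missing HF_AUTH_TOKEN is one design choice; the service could instead start and degrade to UNKNOWN speakers.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    whisper_server_url: str
    hf_auth_token: str
    diarization_pipeline: str
    speaker_embedding_dir: Optional[str]  # None means embedding persistence is off


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, applying the documented defaults."""
    token = env.get("HF_AUTH_TOKEN")
    if not token:
        raise RuntimeError("HF_AUTH_TOKEN is required to download gated pyannote models")
    return Settings(
        whisper_server_url=env.get("WHISPER_SERVER_URL", "http://localhost:8178"),
        hf_auth_token=token,
        diarization_pipeline=env.get(
            "DIARIZATION_PIPELINE", "pyannote/speaker-diarization-3.1"
        ),
        # Treat an empty string the same as unset.
        speaker_embedding_dir=env.get("SPEAKER_EMBEDDING_DIR") or None,
    )
```

Passing a plain dict as `env` keeps the loader testable without touching the real process environment.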

New files

backend/diarization_service/
├── Dockerfile            # Standalone container
├── requirements.txt      # Pinned deps (torch<2.6, huggingface_hub<0.23)
├── main.py               # FastAPI app with endpoints
├── config.py             # Env-var based configuration
├── processor.py          # Orchestrates transcription + diarization
├── diarization.py        # pyannote.audio pipeline wrapper
├── speaker_tracker.py    # Cross-chunk speaker consistency via embeddings
├── whisper_client.py     # HTTP client for whisper.cpp server
└── audio_utils.py        # FFmpeg audio conversion utilities
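audio_utils.py is described only as FFmpeg conversion utilities. A plausible core is normalizing uploads to 16 kHz mono 16-bit PCM WAV, the input format whisper.cpp expects. The function names below are hypothetical; the ffmpeg flags themselves are standard.

```python
import shutil
import subprocess


def ffmpeg_to_wav_args(src: str, dst: str) -> list[str]:
    """ffmpeg argv that converts any input to 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                # overwrite dst if it already exists
        "-i", src,
        "-ar", "16000",      # resample to 16 kHz
        "-ac", "1",          # downmix to mono
        "-c:a", "pcm_s16le", # 16-bit little-endian PCM
        dst,
    ]


def convert_to_wav(src: str, dst: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg not found on PATH")
    subprocess.run(ffmpeg_to_wav_args(src, dst), check=True, capture_output=True)
```

Building the argument list separately from running it keeps the exact command easy to inspect and test.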

Backend changes

  • db.py: Added a speaker TEXT column to the transcripts table (with migration)
  • main.py: Added a speaker field to the Transcript model and the save flow
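The db.py migration can be made idempotent by checking PRAGMA table_info before ALTER TABLE. This sketch assumes a SQLite database accessed through the stdlib sqlite3 module and a table literally named transcripts; the real db.py may use an async driver, and `add_speaker_column` is a hypothetical name.

```python
import sqlite3


def add_speaker_column(conn: sqlite3.Connection) -> bool:
    """Add transcripts.speaker if missing; return True if the column was added."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute("PRAGMA table_info(transcripts)")}
    if "speaker" in columns:
        return False  # already migrated, so this is safe to call on every startup
    # Nullable with no default: existing rows simply read back NULL.
    conn.execute("ALTER TABLE transcripts ADD COLUMN speaker TEXT")
    conn.commit()
    return True
```

Because the new column is nullable, old transcripts stay valid and the API can report them with no speaker.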

Related Issues

Addresses #230, #226, #56

Testing

  • Service starts and initializes pyannote pipeline on GPU
  • /inference endpoint returns segments with speaker labels
  • Speaker IDs remain consistent across multiple audio chunks within a session
  • Degrades gracefully when the diarization model is unavailable (segments get speaker UNKNOWN)
  • Backend DB migration adds speaker column without breaking existing data
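The graceful-degradation behavior can be expressed as a thin wrapper: if the pipeline never loaded or diarization raises, the transcript is still returned with every segment's speaker set to UNKNOWN. `label_or_degrade` and the dict-shaped segments are assumptions for illustration.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

UNKNOWN = "UNKNOWN"


def label_or_degrade(
    segments: list[dict],
    label: Optional[Callable[[list[dict]], list[dict]]],
) -> list[dict]:
    """Apply `label` (diarize + merge); on a missing model or any error, mark UNKNOWN."""
    if label is not None:
        try:
            return label(segments)
        except Exception:
            logger.exception("diarization failed; returning transcript without speakers")
    # Copy each segment so the caller's input is not mutated.
    return [{**seg, "speaker": UNKNOWN} for seg in segments]
```

Passing `None` for `label` models the case where the pyannote pipeline failed to load at startup.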

Technical Notes

  • The torch<2.6 pin is required because PyTorch 2.6 changed the default of torch.load's weights_only argument from False to True, which breaks pyannote model loading
  • The huggingface_hub<0.23 pin is required because 0.23+ deprecated the use_auth_token parameter that pyannote uses
  • The pyannote models are gated on Hugging Face: you must accept their license on the model page and supply an auth token (HF_AUTH_TOKEN)
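Since both pins guard against breakage that otherwise surfaces deep inside model loading, a startup check can report an out-of-range install early. This is a hypothetical helper built on `importlib.metadata`; it compares only MAJOR.MINOR against the upper bounds stated above.

```python
from importlib import metadata
from typing import Optional

# Exclusive upper bounds, mirroring the requirements.txt pins.
PINS = {"torch": (2, 6), "huggingface_hub": (0, 23)}


def major_minor(version: str) -> tuple[int, int]:
    """'2.5.1+cu121' -> (2, 5). Assumes a numeric MAJOR.MINOR prefix."""
    parts = version.split("+")[0].split(".")
    return int(parts[0]), int(parts[1])


def pin_violations(installed: Optional[dict[str, str]] = None) -> list[str]:
    """Return a message per pinned package whose version is at or above its bound."""
    if installed is None:
        # Look up whatever is actually installed; skip missing packages.
        installed = {}
        for name in PINS:
            try:
                installed[name] = metadata.version(name)
            except metadata.PackageNotFoundError:
                pass
    return [
        f"{name}=={ver} (need <{'.'.join(map(str, PINS[name]))})"
        for name, ver in installed.items()
        if name in PINS and major_minor(ver) >= PINS[name]
    ]
```

Calling `pin_violations()` with no argument checks the live environment; a non-empty result could be logged at startup before the pipeline is initialized.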

🤖 Generated with Claude Code

Add a new diarization service that wraps whisper.cpp transcription with
speaker identification using pyannote.audio. The service runs alongside
the existing whisper server and enriches transcription output with
speaker labels.

Key features:
- Speaker diarization via pyannote/speaker-diarization-3.1
- Cross-chunk speaker tracking using voice embeddings for consistent
  speaker IDs throughout a meeting session
- Compatible /inference endpoint that extends whisper.cpp API
- Session management endpoints for speaker data lifecycle
- Standalone Dockerfile for containerized deployment
- Speaker field added to backend transcript storage and API

Addresses Zackriya-Solutions#230, Zackriya-Solutions#226, Zackriya-Solutions#56

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>