feat: add speaker diarization service using pyannote.audio by kimwwk · Pull Request #349 · Zackriya-Solutions/meetily

kimwwk · 2026-02-11T04:29:05Z

Summary

- Add a new service in `backend/diarization_service/` that wraps whisper.cpp transcription with speaker identification using pyannote.audio
- Cross-chunk speaker tracking via voice embeddings keeps speaker IDs consistent throughout a meeting session
- Add a `speaker` field to the backend transcript storage (`db.py`) and API model (`main.py`)
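
The PR does not show `speaker_tracker.py` itself. The sketch below shows one way cross-chunk matching can work. It assumes that each chunk yields one voice embedding per chunk-local pyannote label, and that a fixed cosine-similarity threshold decides whether a voice is already known in the session. `SpeakerTracker`, `map_chunk`, and the 0.7 threshold are hypothetical and are not taken from the PR.

```python
import math
from dataclasses import dataclass, field


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


@dataclass
class SpeakerTracker:
    """Hypothetical per-session tracker mapping chunk-local labels to stable IDs."""
    threshold: float = 0.7  # assumed cutoff, not from the PR
    centroids: dict[str, list[float]] = field(default_factory=dict)
    counts: dict[str, int] = field(default_factory=dict)

    def assign(self, embedding: list[float], exclude: frozenset = frozenset()) -> str:
        # Find the most similar known speaker that is not already used in this chunk.
        best_id, best_sim = None, -1.0
        for speaker_id, centroid in self.centroids.items():
            if speaker_id in exclude:
                continue
            sim = cosine_similarity(embedding, centroid)
            if sim > best_sim:
                best_id, best_sim = speaker_id, sim
        if best_id is not None and best_sim >= self.threshold:
            # Known voice: fold the new embedding into the running-mean centroid.
            n = self.counts[best_id]
            self.centroids[best_id] = [
                (c * n + e) / (n + 1) for c, e in zip(self.centroids[best_id], embedding)
            ]
            self.counts[best_id] = n + 1
            return best_id
        # Unknown voice: register a new session-level speaker.
        new_id = f"SPEAKER_{len(self.centroids):02d}"
        self.centroids[new_id] = list(embedding)
        self.counts[new_id] = 1
        return new_id

    def map_chunk(self, chunk_embeddings: dict[str, list[float]]) -> dict[str, str]:
        """Map one chunk's local labels to session IDs, one-to-one within the chunk."""
        mapping: dict[str, str] = {}
        for local_label, emb in chunk_embeddings.items():
            mapping[local_label] = self.assign(emb, exclude=frozenset(mapping.values()))
        return mapping
```

Because pyannote numbers speakers independently in every chunk, its `SPEAKER_00` in chunk 2 may be a different person than `SPEAKER_00` in chunk 1. The tracker re-labels each chunk against session-wide centroids, and the `exclude` set stops two different voices in the same chunk from collapsing onto one ID.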

How it works

The diarization service runs as a sidecar to the existing whisper.cpp server:

Audio file → Diarization Service (:8179)
                ├─ Forward to Whisper (:8178) → timestamped segments
                ├─ Run pyannote diarization → speaker turns
                ├─ Match speakers via embedding similarity (cross-chunk)
                └─ Merge → segments with speaker labels
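
The final merge step can be illustrated with a maximum-overlap rule. This is a sketch under an assumption: each whisper segment gets the speaker whose diarization turns overlap it for the most total time, and segments with no overlapping turn stay `UNKNOWN`. The PR does not say which rule `processor.py` uses. `Segment`, `SpeakerTurn`, and `merge` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # seconds
    end: float
    text: str
    speaker: str = "UNKNOWN"


@dataclass
class SpeakerTurn:
    start: float  # seconds
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def merge(segments: list[Segment], turns: list[SpeakerTurn]) -> list[Segment]:
    """Label each whisper segment with the speaker that overlaps it the most."""
    merged = []
    for seg in segments:
        totals: dict[str, float] = {}
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + ov
        speaker = max(totals, key=totals.get) if totals else "UNKNOWN"
        merged.append(Segment(seg.start, seg.end, seg.text, speaker))
    return merged
```

Summing overlap per speaker, rather than picking the first turn that touches the segment, keeps a long segment from being attributed to someone who only spoke during its last fraction of a second.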

Endpoints

| Endpoint | Method | Description |
|---|---|---|
| `/inference` | POST | Transcribe with diarization (compatible with the whisper.cpp API) |
| `/transcribe` | POST | Simplified transcription endpoint |
| `/health` | GET | Service health with per-component status |
| `/session/{id}/speakers` | GET | Speaker summary for a session |
| `/session/{id}` | DELETE | Clear session speaker data |
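
A client for the service could build an `/inference` request as shown below, using only the standard library. It assumes the endpoint accepts the same multipart form as the whisper.cpp server (`file`, `response_format`). The `session_id` form field is hypothetical, because the PR does not document how a request is tied to a session. Port 8179 comes from the diagram above.

```python
import urllib.parse
import urllib.request
import uuid

DIARIZATION_URL = "http://localhost:8179"  # port taken from the PR's diagram


def build_inference_request(audio: bytes, filename: str, session_id: str) -> urllib.request.Request:
    """Build a multipart POST to /inference (the session field name is assumed)."""
    boundary = uuid.uuid4().hex
    fields = {
        "response_format": "json",  # whisper.cpp server form field
        "session_id": session_id,   # HYPOTHETICAL: not documented in the PR
    }
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode()
        )
    # The audio file itself, sent as raw bytes.
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n".encode()
        + audio
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{DIARIZATION_URL}/inference",
        data=b"".join(parts),
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def speakers_url(session_id: str) -> str:
    """URL of the per-session speaker summary (GET /session/{id}/speakers)."""
    return f"{DIARIZATION_URL}/session/{urllib.parse.quote(session_id)}/speakers"
```

To send the request, pass it to `urllib.request.urlopen(req)` and decode the JSON body. The speaker summary is a plain GET on `speakers_url(session_id)`, and a DELETE with the same session ID clears that session's speaker data.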

Configuration (environment variables)

| Variable | Default | Description |
|---|---|---|
| `WHISPER_SERVER_URL` | `http://localhost:8178` | Whisper.cpp server URL |
| `HF_AUTH_TOKEN` | (required) | Hugging Face token for pyannote gated models |
| `DIARIZATION_PIPELINE` | `pyannote/speaker-diarization-3.1` | Diarization model |
| `SPEAKER_EMBEDDING_DIR` | (disabled) | Directory to persist speaker embeddings |
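
The sketch below shows how an env-var-based `config.py` could apply these defaults. The `Settings` and `load_settings` names are hypothetical. This version fails fast when `HF_AUTH_TOKEN` is missing. The real service may instead start and degrade to `UNKNOWN` speakers, as the Testing section describes.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    whisper_server_url: str
    hf_auth_token: str
    diarization_pipeline: str
    speaker_embedding_dir: Optional[str]  # None means embedding persistence is disabled


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the documented variables, applying the defaults from the table."""
    token = env.get("HF_AUTH_TOKEN")
    if not token:
        raise RuntimeError("HF_AUTH_TOKEN is required to download gated pyannote models")
    return Settings(
        whisper_server_url=env.get("WHISPER_SERVER_URL", "http://localhost:8178"),
        hf_auth_token=token,
        diarization_pipeline=env.get("DIARIZATION_PIPELINE", "pyannote/speaker-diarization-3.1"),
        # An empty string is treated the same as unset.
        speaker_embedding_dir=env.get("SPEAKER_EMBEDDING_DIR") or None,
    )
```

Accepting any mapping instead of reading `os.environ` directly makes the defaults easy to test without touching the process environment.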

New files

backend/diarization_service/
├── Dockerfile            # Standalone container
├── requirements.txt      # Pinned deps (torch<2.6, huggingface_hub<0.23)
├── main.py               # FastAPI app with endpoints
├── config.py             # Env-var based configuration
├── processor.py          # Orchestrates transcription + diarization
├── diarization.py        # pyannote.audio pipeline wrapper
├── speaker_tracker.py    # Cross-chunk speaker consistency via embeddings
├── whisper_client.py     # HTTP client for whisper.cpp server
└── audio_utils.py        # FFmpeg audio conversion utilities
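
Whisper.cpp and pyannote both work on 16 kHz mono audio, so the FFmpeg helpers most likely normalize input to that format. The sketch below assumes the target is 16-bit PCM mono WAV at 16 kHz. The function names are hypothetical, while the ffmpeg flags (`-ar`, `-ac`, `-c:a pcm_s16le`) are standard.

```python
import shutil
import subprocess
from pathlib import Path


def ffmpeg_to_wav_cmd(src: Path, dst: Path, sample_rate: int = 16000) -> list[str]:
    """Build an ffmpeg command that writes 16-bit PCM mono WAV at sample_rate Hz."""
    return [
        "ffmpeg",
        "-y",                     # overwrite dst if it already exists
        "-i", str(src),
        "-ar", str(sample_rate),  # resample
        "-ac", "1",               # downmix to mono
        "-c:a", "pcm_s16le",      # 16-bit little-endian PCM
        str(dst),
    ]


def convert_to_wav(src: Path, dst: Path) -> Path:
    """Run the conversion; raises if ffmpeg is missing or exits non-zero."""
    if shutil.which("ffmpeg") is None:
        raise FileNotFoundError("ffmpeg not found on PATH")
    subprocess.run(ffmpeg_to_wav_cmd(src, dst), check=True, capture_output=True)
    return dst
```

Keeping command construction separate from execution lets the argument list be tested without ffmpeg installed.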

Backend changes

- `db.py`: Added a `speaker TEXT` column to the `transcripts` table (with migration)
- `main.py`: Added a `speaker` field to the `Transcript` model and the save flow
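
The migration can be written as an idempotent check-then-alter, shown here as a sketch. It assumes the backend database is SQLite. The function name is hypothetical, and the extra `transcripts` columns in the usage example are placeholders, not the real schema.

```python
import sqlite3


def add_speaker_column(conn: sqlite3.Connection) -> bool:
    """Add transcripts.speaker if it is missing. Returns True when the column was added."""
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    columns = {row[1] for row in conn.execute("PRAGMA table_info(transcripts)")}
    if "speaker" in columns:
        return False  # already migrated; safe to call on every startup
    # Nullable with no default, so existing rows simply read back NULL.
    conn.execute("ALTER TABLE transcripts ADD COLUMN speaker TEXT")
    conn.commit()
    return True
```

Existing rows come back with `NULL` for `speaker`, which matches the "without breaking existing data" test item. Callers can map `NULL` to "no speaker information".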

Related Issues

Addresses #230, #226, #56

Testing

- Service starts and initializes the pyannote pipeline on GPU
- `/inference` endpoint returns segments with speaker labels
- Speaker IDs remain consistent across multiple audio chunks within a session
- Graceful degradation when the diarization model is unavailable (returns an `UNKNOWN` speaker)
- Backend DB migration adds the `speaker` column without breaking existing data
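
The graceful-degradation behavior can be sketched as a wrapper that never lets a diarization failure fail the transcription. This assumes diarization is a callable that returns `(start, end, speaker)` turns, or `None` when the model never loaded. `label_segments` is hypothetical. The midpoint lookup is a simplified stand-in for the service's real merge step.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("diarization_service")

# (start_seconds, end_seconds, speaker_label)
Turn = tuple[float, float, str]


def label_segments(
    segments: list[dict],
    diarize: Optional[Callable[[str], list[Turn]]],
    audio_path: str,
) -> list[dict]:
    """Attach a speaker to every segment, falling back to UNKNOWN on any failure."""
    try:
        if diarize is None:
            # e.g. the gated model could not be downloaded at startup
            raise RuntimeError("diarization pipeline not loaded")
        turns = diarize(audio_path)
    except Exception:
        logger.warning("diarization unavailable; labeling all segments UNKNOWN", exc_info=True)
        return [{**seg, "speaker": "UNKNOWN"} for seg in segments]

    labeled = []
    for seg in segments:
        # Simplified stand-in for the real merge: use the turn containing the midpoint.
        mid = (seg["start"] + seg["end"]) / 2
        speaker = next((s for start, end, s in turns if start <= mid < end), "UNKNOWN")
        labeled.append({**seg, "speaker": speaker})
    return labeled
```

The transcript text is always returned, so a missing token or a broken model only costs the speaker labels, not the transcription.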

Technical Notes

- The `torch<2.6` pin is required because PyTorch 2.6 changed the default of `torch.load`'s `weights_only` argument to `True`, which breaks pyannote model loading.
- The `huggingface_hub<0.23` pin is required because 0.23+ deprecated the `use_auth_token` parameter that pyannote uses.
- pyannote models are gated on Hugging Face: you must accept each model's license and provide an auth token (`HF_AUTH_TOKEN`).
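
Environments installed outside the Dockerfile can drift past these pins. A startup guard could catch that drift. This is a hypothetical addition, not part of the PR. It uses only `importlib.metadata` and compares the numeric release components against the documented exclusive upper bounds.

```python
from importlib.metadata import PackageNotFoundError, version

# Exclusive upper bounds from the PR's Technical Notes.
PINS = {"torch": (2, 6), "huggingface_hub": (0, 23)}


def release_tuple(v: str) -> tuple[int, ...]:
    """'2.5.1+cu121' -> (2, 5, 1); stops at the first non-numeric component."""
    parts: list[int] = []
    for piece in v.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            break  # e.g. '0rc1': keep the 0, ignore the pre-release suffix
    return tuple(parts)


def pin_violations(installed: dict[str, str]) -> list[str]:
    """Messages for packages at or above their documented upper bound."""
    problems = []
    for name, bound in PINS.items():
        v = installed.get(name)
        if v is not None and release_tuple(v) >= bound:
            problems.append(f"{name}=={v} violates {name}<{'.'.join(map(str, bound))}")
    return problems


def installed_versions() -> dict[str, str]:
    """Versions of the pinned packages that are actually installed."""
    found = {}
    for name in PINS:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            pass
    return found
```

At startup, `pin_violations(installed_versions())` could be logged or turned into a hard error before pyannote tries to load its models.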

🤖 Generated with Claude Code

Add a new diarization service that wraps whisper.cpp transcription with speaker identification using pyannote.audio. The service runs alongside the existing whisper server and enriches transcription output with speaker labels.

Key features:
- Speaker diarization via pyannote/speaker-diarization-3.1
- Cross-chunk speaker tracking using voice embeddings for consistent speaker IDs throughout a meeting session
- Compatible /inference endpoint that extends the whisper.cpp API
- Session management endpoints for the speaker data lifecycle
- Standalone Dockerfile for containerized deployment
- Speaker field added to backend transcript storage and API

Addresses Zackriya-Solutions#230, Zackriya-Solutions#226, Zackriya-Solutions#56

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

kimwwk wants to merge 1 commit into Zackriya-Solutions:main from kimwwk:feat/speaker-diarization-pr