feat: add speaker diarization service using pyannote.audio #349
Open

kimwwk wants to merge 1 commit into Zackriya-Solutions:main from
Conversation
Add a new diarization service that wraps whisper.cpp transcription with speaker identification using pyannote.audio. The service runs alongside the existing whisper server and enriches transcription output with speaker labels.

Key features:
- Speaker diarization via `pyannote/speaker-diarization-3.1`
- Cross-chunk speaker tracking using voice embeddings for consistent speaker IDs throughout a meeting session
- Compatible `/inference` endpoint that extends the whisper.cpp API
- Session management endpoints for the speaker data lifecycle
- Standalone Dockerfile for containerized deployment
- `speaker` field added to backend transcript storage and API

Addresses Zackriya-Solutions#230, Zackriya-Solutions#226, Zackriya-Solutions#56

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
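The cross-chunk speaker tracking described above can be sketched as embedding matching: each new voice embedding is compared against the session's known speakers and either matched or registered under a fresh ID. This is a minimal illustration, not the PR's actual code; the function name, dictionary shape, and 0.7 threshold are all assumptions.

```python
import numpy as np

def match_speaker(embedding, known_speakers, threshold=0.7):
    """Match a voice embedding to a known speaker by cosine similarity.

    `known_speakers` maps a stable speaker ID to a reference embedding.
    Returns the best-matching ID, or registers a new ID when no known
    speaker clears the threshold (hypothetical sketch, not the PR's code).
    """
    best_id, best_score = None, threshold
    for speaker_id, ref in known_speakers.items():
        score = np.dot(embedding, ref) / (
            np.linalg.norm(embedding) * np.linalg.norm(ref)
        )
        if score > best_score:
            best_id, best_score = speaker_id, score
    if best_id is None:
        # No match above threshold: treat this as a new speaker.
        best_id = f"SPEAKER_{len(known_speakers):02d}"
        known_speakers[best_id] = embedding
    return best_id
```

Persisting `known_speakers` per session (e.g. under a `SPEAKER_EMBEDDING_DIR`) is what would keep speaker IDs consistent across audio chunks.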
Summary
- `backend/diarization_service/` that wraps whisper.cpp transcription with speaker identification using pyannote.audio
- `speaker` field added to backend transcript storage (`db.py`) and API model (`main.py`)

How it works
The diarization service runs as a sidecar to the existing whisper.cpp server:
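One plausible shape for that enrichment step is overlap-based merging: the sidecar forwards audio to whisper.cpp, runs pyannote diarization on the same audio, and labels each transcription segment with the diarization turn that overlaps it the most. This is a hedged sketch of the idea, not the PR's implementation; the tuple shapes and function name are assumptions, and the `UNKNOWN` fallback mirrors the Testing notes below.

```python
def assign_speakers(segments, turns):
    """Label each transcription segment with the speaker whose
    diarization turn overlaps it the most.

    `segments` are (start, end, text) tuples from the transcriber;
    `turns` are (start, end, speaker) tuples from the diarizer.
    Both shapes are hypothetical, chosen for illustration.
    """
    labeled = []
    for seg_start, seg_end, text in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn_start, turn_end, speaker in turns:
            # Length of the time interval shared by segment and turn.
            overlap = min(seg_end, turn_end) - max(seg_start, turn_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append({"start": seg_start, "end": seg_end,
                        "text": text, "speaker": best_speaker})
    return labeled
```

Segments with no overlapping turn keep the `UNKNOWN` speaker, which gives the service a graceful degradation path when diarization produces nothing.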
Endpoints

- `/inference`
- `/transcribe`
- `/health`
- `/session/{id}/speakers`
- `/session/{id}`

Configuration (environment variables)

- `WHISPER_SERVER_URL` (default: `http://localhost:8178`)
- `HF_AUTH_TOKEN`
- `DIARIZATION_PIPELINE` (default: `pyannote/speaker-diarization-3.1`)
- `SPEAKER_EMBEDDING_DIR`

New files
Backend changes
- `db.py`: Added `speaker TEXT` column to transcripts table (with migration)
- `main.py`: Added `speaker` field to `Transcript` model and save flow

Related Issues

Addresses #230, #226, #56
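The `db.py` migration mentioned above can be sketched as an idempotent column add: check the table schema and only run `ALTER TABLE` when the column is missing, so existing data is untouched. Table and column names follow the PR description; the function name and SQLite usage are illustrative assumptions.

```python
import sqlite3

def migrate_add_speaker_column(conn):
    """Add a nullable `speaker TEXT` column to the transcripts table
    if it is not already present (safe to run on every startup).

    Hypothetical sketch of the migration described in the PR, assuming
    an SQLite backend.
    """
    # PRAGMA table_info returns one row per column; index 1 is the name.
    cols = [row[1] for row in conn.execute("PRAGMA table_info(transcripts)")]
    if "speaker" not in cols:
        conn.execute("ALTER TABLE transcripts ADD COLUMN speaker TEXT")
        conn.commit()
```

Because the new column is nullable, existing rows simply read back `NULL` for `speaker`, which matches the "without breaking existing data" claim in the Testing notes.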
Testing
- `/inference` endpoint returns segments with speaker labels
- Fallback path (`UNKNOWN` speaker)
- `speaker` column added without breaking existing data

Technical Notes
- The `torch<2.6` pin is required because PyTorch 2.6+ changed the `weights_only=True` default, which breaks pyannote model loading
- The `huggingface_hub<0.23` pin is required because 0.23+ deprecated the `use_auth_token` parameter used by pyannote

🤖 Generated with Claude Code
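As a requirements fragment, the two pins from the Technical Notes could look like this (the `pyannote.audio` line and any other entries are assumptions about the rest of the file, not taken from the PR):

```
# Pins required by pyannote model loading (see Technical Notes)
torch<2.6
huggingface_hub<0.23
pyannote.audio
```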