feat: Add speaker diarization functionality to Qwen3-ForcedAligner #116
Open
YuYun329 wants to merge 2 commits into QwenLM:main from
…ize timestamp processing
- Added speaker diarization based on the CampPlus model, supporting speaker clustering for audio segments
- Optimized timestamp processing logic and added a speaker label field
- Improved audio segmentation, supporting text segmentation based on punctuation marks
- Added automatic model download, supporting download of the CampPlus model from ModelScope or HuggingFace
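As an illustration of the punctuation-based text segmentation mentioned in this commit, here is a minimal sketch. The function name `split_on_punctuation` and the set of marks are assumptions made for the example; the commit message does not say which marks the PR splits on or how the split is implemented.

```python
import re
from typing import List

# Assumed set of segment-ending marks (ASCII and full-width); the commit
# message does not list the marks the PR actually uses.
_MARKS = ".!?;。！？；"
_SEGMENT = re.compile(rf"[^{re.escape(_MARKS)}]+[{re.escape(_MARKS)}]*")


def split_on_punctuation(text: str) -> List[str]:
    """Split text into segments, each keeping its trailing punctuation."""
    # Each match is a run of non-mark characters plus any marks that follow.
    segments = (m.group().strip() for m in _SEGMENT.finditer(text))
    # Drop segments that were whitespace only.
    return [s for s in segments if s]


if __name__ == "__main__":
    print(split_on_punctuation("Hello there. How are you? Fine"))
```

A sketch this simple also splits inside numbers such as `3.14`, so a real implementation would likely need extra rules.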
…orceAligner
- Extract embedding extraction logic into _extract_embedding method
- Extract speaker clustering logic into _cluster_speakers method
- Extract label assignment logic into _assign_speaker_labels method
- Add _find_speaker_for_item and _advance_segment helper methods
- Improve code readability and maintainability
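The helper names in this commit suggest that label assignment walks aligned items and speaker segments in time order. The following is only a sketch of that idea under stated assumptions: `AlignedItem` is a hypothetical stand-in for `ForcedAlignItem`, the segment tuple layout is invented for the example, and the midpoint rule is one plausible choice rather than the PR's confirmed logic.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AlignedItem:
    """Hypothetical stand-in for ForcedAlignItem with a speaker field."""
    text: str
    start_time: float
    end_time: float
    speaker: Optional[int] = None


# Assumed segment layout: (start_time, end_time, speaker_id).
Segment = Tuple[float, float, int]


def assign_speaker_labels(items: List[AlignedItem], segments: List[Segment]) -> None:
    """Set item.speaker from the segment that covers the item's midpoint.

    Both lists must be sorted by start time and segments must not overlap.
    An item in a gap takes the next segment's speaker; an item after the
    last segment takes the last segment's speaker.
    """
    if not segments:
        return
    seg_idx = 0
    for item in items:
        midpoint = (item.start_time + item.end_time) / 2.0
        # Advance past segments that end at or before this item's midpoint.
        while seg_idx < len(segments) - 1 and segments[seg_idx][1] <= midpoint:
            seg_idx += 1
        item.speaker = segments[seg_idx][2]


if __name__ == "__main__":
    words = [AlignedItem("hi", 0.0, 0.5), AlignedItem("yes", 2.2, 2.6)]
    assign_speaker_labels(words, [(0.0, 2.0, 0), (2.0, 3.0, 1)])
    print([(w.text, w.speaker) for w in words])
```

Because the segment index only moves forward, the pass is linear in the number of items plus segments.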
Summary
This PR adds speaker diarization functionality to Qwen3-ForcedAligner, enabling speaker identification alongside timestamp prediction.
Key Changes
1. Speaker Diarization Support
- `speaker` field added to `ForcedAlignItem` dataclass to store speaker labels
- `ClusterBackend`

2. New Components
- `cluster_backend.py`: New module for speaker clustering, using a spectral clustering algorithm written with reference to FunASR

3. Enhanced Features
- `Qwen3ForcedAligner` now accepts an optional `campplus_model` parameter
- `load_wav()` utility for audio preprocessing

4. Files Modified
- `qwen_asr/inference/qwen3_forced_aligner.py`
- `qwen_asr/inference/cluster_backend.py`
- `qwen_asr/cli/demo.py`
- `README.md`
- `pyproject.toml`

Usage Example

Dependencies
- `onnxruntime` - For CAM++ model inference
- `scikit-learn` - For spectral clustering

Testing
Tested with multi-speaker audio files, successfully identifying and labeling different speakers with accurate timestamps.
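For readers unfamiliar with the clustering step: spectral clustering works on a pairwise affinity matrix rather than on raw embeddings. Below is a minimal sketch of building such a matrix from speaker embeddings. It assumes cosine similarity rescaled to [0, 1] as the affinity; the PR description does not state which affinity `cluster_backend.py` uses, and `cosine_affinity` is a hypothetical name.

```python
import math
from typing import List, Sequence


def cosine_affinity(embeddings: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return pairwise cosine similarities rescaled from [-1, 1] to [0, 1]."""
    norms = [math.sqrt(sum(x * x for x in e)) for e in embeddings]
    n = len(embeddings)
    affinity = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dot = sum(a * b for a, b in zip(embeddings[i], embeddings[j]))
            # Treat zero-length vectors as uncorrelated with everything.
            cos = dot / (norms[i] * norms[j]) if norms[i] and norms[j] else 0.0
            affinity[i][j] = (cos + 1.0) / 2.0
    return affinity


if __name__ == "__main__":
    # Toy 2-D "embeddings"; real CAM++ embeddings are much higher-dimensional.
    for row in cosine_affinity([[1.0, 0.0], [2.0, 0.0], [0.0, 1.0]]):
        print(row)
```

A non-negative matrix like this could be passed to scikit-learn's `SpectralClustering` with `affinity="precomputed"`; whether the PR takes that route is not stated here.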
This feature enhances Qwen3-ASR's capabilities for applications like: