feat: Add speaker diarization functionality to Qwen3-ForcedAligner by YuYun329 · Pull Request #116 · QwenLM/Qwen3-ASR

YuYun329 · 2026-03-07T11:46:18Z

Summary

This PR adds speaker diarization functionality to Qwen3-ForcedAligner, enabling speaker identification alongside timestamp prediction.

Key Changes

1. Speaker Diarization Support

Added a speaker field to the ForcedAlignItem dataclass to store speaker labels
Integrated CAM++ model for speaker embedding extraction
Implemented clustering-based speaker diarization using ClusterBackend
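As a rough illustration of the first point, an alignment item with an optional speaker label could look like the sketch below. Only the class name ForcedAlignItem and the speaker field come from this PR; the other field names (text, start_time, end_time), the units, and the "spk_0" label format are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ForcedAlignItem:
    # text / start_time / end_time are assumed field names, not taken from the PR.
    text: str
    start_time: float  # seconds (assumed unit)
    end_time: float    # seconds (assumed unit)
    # Stays None when no speaker model is configured, so existing callers
    # that build items without a speaker keep working.
    speaker: Optional[str] = None


item = ForcedAlignItem(text="hello", start_time=0.32, end_time=0.71)
labeled = ForcedAlignItem(text="hi", start_time=1.10, end_time=1.25, speaker="spk_0")
```

Giving the new field a default keeps the change backward compatible for code that constructs items positionally.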

2. New Components

cluster_backend.py: New module for speaker clustering using a spectral clustering algorithm, implemented with reference to FunASR
CAM++ model integration: Automatic model download from ModelScope or HuggingFace
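To make the clustering step concrete, here is a minimal NumPy-only sketch of the idea behind spectral clustering, restricted to two speakers. It is not the PR's ClusterBackend (which depends on scikit-learn): the function names, the toy 2-D embeddings, and the two-speaker restriction are all assumptions for illustration.

```python
import numpy as np


def cosine_affinity(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity, clipped to [0, 1] so it can act as graph weights."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return np.clip(unit @ unit.T, 0.0, 1.0)


def two_speaker_labels(embeddings: np.ndarray) -> np.ndarray:
    """Split segments into two groups by the sign of the Fiedler vector."""
    affinity = cosine_affinity(embeddings)
    # Unnormalized graph Laplacian L = D - W.
    laplacian = np.diag(affinity.sum(axis=1)) - affinity
    # eigh returns eigenvalues in ascending order; column 1 is the Fiedler vector.
    _, eigenvectors = np.linalg.eigh(laplacian)
    return (eigenvectors[:, 1] > 0).astype(int)


# Toy embeddings: three segments near one direction, three near another.
embeddings = np.array([
    [1.00, 0.05], [0.95, 0.10], [0.90, 0.02],
    [0.05, 1.00], [0.10, 0.95], [0.02, 0.90],
])
labels = two_speaker_labels(embeddings)
```

Which group receives label 0 or 1 is arbitrary, since an eigenvector is only defined up to sign. Handling more than two speakers, or estimating the speaker count, needs the full algorithm rather than this bipartition.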

3. Enhanced Features

Qwen3ForcedAligner now accepts optional campplus_model parameter
Automatic model resolution from local path, ModelScope, or HuggingFace
load_wav() utility for audio preprocessing
Speaker embedding extraction and clustering pipeline
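The resolution order described above (local path first, then ModelScope, then HuggingFace) can be sketched as a simple fallback chain. The function name resolve_model_path and the injected downloader callables are hypothetical; the two fake downloaders stand in for real hub clients, and the PR's actual code may be structured differently.

```python
import os
from typing import Callable, List, Sequence


def resolve_model_path(spec: str, downloaders: Sequence[Callable[[str], str]]) -> str:
    """Return a usable path for spec, trying each source in order."""
    # 1. An existing local file or directory wins outright.
    if os.path.exists(spec):
        return spec
    # 2. Otherwise treat spec as a hub identifier and try each downloader in turn.
    errors: List[str] = []
    for download in downloaders:
        try:
            return download(spec)
        except Exception as exc:  # keep going; report every failure at the end
            errors.append(f"{getattr(download, '__name__', 'downloader')}: {exc}")
    raise FileNotFoundError(f"could not resolve {spec!r}: " + "; ".join(errors))


# Stand-ins for real hub clients (hypothetical, for illustration only).
def fake_modelscope(spec: str) -> str:
    raise ConnectionError("hub unreachable")


def fake_huggingface(spec: str) -> str:
    return os.path.join("/tmp/hf-cache", spec)


resolved = resolve_model_path(
    "org/model/campplus.onnx", [fake_modelscope, fake_huggingface]
)
```

Passing the downloaders in as callables keeps the fallback order testable without network access.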

4. Files Modified

| File | Changes |
| --- | --- |
| `qwen_asr/inference/qwen3_forced_aligner.py` | +226 lines - Core speaker diarization logic |
| `qwen_asr/inference/cluster_backend.py` | +192 lines - New clustering module |
| `qwen_asr/cli/demo.py` | +20 lines - Demo updates |
| `README.md` | Documentation updates |
| `pyproject.toml` | Dependency updates |

Usage Example

python qwen_asr/cli/demo.py --asr-checkpoint Qwen/Qwen3-ASR-1.7B --aligner-checkpoint Qwen/Qwen3-ForcedAligner-0.6B --campplus-model FunAudioLLM/Fun-CosyVoice3-0.5B-2512/campplus.onnx

Dependencies

onnxruntime - For CAM++ model inference
scikit-learn - For spectral clustering

Testing

Tested on multi-speaker audio files: the aligner identified and labeled the different speakers while keeping timestamps accurate.

This feature enhances Qwen3-ASR's capabilities for applications like:

Meeting transcription with speaker attribution
Interview analysis
Podcast/Media content processing
Call center analytics

…ize timestamp processing
- Added speaker diarization functionality based on CampPlus model, supporting speaker clustering for audio segments
- Optimized timestamp processing logic, added speaker label field
- Improved audio segmentation processing, supporting text segmentation based on punctuation marks
- Added model auto-download functionality, supporting downloading CampPlus model from ModelScope or HuggingFace
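The punctuation-based segmentation mentioned in this commit could work roughly as in the sketch below, which groups aligned words into segments that end at sentence punctuation. It assumes punctuation stays attached to the word tokens and that words are (text, start, end) tuples; the function names and the punctuation set are illustrative, not taken from the PR.

```python
from typing import List, Tuple

Word = Tuple[str, float, float]  # (text, start, end), an assumed layout

# Sentence-ending marks in ASCII and full-width forms (illustrative set).
SENTENCE_END = tuple(".!?;。！？；")


def _merge(words: List[Word]) -> Word:
    """Join a run of words into one segment spanning first start to last end."""
    text = " ".join(word[0] for word in words)
    return (text, words[0][1], words[-1][2])


def split_on_punctuation(words: List[Word]) -> List[Word]:
    segments: List[Word] = []
    current: List[Word] = []
    for word in words:
        current.append(word)
        if word[0].endswith(SENTENCE_END):
            segments.append(_merge(current))
            current = []
    if current:  # trailing words without closing punctuation
        segments.append(_merge(current))
    return segments


words = [
    ("Hello", 0.0, 0.4), ("there.", 0.4, 0.9),
    ("How", 1.2, 1.4), ("are", 1.4, 1.5), ("you?", 1.5, 1.9),
    ("Fine", 2.3, 2.6),
]
segments = split_on_punctuation(words)
```

Each resulting segment has a time span that can be cut from the audio and passed to the speaker embedding model. Joining with spaces suits space-delimited languages; text without word spacing would need a different join.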

…orceAligner
- Extract embedding extraction logic into _extract_embedding method
- Extract speaker clustering logic into _cluster_speakers method
- Extract label assignment logic into _assign_speaker_labels method
- Add _find_speaker_for_item and _advance_segment helper methods
- Improve code readability and maintainability
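One plausible shape for the label assignment step is a single forward sweep over time-sorted words and speaker segments, sketched below. The helper names echo the commit message, but the bodies, the tuple layouts, and the midpoint rule are assumptions rather than the PR's implementation.

```python
from typing import List, Optional, Tuple

Segment = Tuple[float, float, str]  # (start, end, speaker), assumed layout
Word = Tuple[str, float, float]     # (text, start, end), assumed layout


def advance_segment(segments: List[Segment], index: int, time: float) -> int:
    """Skip segments that end at or before the given time."""
    while index < len(segments) and segments[index][1] <= time:
        index += 1
    return index


def assign_speaker_labels(
    words: List[Word], segments: List[Segment]
) -> List[Tuple[str, Optional[str]]]:
    """Label each word with the speaker whose segment contains its midpoint."""
    labeled: List[Tuple[str, Optional[str]]] = []
    index = 0
    for text, start, end in words:
        midpoint = (start + end) / 2
        index = advance_segment(segments, index, midpoint)
        speaker = None
        if index < len(segments) and segments[index][0] <= midpoint:
            speaker = segments[index][2]
        labeled.append((text, speaker))
    return labeled


segments = [(0.0, 1.0, "spk_0"), (1.0, 2.5, "spk_1")]
words = [("hello", 0.1, 0.5), ("hi", 1.2, 1.4), ("bye", 3.0, 3.2)]
result = assign_speaker_labels(words, segments)
# result == [("hello", "spk_0"), ("hi", "spk_1"), ("bye", None)]
```

Because both lists are sorted by time, the segment index only moves forward, so the sweep is linear in the number of words plus segments. Words that fall outside every segment are left unlabeled.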

root and others added 2 commits March 7, 2026 19:17

YuYun329 wants to merge 2 commits into QwenLM:main from YuYun329:feature/speaker_verify
