The official PyTorch implementation of:
[1] "Frame-wise streaming end-to-end speaker diarization with non-autoregressive self-attention-based attractors" [accepted by ICASSP 2024].
[2] "LS-EEND: Long-Form Streaming End-to-End Neural Diarization with Online Attractor Extraction." [accepted by IEEE Trans. ASLPRO 2025].
This work proposes a frame-wise online/streaming end-to-end neural diarization (EEND) method, which detects speaker activities in a frame-in-frame-out fashion. The proposed model mainly consists of a causal embedding encoder and an online attractor decoder. Speakers are modeled in the self-attention-based decoder along both the time and speaker dimensions, and frame-wise speaker attractors are automatically generated for new speakers and updated for existing speakers. A retention mechanism is employed and specifically adapted for long-form diarization with linear temporal complexity. A multi-step progressive training strategy is proposed for gradually learning from easy tasks to hard tasks in terms of the number of speakers and the audio length. As a result, the proposed model (referred to as long-form streaming EEND, LS-EEND) can perform streaming diarization for a high (up to 8) and flexible number of speakers and very long (e.g., one-hour) audio recordings.
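To make the frame-in-frame-out idea concrete, below is a minimal, self-contained toy sketch of streaming inference: each incoming frame yields a causal embedding, per-speaker attractors are kept and updated online, and speaker activities are the sigmoid similarities between the frame embedding and the attractors. The GRU encoder, the attractor update rule, and all dimensions are invented stand-ins for illustration, not the actual architecture described above (which uses self-attention/retention):

```python
import torch

torch.manual_seed(0)
feat_dim, emb_dim, n_spk = 23, 16, 4

encoder = torch.nn.GRUCell(feat_dim, emb_dim)   # toy stand-in for the causal embedding encoder
attractors = 0.1 * torch.randn(n_spk, emb_dim)  # toy stand-in for the online attractors
h = torch.zeros(1, emb_dim)                     # causal encoder state (past frames only)

for t in range(100):                            # stream 100 frames, one at a time
    feat = torch.randn(1, feat_dim)             # one frame of acoustic features
    h = encoder(feat, h)                        # frame embedding from past context only
    probs = torch.sigmoid(attractors @ h.squeeze(0))  # per-speaker activity probabilities
    active = probs > 0.5                        # frame-out decision for this frame
    # toy update: move active speakers' attractors toward the current embedding
    attractors[active] = 0.9 * attractors[active] + 0.1 * h.squeeze(0)
```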
- Clone the FS-EEND code:
```shell
git clone https://github.com/Audio-WestlakeU/FS-EEND.git
```
- Prepare Kaldi-style data by referring to here (an example directory layout is sketched below). Modify conf/xxx.yaml according to your own paths.
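For reference, a Kaldi-style data directory is a set of plain-text mapping files; the layout below follows the standard Kaldi convention, and the IDs and paths shown in the comments are made-up placeholders:

```
data/train/
├── wav.scp    # <recording-id> <path-to-wav>
├── segments   # <utterance-id> <recording-id> <start-time> <end-time>
├── utt2spk    # <utterance-id> <speaker-id>
└── spk2utt    # <speaker-id> <utterance-id1> <utterance-id2> ...
```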
- Start training on simulated data by
```shell
python train_dia.py --configs conf/spk_onl_tfm_enc_dec_nonautoreg.yaml --gpus YOUR_DEVICE_ID,
```
- Modify your pretrained model path in conf/spk_onl_tfm_enc_dec_nonautoreg_callhome.yaml (see the sketch below).
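The entry to edit is the checkpoint-path field in that YAML file. A hypothetical sketch follows; the actual key name in the shipped config may differ, so check the file itself:

```yaml
# hypothetical key name; see conf/spk_onl_tfm_enc_dec_nonautoreg_callhome.yaml for the real one
pretrained_model_path: /path/to/simu_1-8spk.ckpt  # checkpoint from simulated-data training
```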
- Fine-tune on CALLHOME data by
```shell
python train_dia_fintn_ch.py --configs conf/spk_onl_tfm_enc_dec_nonautoreg_callhome.yaml --gpus YOUR_DEVICE_ID,
```
- Inference: modify your own path for saving predictions in `test_step` in train/oln_tfm_enc_decxxx.py; use train_dia_simu.py to infer on simulated data and train_dia_fintun_real.py to infer on real-world data.
```shell
python train_dia_simu.py --configs conf/xxx_infer.yaml --gpus YOUR_DEVICE_ID, --test_from_folder YOUR_CKPT_SAVE_DIR
python train_dia_fintun_real.py --configs conf/xxx_infer.yaml --gpus YOUR_DEVICE_ID, --test_from_folder YOUR_CKPT_SAVE_DIR
```
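For reference, dumping frame-wise posteriors from `test_step` to HDF5 might look like the sketch below; the helper function, the dataset name, and all argument names are hypothetical placeholders rather than this repo's actual code:

```python
import os
import h5py

def save_predictions(probs, rec_id, save_dir):
    """Hypothetical helper: write (n_frames, n_spk) speaker-activity
    posteriors for one recording to <save_dir>/<rec_id>.h5."""
    os.makedirs(save_dir, exist_ok=True)
    with h5py.File(os.path.join(save_dir, f"{rec_id}.h5"), "w") as f:
        f.create_dataset("T_hat", data=probs)  # dataset name is a placeholder
```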
- Evaluation:
```shell
# generate speech activity probabilities (diarization results)
cd visualize
python gen_h5_output.py
# calculate DERs (with median filtering and collar)
python metrics.py --configs conf/xxx_infer.yaml
```
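As a rough illustration of the median-filter post-processing such a metrics script applies, thresholding and smoothing the saved posteriors could look like the toy sketch below (not the repo's actual metrics.py; scoring against reference labels is omitted):

```python
import numpy as np
from scipy.signal import medfilt

probs = np.random.rand(1000, 2)            # toy (n_frames, n_spk) posteriors
decisions = (probs > 0.5).astype(float)    # threshold into speech activities
# median-filter each speaker track to suppress spurious short segments
smoothed = np.stack(
    [medfilt(decisions[:, s], kernel_size=11) for s in range(decisions.shape[1])],
    axis=1,
)
```

The scoring collar additionally ignores errors within a small tolerance window around each reference segment boundary.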
| Simulated Dataset | Simu1spk | Simu2spk | Simu3spk | Simu4spk | Simu5spk | Simu6spk | Simu7spk | Simu8spk |
|---|---|---|---|---|---|---|---|---|
| DER (%) | 0.34 | 2.84 | 6.25 | 8.34 | 11.26 | 15.36 | 19.53 | 23.35 |
| ckpt | simu_1-8spk.ckpt | same | same | same | same | same | same | same |
| Real-world Dataset | CALLHOME | DIHARD II | DIHARD III | AMI Dev | AMI Eval |
|---|---|---|---|---|---|
| DER (%) | 12.11 | 27.58 | 19.61 | 20.97 | 20.76 |
| ckpt | ch.ckpt | dih2.ckpt | dih3.ckpt | ami.ckpt | ami.ckpt |
(All datasets are sampled at 8 kHz.)
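If your own recordings are at a different rate, resample them to 8 kHz before inference; a minimal sketch using torchaudio (file names are placeholders):

```python
import torchaudio

wav, sr = torchaudio.load("your_recording.wav")  # placeholder path
if sr != 8000:
    wav = torchaudio.functional.resample(wav, orig_freq=sr, new_freq=8000)
torchaudio.save("your_recording_8k.wav", wav, sample_rate=8000)
```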
If you want to cite these papers:
```bibtex
@INPROCEEDINGS{10446568,
  author={Liang, Di and Shao, Nian and Li, Xiaofei},
  booktitle={ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Frame-Wise Streaming End-to-End Speaker Diarization with Non-Autoregressive Self-Attention-Based Attractors},
  year={2024},
  pages={10521-10525}}

@ARTICLE{11122273,
  author={Liang, Di and Li, Xiaofei},
  journal={IEEE Transactions on Audio, Speech and Language Processing},
  title={LS-EEND: Long-Form Streaming End-to-End Neural Diarization With Online Attractor Extraction},
  year={2025},
  volume={33},
  pages={3568-3581},
  doi={10.1109/TASLPRO.2025.3597446}}
```
