Commit ca03ded

Update links in README
Author: Landini Federico Nicolas
1 parent 2e51a3a, commit ca03ded

1 file changed (+23, -41 lines changed)

README.md

Lines changed: 23 additions & 41 deletions
@@ -1,23 +1,19 @@
 # VBHMM x-vectors Diarization (aka *VBx*)
 
-Diarization recipe for The Second DIHARD Diarization Challenge https://coml.lscp.ens.fr/dihard/index.html \
-by Brno University of Technology.
+[Diarization recipe](https://speech.fit.vutbr.cz/software/vbhmm-x-vectors-diarization) for [The Second DIHARD Diarization Challenge](https://coml.lscp.ens.fr/dihard/index.html) by Brno University of Technology. \
 The recipe consists of
 - computing fbank features
 - computing x-vectors
-- doing Agglomerative Hierachical Clustering on x-vectors as a first step to produce an initialization
+- doing Agglomerative Hierarchical Clustering on x-vectors as a first step to produce an initialization
 - apply Variational Bayes HMM over x-vectors to produce the diarization output
 - score the diarization output
 
 More details about the full recipe in\
-F. Landini, S. Wang, M. Diez, L. Burget et al.\
-*BUT System for the Second DIHARD Speech Diarization Challenge*, ICASSP 2020\
-or \
-*BUT System Description for DIHARD Speech Diarization Challenge 2019*, https://arxiv.org/abs/1910.08847
+F. Landini, S. Wang, M. Diez, L. Burget et al.: *BUT System for the Second DIHARD Speech Diarization Challenge*, ICASSP 2020\
+or [*BUT System Description for DIHARD Speech Diarization Challenge 2019*](https://arxiv.org/abs/1910.08847)
 
 A more thorough analysis of the diarization approach is presented in\
-M. Diez, L. Burget, F. Landini, S. Wang, J. Černocký\
-*Optimizing Bayesian HMM based x-vector clustering for the second DIHARD speech diarization challenge*, ICASSP 2020
+M. Diez, L. Burget, F. Landini, S. Wang, J. Černocký: *Optimizing Bayesian HMM based x-vector clustering for the second DIHARD speech diarization challenge*, ICASSP 2020
 
 
 
@@ -38,7 +34,7 @@ soundfile >= 0.10.3
 ### Usage
 To run the recipe, execute `run_recipe.sh` followed by `all` to run all steps or `features`, `xvectors`, `VBx`, `score` for only computing fbank features, computing xvectors, running VBx diarization or scoring, respectively.
 
-The script is prepared to run on the development and evaluation sets of The Second DIHARD Diarization Challenge https://coml.lscp.ens.fr/dihard/index.html track 1. You need to provide the directory with the recordings in flac format and the directory for the speech activity detection labels as provided by the organizers:
+The script is prepared to run on the development and evaluation sets of [The Second DIHARD Diarization Challenge](https://coml.lscp.ens.fr/dihard/index.html) [track 1](http://dihard.ldc.upenn.edu/competitions/73). You need to provide the directory with the recordings in flac format and the directory for the speech activity detection labels as provided by the organizers:
 ```
 0.130 4.010 speech
 4.790 5.750 speech
@@ -49,68 +45,54 @@ The script is prepared to run on the development and evaluation sets of The Seco
 
 ### Resources
 This recipe makes use of an x-vector extractor model which was trained on data from the VoxCeleb corpora and using the Kaldi toolkit.\
-A. Nagrani, J. S. Chung, A. Zisserman\
-*VoxCeleb: a large-scale speaker identification dataset*\
-J. S. Chung, A. Nagrani, A. Zisserman\
-*VoxCeleb2: Deep Speaker Recognition*\
-D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz et al.\
-*The Kaldi speech recognition toolkit*
+A. Nagrani, J. S. Chung, A. Zisserman: *VoxCeleb: a large-scale speaker identification dataset*\
+J. S. Chung, A. Nagrani, A. Zisserman: *VoxCeleb2: Deep Speaker Recognition*\
+D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz et al.: *The Kaldi speech recognition toolkit*
 
 
-The x-vector extractor file has been compressed and separated into three files to be able to upload it. To recover it, first unsplit it:
+The x-vector extractor file has been compressed and separated into two files to be able to upload it. To recover it, first unsplit it:
 `
-zip -s 0 splitted_xvector_extractor.txt.zip --out unsplit_xvector_extractor.txt.zip
+zip -s 0 split_xvector_extractor.txt.zip --out unsplit_xvector_extractor.txt.zip
 `
 and then unzip it:
 `
 unzip unsplit_xvector_extractor.txt.zip
 `
 
 The recipe also uses two probabilistic linear discriminant analysis (PLDA) models, one trained on VoxCeleb data and another on the DIHARD development set. In case of using any of these PLDA models, also cite the corresponding publications.\
-A. Nagrani, J. S. Chung, A. Zisserman\
-*VoxCeleb: a large-scale speaker identification dataset*\
-J. S. Chung, A. Nagrani, A. Zisserman\
-*VoxCeleb2: Deep Speaker Recognition*\
-N. Ryant, K. Church, C. Cieri, A. Cristia, J. Du, S. Ganapathy, M. Liberman\
+A. Nagrani, J. S. Chung, A. Zisserman: *VoxCeleb: a large-scale speaker identification dataset*\
+J. S. Chung, A. Nagrani, A. Zisserman: *VoxCeleb2: Deep Speaker Recognition*\
+N. Ryant, K. Church, C. Cieri, A. Cristia, J. Du, S. Ganapathy, M. Liberman:
 *The Second DIHARD Diarization Challenge: Dataset, task, and baselines*
 
 
 ### Citations
 In case of using the software please cite:\
-F. Landini, S. Wang, M. Diez, L. Burget et al.\
-*BUT System for the Second DIHARD Speech Diarization Challenge*, ICASSP 2020
+F. Landini, S. Wang, M. Diez, L. Burget et al.: *BUT System for the Second DIHARD Speech Diarization Challenge*, ICASSP 2020
 
-M. Diez, L. Burget, F. Landini, S. Wang, J. Cernocký\
-*Optimizing Bayesian HMM based x-vector clustering for the second DIHARD speech diarization challenge*, ICASSP 2020
+M. Diez, L. Burget, F. Landini, S. Wang, J. Černocký: *Optimizing Bayesian HMM based x-vector clustering for the second DIHARD speech diarization challenge*, ICASSP 2020
 
-A. Nagrani, J. S. Chung, A. Zisserman\
-*VoxCeleb: a large-scale speaker identification dataset*
+A. Nagrani, J. S. Chung, A. Zisserman: *VoxCeleb: a large-scale speaker identification dataset*
 
-J. S. Chung, A. Nagrani, A. Zisserman\
-*VoxCeleb2: Deep Speaker Recognition*
+J. S. Chung, A. Nagrani, A. Zisserman: *VoxCeleb2: Deep Speaker Recognition*
 
-N. Ryant, K. Church, C. Cieri, A. Cristia, J. Du, S. Ganapathy, M. Liberman\
-*The Second DIHARD Diarization Challenge: Dataset, task, and baselines*
+N. Ryant, K. Church, C. Cieri, A. Cristia, J. Du, S. Ganapathy, M. Liberman: *The Second DIHARD Diarization Challenge: Dataset, task, and baselines*
 
 
 ### Results
 The diarization error rates (DER) obtained with this recipe for the development and evaluation are:\
 Development 17.87\
 Evaluation 18.31
 
-In our submission to the challenge we used the weighted prediction error method (see papers below).\
-Processing the recordings with this method and this recipe we obtained:\
+In our submission to the challenge we used the weighted prediction error method (see papers below). Processing the recordings with this method and this recipe we obtained:\
 Development 17.64\
 Evaluation 18.09
 
-All scores were obtained using the scoring tool provided by the organizers: https://github.com/nryant/dscore\
-Due to some non-deterministic parts of the recipe, the obtained diarization outputs can slightly change from run to run.
+All scores were obtained using the [scoring tool provided by the organizers](https://github.com/nryant/dscore). Due to some non-deterministic parts of the recipe, the obtained diarization outputs can slightly change from run to run.
 
-*T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi, and B. H. Juang\
-*Speech dereverberation based on variance-normalized delayed linear prediction*\
+T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi, and B. H. Juang: *Speech dereverberation based on variance-normalized delayed linear prediction*\
 and\
-*L. Drude, J. Heymann, C. Boeddeker, and R. Haeb-Umbach\
-*NARA-WPE: A Python package for weighted prediction error dereverberation in Numpy and Tensorflow for online and offline processing*
+L. Drude, J. Heymann, C. Boeddeker, and R. Haeb-Umbach: *NARA-WPE: A Python package for weighted prediction error dereverberation in Numpy and Tensorflow for online and offline processing*
 
 
 
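Reading aid for the speech activity detection label lines quoted in the Usage hunk (`0.130 4.010 speech`): the sketch below shows one way such a file could be checked before running the recipe. It is not part of the repository. The function names `parse_sad_labels` and `total_speech_seconds` are hypothetical, and the column meaning (start time, end time in seconds, then the literal label `speech`) is an assumption inferred from the two example lines in the README.

```python
from typing import List, Tuple


def parse_sad_labels(text: str) -> List[Tuple[float, float]]:
    """Parse 'start end speech' lines into (start, end) pairs.

    Assumption: the first two columns are start and end times in seconds
    and the third column is always the literal label 'speech'.
    """
    segments = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3 or fields[2] != "speech":
            raise ValueError(f"line {line_no}: unexpected format {line!r}")
        start, end = float(fields[0]), float(fields[1])
        if end <= start:
            raise ValueError(f"line {line_no}: end time is not after start time")
        segments.append((start, end))
    return segments


def total_speech_seconds(segments: List[Tuple[float, float]]) -> float:
    """Sum the durations of all parsed segments."""
    return sum(end - start for start, end in segments)


if __name__ == "__main__":
    # The two label lines shown in the README excerpt.
    example = "0.130 4.010 speech\n4.790 5.750 speech\n"
    segments = parse_sad_labels(example)
    print(segments)
    print(round(total_speech_seconds(segments), 3))
```

Rejecting malformed lines up front, rather than skipping them silently, keeps a label problem from surfacing later as a confusing scoring difference.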