# VBHMM x-vectors Diarization (aka *VBx*)

[Diarization recipe](https://speech.fit.vutbr.cz/software/vbhmm-x-vectors-diarization) for [The Second DIHARD Diarization Challenge](https://coml.lscp.ens.fr/dihard/index.html) by Brno University of Technology.

The recipe consists of
- computing fbank features
- computing x-vectors
- doing Agglomerative Hierarchical Clustering on x-vectors as a first step to produce an initialization
- applying Variational Bayes HMM over x-vectors to produce the diarization output
- scoring the diarization output
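
The clustering step above can be illustrated with a toy sketch. This is not the recipe's implementation: the recipe scores x-vector pairs with its PLDA models, whereas this simplified stand-in uses plain cosine distance, average linkage, and an arbitrary stopping threshold; all names and values here are hypothetical.

```python
import math

def cosine_dist(a, b):
    # 1 - cosine similarity between two vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

def ahc(vectors, threshold):
    """Average-linkage agglomerative hierarchical clustering.

    Repeatedly merges the pair of clusters with the smallest average
    pairwise distance until that distance exceeds `threshold`.
    Returns clusters as lists of indices into `vectors`.
    """
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None  # (distance, i, j)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(cosine_dist(vectors[a], vectors[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break  # no pair is close enough to merge
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]
    return clusters

# Toy 2-D "x-vectors" pointing in two distinct directions
xvecs = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9]]
clusters = ahc(xvecs, threshold=0.1)  # two clusters: {0, 1} and {2, 3}
```

In the recipe, the clusters found this way only serve as the initialization that the Variational Bayes HMM then refines.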
More details about the full recipe can be found in\
F. Landini, S. Wang, M. Diez, L. Burget et al.: *BUT System for the Second DIHARD Speech Diarization Challenge*, ICASSP 2020\
or [*BUT System Description for DIHARD Speech Diarization Challenge 2019*](https://arxiv.org/abs/1910.08847)
A more thorough analysis of the diarization approach is presented in\
M. Diez, L. Burget, F. Landini, S. Wang, J. Černocký: *Optimizing Bayesian HMM based x-vector clustering for the second DIHARD speech diarization challenge*, ICASSP 2020
### Usage
To run the recipe, execute `run_recipe.sh` followed by `all` to run all steps, or by `features`, `xvectors`, `VBx` or `score` to only compute fbank features, compute x-vectors, run VBx diarization or score, respectively.

The script is prepared to run on the development and evaluation sets of [The Second DIHARD Diarization Challenge](https://coml.lscp.ens.fr/dihard/index.html) [track 1](http://dihard.ldc.upenn.edu/competitions/73). You need to provide the directory with the recordings in flac format and the directory with the speech activity detection labels as provided by the organizers:
```
0.130 4.010 speech
4.790 5.750 speech
```
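
Each label line pairs a start and an end time in seconds with a label. A minimal sketch of turning such labels into speech segments (a hypothetical helper, not part of the recipe):

```python
def parse_lab(lines):
    """Parse SAD labels in `start end label` format into
    (start_seconds, end_seconds) tuples for speech segments."""
    segments = []
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # skip malformed or empty lines
        start, end, label = float(parts[0]), float(parts[1]), parts[2]
        if label == "speech":
            segments.append((start, end))
    return segments

lab = ["0.130 4.010 speech", "4.790 5.750 speech"]
segments = parse_lab(lab)
total_speech = sum(end - start for start, end in segments)  # 4.84 seconds
```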
### Resources
This recipe makes use of an x-vector extractor model which was trained with the Kaldi toolkit on data from the VoxCeleb corpora.\
A. Nagrani, J. S. Chung, A. Zisserman: *VoxCeleb: a large-scale speaker identification dataset*\
J. S. Chung, A. Nagrani, A. Zisserman: *VoxCeleb2: Deep Speaker Recognition*\
D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz et al.: *The Kaldi speech recognition toolkit*

The x-vector extractor file has been compressed and separated into two files to be able to upload it. To recover it, first unsplit it:
```
zip -s 0 split_xvector_extractor.txt.zip --out unsplit_xvector_extractor.txt.zip
```
and then unzip it:
```
unzip unsplit_xvector_extractor.txt.zip
```
The recipe also uses two probabilistic linear discriminant analysis (PLDA) models, one trained on VoxCeleb data and another on the DIHARD development set. In case of using any of these PLDA models, also cite the corresponding publications.\
A. Nagrani, J. S. Chung, A. Zisserman: *VoxCeleb: a large-scale speaker identification dataset*\
J. S. Chung, A. Nagrani, A. Zisserman: *VoxCeleb2: Deep Speaker Recognition*\
N. Ryant, K. Church, C. Cieri, A. Cristia, J. Du, S. Ganapathy, M. Liberman: *The Second DIHARD Diarization Challenge: Dataset, task, and baselines*
### Citations
In case of using the software, please cite:\
F. Landini, S. Wang, M. Diez, L. Burget et al.: *BUT System for the Second DIHARD Speech Diarization Challenge*, ICASSP 2020

M. Diez, L. Burget, F. Landini, S. Wang, J. Černocký: *Optimizing Bayesian HMM based x-vector clustering for the second DIHARD speech diarization challenge*, ICASSP 2020

A. Nagrani, J. S. Chung, A. Zisserman: *VoxCeleb: a large-scale speaker identification dataset*

J. S. Chung, A. Nagrani, A. Zisserman: *VoxCeleb2: Deep Speaker Recognition*

N. Ryant, K. Church, C. Cieri, A. Cristia, J. Du, S. Ganapathy, M. Liberman: *The Second DIHARD Diarization Challenge: Dataset, task, and baselines*
### Results
The diarization error rates (DER) obtained with this recipe for the development and evaluation sets are:\
Development 17.87\
Evaluation 18.31

In our submission to the challenge we used the weighted prediction error method (see papers below). Processing the recordings with this method and this recipe we obtained:\
Development 17.64\
Evaluation 18.09

All scores were obtained using the [scoring tool provided by the organizers](https://github.com/nryant/dscore). Due to some non-deterministic parts of the recipe, the obtained diarization outputs can change slightly from run to run.
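
For reference, DER combines missed speech, false alarm and speaker confusion time, normalized by the total reference speech time. A minimal sketch of the standard formula (illustrative only; the actual scoring is done by the tool above, and the durations below are made up):

```python
def der(missed, false_alarm, confusion, total_speech):
    """Diarization Error Rate (%) from component durations in seconds:
    DER = (missed speech + false alarm + speaker confusion) / total speech."""
    return 100.0 * (missed + false_alarm + confusion) / total_speech

# Hypothetical durations (seconds), not taken from the challenge data
rate = der(missed=30.0, false_alarm=12.0, confusion=48.0, total_speech=600.0)  # 15.0 %
```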

T. Nakatani, T. Yoshioka, K. Kinoshita, M. Miyoshi, and B. H. Juang: *Speech dereverberation based on variance-normalized delayed linear prediction*\
and\
L. Drude, J. Heymann, C. Boeddeker, and R. Haeb-Umbach: *NARA-WPE: A Python package for weighted prediction error dereverberation in Numpy and Tensorflow for online and offline processing*