Description
Hello there,
Thanks for your efforts in open-sourcing the code; it's vital for those of us trying to reproduce the results presented in the paper.
Problem
However, I've run into a RuntimeError when adapting the model to our private data:
```
/*/EEND-vector-clustering/eend/pytorch_backend/train.py:186: RuntimeWarning: invalid value encountered in true_divide
  fet_arr[spk] = org / norm
...
Traceback (most recent call last):
...
RuntimeError: The loss (nan) is not finite.
```
Detail
After some debugging, I found that the problem actually occurs when an entry of the embedding table is left all-zero: its L2 norm is 0, so the normalization below divides zero by zero and produces NaN values, which then make the loss non-finite during backpropagation:
EEND-vector-clustering/eend/pytorch_backend/train.py
Lines 173 to 186 in b3649ee
```python
fet_arr = np.zeros([spk_num, fet_dim])
# sum
bs = spklabs.shape[0]
for i in range(bs):
    if spkidx_tbl[spklabs[i]] == -1:
        raise ValueError(spklabs[i])
    fet_arr[spkidx_tbl[spklabs[i]]] += spkvecs[i]
# normalize
for spk in range(spk_num):
    org = fet_arr[spk]
    norm = np.linalg.norm(org, ord=2)
    fet_arr[spk] = org / norm
```
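For reference, here is a minimal standalone snippet (my own, not repository code) that reproduces the warning and the resulting NaN for an all-zero row:

```python
import numpy as np

# An embedding row that never received any contribution stays all-zero,
# so its L2 norm is 0.0 and the division computes 0/0 = NaN, emitting
# the same RuntimeWarning as above (its wording varies slightly across
# NumPy versions).
org = np.zeros(4)
norm = np.linalg.norm(org, ord=2)  # 0.0
print(org / norm)                  # [nan nan nan nan]
```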
Since, during adaptation, these embeddings are loaded from the speaker embeddings dumped by the save_spkv_lab.py script, I suspected there might be an issue in the save_spkv_lab function.
After some careful step-by-step checking with pdb, I found that silent-speaker labels do get added to the all_labels variable when the speaker embeddings are dumped:
EEND-vector-clustering/eend/pytorch_backend/infer.py
Lines 349 to 355 in b3649ee
```python
for i in range(args.num_speakers):
    # Exclude samples corresponding to silent speaker
    if torch.sum(t_chunked_t[sigma[i]]) > 0:
        vec = outputs[i+1][0].cpu().detach().numpy()
        lab = chunk_data[2][sigma[i]]
        all_outputs.append(vec)
        all_labels.append(lab)
```
Even when torch.sum(t_chunked_t[sigma[i]]) > 0 holds, lab can still be -1, which denotes a silent speaker according to the code in:
EEND-vector-clustering/eend/pytorch_backend/diarization_dataset.py
Lines 94 to 99 in b3649ee
```python
S_arr = -1 * np.ones(n_speakers).astype(np.int64)
for seg in filtered_segments:
    speaker_index = speakers.index(self.data.utt2spk[seg['utt']])
    all_speaker_index = self.all_speakers.index(
        self.data.utt2spk[seg['utt']])
    S_arr[speaker_index] = all_speaker_index
```
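To make that concrete, here is a toy illustration (simplified values of my own, not repository code) of how a speaker slot with no segment in a chunk keeps its -1 initialization:

```python
import numpy as np

# Simplified sketch of the S_arr construction above: any local speaker
# slot that never appears in filtered_segments keeps the -1 it was
# initialized with, and that -1 is later dumped as a speaker "label".
n_speakers = 3
S_arr = -1 * np.ones(n_speakers).astype(np.int64)
S_arr[0] = 7   # global index of local speaker 0 (has segments)
S_arr[2] = 4   # global index of local speaker 2 (has segments)
print(S_arr)   # [ 7 -1  4] -> slot 1 stays -1 (silent speaker)
```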
Since these silent-speaker labels are -1 and Python sequences support negative indexing, the problem passes silently when the embeddings are dumped, but it blows up with the exception above once training begins.
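My understanding (an assumption on my part, not verified against every code path) is that the negative label even slips past the spkidx_tbl check in train.py, because indexing with -1 wraps around instead of raising:

```python
import numpy as np

# Hypothetical illustration: indexing with the silent-speaker label -1
# does not raise; it wraps to the last entry, so a check such as
# "spkidx_tbl[spklabs[i]] == -1" can pass even for a silent speaker.
spkidx_tbl = np.array([0, 1, 2, 3])
lab = -1                  # silent-speaker label from S_arr
print(spkidx_tbl[lab])    # 3 -- the last entry, not -1, so no ValueError
```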
Question
I could fix this by appending to all_labels only when lab >= 0 while saving the speaker embeddings; with that change, the subsequent adaptation training runs smoothly and produces a well-performing model. A sketch of the change is shown below.
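For concreteness, the change I have in mind in infer.py looks roughly like this (a sketch of my local fix, not a polished patch):

```python
for i in range(args.num_speakers):
    # Exclude samples corresponding to silent speaker
    if torch.sum(t_chunked_t[sigma[i]]) > 0:
        vec = outputs[i+1][0].cpu().detach().numpy()
        lab = chunk_data[2][sigma[i]]
        # also skip silent-speaker labels (-1) so they never
        # reach all_labels / all_outputs
        if lab < 0:
            continue
        all_outputs.append(vec)
        all_labels.append(lab)
```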
But before opening a PR, I would like to know whether you have come across this issue before, or whether you have any idea why it happens.
Thanks!