What is the data of sad_scores? #629

hatah · 2021-03-03T11:54:38Z

Hi. Thank you very much for developing such a nice SAD tool.

Let me ask a question about the SAD data structure in pyannote-audio.
In my understanding, the speech activity detection scores are 1-dimensional, as shown in the following graph:
https://raw.githubusercontent.com/pyannote/pyannote-audio/master/tutorials/pretrained/model/segmentation.png

However, in the following code (as on the tutorial page https://github.com/pyannote/pyannote-audio/tree/master/tutorials/pretrained/model),

sad_scores = sad(test_file)

I noticed that sad_scores holds a 2-dimensional array together with a pyannote.core.segment.SlidingWindow object.
I think one of the two dimensions must hold the SAD score values, but what are the values in the other dimension?
Are these the output and h_t (score) of the LSTM?
Also, how does the length of the scores relate to the length of the waveform?

Thanks for any answer.

hbredin · 2021-03-04T15:59:37Z
Maintainer

The values are actually speech and non-speech probabilities.
They should sum to one.
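
To illustrate the two-column layout described above, here is a minimal NumPy sketch. It does not call pyannote.audio: the scores are made-up numbers, softmax is used only as a stand-in to produce rows that sum to one, and which column holds the speech probability is an assumption that should be checked against the model in use.

```python
import numpy as np

# Hypothetical raw scores for 5 frames and 2 classes (made-up numbers).
raw = np.array([
    [2.0, -1.0],
    [0.5, 0.5],
    [-1.0, 3.0],
    [-2.0, 2.0],
    [1.5, 0.0],
])

# Softmax over the class axis turns each row into two probabilities.
exp = np.exp(raw - raw.max(axis=1, keepdims=True))
probs = exp / exp.sum(axis=1, keepdims=True)

# ASSUMPTION: column 1 is "speech" and column 0 is "non-speech".
SPEECH = 1
speech_prob = probs[:, SPEECH]

print(probs.shape)        # (num_frames, 2)
print(probs.sum(axis=1))  # every row sums to one, up to rounding
print(speech_prob)        # the single column a 1-dimensional plot would show
```

Because each row sums to one, the second column carries no extra information: it is one minus the first.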

pyannote.audio 2.0 will change that and only output the speech probability.
You can already try a 2.0 VAD model here: https://huggingface.co/hbredin/VoiceActivityDetection-PyanNet-DIHARD

hatah Mar 6, 2021
Author

Thank you very much for your reply. I noticed it later.

I also noticed that SincNet's pooling is [3, 3, 3], so I think that is why the output length becomes 1/27 of the input length.

I will check out the 2.0. Thanks!

hbredin Mar 8, 2021
Maintainer

Yes, that and the kernel size.
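
A rough sketch of why both pooling and kernel size affect the output length. This is generic convolution arithmetic, not the actual SincNet code: it assumes unpadded convolutions with stride 1, non-overlapping max pooling, and hypothetical kernel sizes chosen only for illustration. Only the pooling sizes [3, 3, 3] come from this thread.

```python
def conv_out_len(n, kernel_size, stride=1):
    """Length after a 1-D convolution without padding."""
    return (n - kernel_size) // stride + 1


def pool_out_len(n, pool_size):
    """Length after non-overlapping max pooling (stride equals pool size)."""
    return n // pool_size


def stack_out_len(n, kernel_sizes, pool_sizes):
    """Apply conv then pool for each layer and return the final length."""
    for k, p in zip(kernel_sizes, pool_sizes):
        n = pool_out_len(conv_out_len(n, k), p)
    return n


num_samples = 16000          # hypothetical: one second at 16 kHz
pool_sizes = [3, 3, 3]       # from the discussion above
kernel_sizes = [251, 5, 5]   # HYPOTHETICAL values, for illustration only

pooling_only = stack_out_len(num_samples, [1, 1, 1], pool_sizes)
with_kernels = stack_out_len(num_samples, kernel_sizes, pool_sizes)

print(pooling_only)   # 592, i.e. 16000 // 27
print(with_kernels)   # 581: each unpadded kernel trims a few frames
```

With these assumed values, pooling alone gives the 1/27 ratio, and the kernels shorten the output a little further.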

sad_scores has a sliding_window attribute that describes the output temporal resolution:

sliding_window.duration: frame duration
sliding_window.step: frame step
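
As a sketch of how frame duration and frame step map a frame index to a time span, the plain-Python helper below places frame i at start + i * step and makes it last duration seconds. The helper name frame_to_segment and the numeric values are hypothetical; in practice the values would be read from sad_scores.sliding_window.

```python
def frame_to_segment(index, duration, step, start=0.0):
    """Return (begin, end) in seconds for the frame at the given index."""
    begin = start + index * step
    return begin, begin + duration


# HYPOTHETICAL values; read the real ones from
# sad_scores.sliding_window.duration and sad_scores.sliding_window.step.
duration = 0.025  # frame duration in seconds
step = 0.010      # frame step in seconds

first = frame_to_segment(0, duration, step)
later = frame_to_segment(100, duration, step)

print(first)  # time span covered by the first frame
print(later)  # time span covered by frame 100
```

The number of frames per second of audio is roughly 1 / step, which gives the length of the scores relative to the waveform.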

hatah Mar 11, 2021
Author

Thank you very much for your reply!
