The values are actually speech and non-speech probabilities.
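To make that concrete, here is a minimal sketch of how one might pull a single speech score per frame out of a two-column array like this. It uses a made-up NumPy array in place of the real `sad_scores.data`. The column order (column 0 = non-speech, column 1 = speech) and the `step` value are assumptions for illustration only, so check them against your own model before relying on them.

```python
import numpy as np

# Hypothetical stand-in for sad_scores.data: one row per frame, two columns.
# ASSUMPTION: column 0 is non-speech and column 1 is speech; verify for your model.
scores = np.array([
    [0.9, 0.1],
    [0.4, 0.6],
    [0.2, 0.8],
    [0.7, 0.3],
])

# ASSUMPTION: time between consecutive frames, in seconds (made-up value).
step = 0.01

# Keep only the speech column to get one score per frame.
speech = scores[:, 1]

# If the two columns are probabilities, each row should sum to 1.
row_sums = scores.sum(axis=1)

# Simple frame-level decision with an arbitrary 0.5 threshold.
is_speech = speech > 0.5

# Start time of each frame, so scores can be lined up with the waveform.
frame_starts = np.arange(len(scores)) * step
```

With real output, the step would presumably come from the sliding window attached to the scores (for example `sad_scores.sliding_window.step`) rather than a hard-coded number, and the number of rows would then be roughly the audio duration divided by that step. If the values turn out to be log-probabilities, applying `np.exp` first would convert them.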
Hi. Thank you very much for developing such a nice SAD tool.
I have a question about the SAD data structure in pyannote-audio.
As I understand it, the speech activity detection score is one-dimensional, as shown in the following graph:
https://raw.githubusercontent.com/pyannote/pyannote-audio/master/tutorials/pretrained/model/segmentation.png
However, in the following code (as on the tutorial page https://github.com/pyannote/pyannote-audio/tree/master/tutorials/pretrained/model),
sad_scores = sad(test_file)
I noticed that sad_scores contains a 2-dimensional array together with a pyannote.core.segment.SlidingWindow object.
I assume one of the two dimensions holds the SAD scores, but what are the values in the other dimension?
Are these the output and h_t (score) of the LSTM?
Also, how does the length of the scores relate to the length of the waveform?
Thanks for any answer.