Pyannote.audio toolkit with MFCC #1680

sumansamui · 2024-04-03T05:41:09Z

sumansamui
Apr 3, 2024

I have a few questions:

1. How can the pyannote.audio setup be used with MFCC features, i.e., running the speech segmentation model on MFCC input? Is there a pre-trained model available for that setting, or do we have to train from scratch?
2. What is the impact of the sampling frequency on SincNet? I know all input audio is downsampled or upsampled to 16 kHz.

We observed that pyannote gives the same result for the 8 kHz and 16 kHz versions of a WAV file with the SincNet architecture. Is this because the number of sinc filters in the low-frequency range is the same for both 8 kHz and 16 kHz?
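As a rough illustration of what an MFCC front-end would compute before a segmentation model, here is a minimal NumPy sketch that is independent of pyannote.audio. The 25 ms window, 10 ms hop, 40 mel filters and 13 coefficients are assumed, commonly used values, not settings taken from pyannote.audio, and the function names are hypothetical.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_mels, n_fft, sample_rate):
    """Triangular mel filters, shape (n_mels, n_fft // 2 + 1)."""
    # n_mels + 2 edge frequencies, equally spaced on the mel scale up to Nyquist
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fft_freqs = np.linspace(0.0, sample_rate / 2, n_fft // 2 + 1)
    bank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        bank[m] = np.maximum(0.0, np.minimum(rising, falling))
    return bank


def mfcc(waveform, sample_rate=16000, n_mfcc=13, n_mels=40,
         win_length=400, hop_length=160, n_fft=512):
    """Return MFCC features of shape (num_frames, n_mfcc). All defaults are assumptions."""
    # Slice the signal into overlapping frames (no padding at the edges)
    num_frames = 1 + (len(waveform) - win_length) // hop_length
    index = np.arange(win_length)[None, :] + hop_length * np.arange(num_frames)[:, None]
    frames = waveform[index] * np.hamming(win_length)
    # Power spectrum of each windowed frame
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2
    # Log mel energies (small constant avoids log(0))
    log_mel = np.log(power @ mel_filterbank(n_mels, n_fft, sample_rate).T + 1e-10)
    # Unnormalized DCT-II matrix, keeping the first n_mfcc coefficients
    k = np.arange(n_mfcc)[:, None]
    n = np.arange(n_mels)[None, :]
    dct = np.cos(np.pi * k * (2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T


# One second of synthetic noise at 16 kHz stands in for real audio
waveform = np.random.default_rng(0).standard_normal(16000)
features = mfcc(waveform)
print(features.shape)  # (num_frames, n_mfcc)
```

Features like these would replace the raw waveform as model input, which is why a model trained on waveforms cannot simply be reused.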

hbredin · 2024-04-03T07:01:17Z

hbredin
Apr 3, 2024
Maintainer

1. There is no pretrained model available that relies on MFCC. You would have to train one from scratch.
2. I have only ever trained 16 kHz models, so I do not really have any intuition here. I guess that if you want to train 8 kHz models, you would have to slightly change the kernel size and stride of the first convolutional layer of SincNet.
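One way to read the kernel size and stride suggestion is to keep the first layer's duration in milliseconds roughly constant when the sample rate changes. The sketch below only does that arithmetic. It assumes the 16 kHz first layer uses a kernel of 251 samples and a stride of 10 samples (check the SincNet block in your installed version), and `scale_first_layer` is a hypothetical helper, not part of pyannote.audio.

```python
def scale_first_layer(kernel_size, stride, source_rate, target_rate):
    """Scale kernel size and stride so they span about the same duration at target_rate."""
    ratio = target_rate / source_rate
    new_kernel = max(1, int(kernel_size * ratio))
    # Keep the kernel odd, as symmetric sinc filterbanks commonly expect
    if new_kernel % 2 == 0:
        new_kernel += 1
    new_stride = max(1, round(stride * ratio))
    return new_kernel, new_stride


# Assumed 16 kHz values: kernel of 251 samples, stride of 10 samples
kernel_8k, stride_8k = scale_first_layer(251, 10, source_rate=16000, target_rate=8000)
print(kernel_8k, stride_8k)  # 125 5
print(1000 * 251 / 16000, 1000 * kernel_8k / 8000)  # kernel duration in ms at each rate
```

With these assumed values the kernel spans about 15.7 ms at 16 kHz and about 15.6 ms at 8 kHz, and the stride stays at 0.625 ms. Whether this scaling is enough to train a good 8 kHz model is untested here.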

1 reply

sumansamui Apr 4, 2024
Author

Thank you, Herve, for the clarification.
