Training on long audio #1544

chlorane · 2023-07-23T08:57:36Z

I have tried using Whisper for an ASR task, and it performs well.
However, in our case we want to train and test the model on long audio (about 70 s per utterance). Our data is provided directly as 40-dimensional fbank features, and we tried using a few fully connected (fc) layers to map them to the 80-dimensional input that Whisper expects. Unfortunately, Whisper currently seems to accept only 30 s of audio input, so much of the information is cut off before it reaches the network.
Is there any solution for this case?
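
To make the shapes concrete, here is a minimal sketch of one workaround we could try: splitting the projected features into 30 s windows instead of truncating. It assumes our fbank frames use a 10 ms hop (so 70 s is about 7000 frames and 30 s is 3000 frames, matching Whisper's fixed input length). The names `project_features` and `split_into_windows` are hypothetical helpers, not part of Whisper, and a single random linear map stands in for our fc layers. Zero-padding the last window in feature space is also an assumption, since Whisper normally pads the raw audio before computing features.

```python
import numpy as np

N_MELS = 80      # feature dimension Whisper expects
N_FRAMES = 3000  # 30 s at an assumed 10 ms hop

def project_features(fbank, weight, bias):
    """Map (T, 40) fbank features to (T, 80); stand-in for the fc layers."""
    return fbank @ weight + bias

def split_into_windows(features, window=N_FRAMES):
    """Split (T, D) into (n_windows, window, D), zero-padding the last window."""
    n_frames, dim = features.shape
    n_windows = -(-n_frames // window)  # ceiling division
    padded = np.zeros((n_windows * window, dim), dtype=features.dtype)
    padded[:n_frames] = features
    return padded.reshape(n_windows, window, dim)

# Dummy 70 s utterance: 7000 frames of 40-d features (random placeholder data).
rng = np.random.default_rng(0)
fbank = rng.standard_normal((7000, 40)).astype(np.float32)
weight = (rng.standard_normal((40, N_MELS)) * 0.1).astype(np.float32)
bias = np.zeros(N_MELS, dtype=np.float32)

projected = project_features(fbank, weight, bias)
windows = split_into_windows(projected)
print(windows.shape)  # (3, 3000, 80)
```

Each window would still need to be transposed to (80, 3000) before going into the encoder, and the transcript would have to be split to match the windows, which is the part we are unsure how to do for training.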

Replies: 0 comments
