Audio Feature Extraction using Whisper #1246
-
Hi,
how can I extract the logits from audio in the shape
Replies: 3 comments 3 replies
-
I presume you mean audio features rather than logits, since you're using only the encoder. The encoder always takes 30 seconds of audio as input, so we trim or pad the audio to match this length. As a result, the encoded features also span 30 seconds. If the input was shorter than 30 seconds, you can slice the features to keep only the leading frames that correspond to the actual audio. More code-specific questions could be better answered by SpeechBrain's maintainers.
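
The slicing described above can be sketched as follows. This is a minimal illustration, not SpeechBrain's or Whisper's own code: it assumes the standard Whisper setup of 16 kHz audio and 1500 encoder output frames per 30-second chunk (one frame per 320 samples, or 20 ms). The helper names `valid_frame_count` and `slice_features` are made up for this example, and the features are assumed to be indexed as (frames, dim).

```python
import math

SAMPLE_RATE = 16_000    # Whisper works on 16 kHz audio
CHUNK_SECONDS = 30      # fixed encoder input length
ENCODER_FRAMES = 1500   # encoder output frames for one 30-second chunk
# 480000 samples / 1500 frames = 320 samples (20 ms) per encoder frame
SAMPLES_PER_FRAME = SAMPLE_RATE * CHUNK_SECONDS // ENCODER_FRAMES


def valid_frame_count(num_samples: int) -> int:
    """Number of encoder frames that cover real (unpadded) audio."""
    frames = math.ceil(num_samples / SAMPLES_PER_FRAME)
    return min(frames, ENCODER_FRAMES)


def slice_features(features, num_samples: int):
    """Keep only the leading frames; `features` is indexed as (frames, dim)."""
    return features[: valid_frame_count(num_samples)]


# Stand-in for an encoder output of shape (1500, dim); dim=4 is only for the demo.
dummy_features = [[0.0] * 4 for _ in range(ENCODER_FRAMES)]
ten_seconds = 10 * SAMPLE_RATE
trimmed = slice_features(dummy_features, ten_seconds)
print(len(trimmed))
```

With a real model the encoder output usually has a leading batch dimension, i.e. (batch, 1500, dim), so the slice would go along the second axis instead, e.g. `features[:, :n]`. Note that the kept frames were still computed with the padding in context, so they may differ slightly from what an unpadded input would give.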
-
@LiuRicky, this is exactly my problem. Did you find a solution?
-
I am also facing the same problem.