I presume you mean audio features rather than logits, since you're using only the encoder. The encoder always takes 30 seconds of audio as input, so we trim or pad the audio to match this length. As a result, the encoded features also span 30 seconds. If the input was shorter than 30 seconds, you can slice the features to keep only the part that corresponds to the actual audio.
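As a rough sketch of that slicing step: Whisper works on 16 kHz audio with a 160-sample hop for the mel spectrogram, and the encoder's convolutional stem halves the time axis, so each output frame covers 320 samples (20 ms) and a 30-second window gives 1500 frames. The helper names below (`valid_encoder_frames`, `slice_features`) are hypothetical and not part of Whisper or speechbrain. Rounding up to include a partially filled last frame is an assumption you may want to change.

```python
import math

SAMPLE_RATE = 16000        # Hz, Whisper's input sample rate
HOP_LENGTH = 160           # samples per mel frame (10 ms)
ENCODER_STRIDE = 2         # the encoder's conv stem halves the time axis
MAX_SECONDS = 30           # fixed input window of the encoder

SAMPLES_PER_FRAME = HOP_LENGTH * ENCODER_STRIDE                # 320 samples = 20 ms
MAX_FRAMES = MAX_SECONDS * SAMPLE_RATE // SAMPLES_PER_FRAME    # 1500 frames


def valid_encoder_frames(num_samples):
    """Number of encoder frames that cover real (unpadded) audio.

    Hypothetical helper: rounds up so a partially filled last frame is kept,
    and caps at the 30-second window because longer audio is trimmed.
    """
    if num_samples < 0:
        raise ValueError("num_samples must be non-negative")
    return min(MAX_FRAMES, math.ceil(num_samples / SAMPLES_PER_FRAME))


def slice_features(features, num_samples):
    """Keep only the leading frames of a time-indexed sequence of features."""
    return features[:valid_encoder_frames(num_samples)]


# Example: a 10-second clip keeps the first 500 of the 1500 frames.
ten_seconds = 10 * SAMPLE_RATE
dummy_features = list(range(MAX_FRAMES))   # stand-in for real encoder output
trimmed = slice_features(dummy_features, ten_seconds)
print(len(trimmed))
```

For a batched tensor of shape `(batch, 1500, dim)`, the same idea would be to slice along the time axis, for example `features[:, :n]` with `n` computed per clip.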

More code-specific questions could be better answered by speechbrain's maintainers.

Answer selected by jongwook