I'm trying to use OpenAI's open-source Whisper library to fine-tune a speech-to-text model, i.e. the input audio will be in one language and the output text in another. After loading the model, I detach the encoder and try to feed the input audio into it, but I get an assertion error: AssertionError: incorrect audio shape.
This is what I am currently trying:
import whisper

model = whisper.load_model("base")
mod = model.encoder                                 # detach the encoder
audio = whisper.load_audio('audio_Path/test.wav')   # load the audio
mel = whisper.log_mel_spectrogram(audio).cuda()
mel = mel.unsqueeze(0)
print(mod(mel))                                     # raises AssertionError: incorrect audio shape
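For reference, the encoder's forward pass asserts that the input mel spectrogram has the fixed shape the model was trained on (80 mel bins by 3000 frames, i.e. 30 seconds of audio), so the raw audio needs to be padded or trimmed with whisper.pad_or_trim before computing the spectrogram. Below is a minimal sketch of that input pipeline, assuming the failure comes from the unpadded audio; the file path is the same placeholder used above.

import whisper

model = whisper.load_model("base")
encoder = model.encoder

audio = whisper.load_audio('audio_Path/test.wav')  # placeholder path from the question
audio = whisper.pad_or_trim(audio)                 # pad/trim to 30 s (480000 samples at 16 kHz)

mel = whisper.log_mel_spectrogram(audio)           # shape: (80, 3000)
mel = mel.unsqueeze(0).to(model.device)            # shape: (1, 80, 3000), on the model's device

features = encoder(mel)                            # shape: (1, 1500, 512) for the "base" model
print(features.shape)

With the padded input, the assertion should pass and encoder(mel) returns the audio features that the decoder would normally cross-attend to.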