I'm trying to use OpenAI's open-source Whisper library to fine-tune a speech-to-text model, i.e. the input audio will be in one language and the output text in another. After loading the model, I detach the encoder and try to feed the input audio into it, but I get an assertion error: AssertionError: incorrect audio shape.
This is what I am currently trying:
import whisper

model = whisper.load_model("base")
mod = model.encoder                                 # detach the encoder
audio = whisper.load_audio('audio_Path/test.wav')   # load the audio
mel = whisper.log_mel_spectrogram(audio).cuda()
mel = mel.unsqueeze(0)
print(mod(mel))                                     # raises AssertionError: incorrect audio shape
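For reference, the encoder's forward pass asserts that the input mel spectrogram has the fixed shape the model was trained on (80 mel bins by 3000 frames, i.e. 30 seconds of audio), so the raw audio needs to be padded or trimmed with whisper.pad_or_trim before computing the spectrogram. Below is a minimal sketch of that input pipeline, assuming the failure comes from the unpadded audio; the file path is the same placeholder used above.

import whisper

model = whisper.load_model("base")
encoder = model.encoder

audio = whisper.load_audio('audio_Path/test.wav')  # placeholder path from the question
audio = whisper.pad_or_trim(audio)                 # pad/trim to 30 s (480000 samples at 16 kHz)

mel = whisper.log_mel_spectrogram(audio)           # shape: (80, 3000)
mel = mel.unsqueeze(0).to(model.device)            # shape: (1, 80, 3000), on the model's device

features = encoder(mel)                            # shape: (1, 1500, 512) for the "base" model
print(features.shape)

With the padded input, the assertion should pass and encoder(mel) returns the audio features that the decoder would normally cross-attend to.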