Transcribe from a Tensor is not working. #1291

ruliworst · 2023-04-27T18:46:16Z

Hello, I am trying to transcribe audio from a Tensor obtained with the torchaudio library, but it is not working. I am using Flask to receive the audio through an endpoint. Is there a solution? Here is the code:

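import io

import torchaudio
import whisper
from flask import Flask, request

app = Flask(__name__)
MODEL = whisper.load_model('base')  # load the model once at startup

@app.route('/uploader', methods=['POST'])
def upload_audio():
    # read the uploaded file into an in-memory buffer
    audio_file = request.files['audio']
    audio_file = io.BytesIO(audio_file.read())

    # waveform has shape (channels, frames); sr is the file's sample rate
    waveform, sr = torchaudio.load(audio_file)

    result = MODEL.transcribe(waveform)  # this is the call that fails

    # return the recognized text
    return result["text"]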

The error, raised in the transcribe function, is:
decode_options["language"] = max(probs, key=probs.get)
AttributeError: 'list' object has no attribute 'get'

Thanks in advance.
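
A note on where this error seems to come from, offered as a sketch rather than a confirmed diagnosis. The assumption is that Whisper's language detection returns a single dict of probabilities for 1-D audio, but a list of dicts (one per row) when the input has an extra leading dimension, as a (channels, frames) tensor does. The probability values and the `pick_language` helper below are made up for illustration; only the `max(probs, key=probs.get)` expression is taken from the traceback.

```python
# Stand-ins for what language detection is assumed to return.
probs_single = {"en": 0.92, "es": 0.05, "de": 0.03}  # 1-D audio: one dict
probs_batched = [probs_single]                       # 2-D audio: list of dicts

def pick_language(probs):
    # Same expression as the line shown in the traceback.
    return max(probs, key=probs.get)

# With a dict, the most probable language is selected.
detected = pick_language(probs_single)

# With a list, evaluating `probs.get` fails before max() even runs.
try:
    pick_language(probs_batched)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(detected)
print(error_message)
```

If that assumption holds, the fix is on the input side: pass a 1-D waveform so that a single dict comes back.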

mitchsayre · 2023-04-27T22:42:33Z

I think we are having the same issue. It seems that the shape of the tensor returned by torchaudio.load() is different from what whisper.transcribe() expects. I worked around it, but I am not sure whether there is a better solution. Here is my code:

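import torchaudio

# audio_path and model are defined elsewhere
with open(audio_path, 'rb') as file:  # close the file once it is loaded
    waveform, sample_rate = torchaudio.load(file)
waveform = waveform.squeeze()  # drop the size-1 channel dimension (mono audio)
result = model.transcribe(waveform)
print(result["text"])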

tensor squeeze: https://pytorch.org/docs/stable/generated/torch.squeeze.html
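
For anyone following along, here is a minimal sketch of the shape issue described above. It assumes the tensor from torchaudio.load() is shaped (channels, frames), which is its default layout, and it uses zero-filled stand-in tensors instead of real audio. The variable names are illustrative only.

```python
import torch

# Stand-in for a 1-second mono clip at 16 kHz: shape (channels, frames).
mono = torch.zeros(1, 16000)
mono_1d = mono.squeeze(0)          # removes the channel axis

# Stand-in for a stereo clip: there is no size-1 axis to squeeze.
stereo = torch.zeros(2, 16000)
stereo_squeezed = stereo.squeeze() # shape is unchanged, still 2-D
stereo_1d = stereo.mean(dim=0)     # average the channels to get 1-D audio

print(tuple(mono_1d.shape))
print(tuple(stereo_squeezed.shape))
print(tuple(stereo_1d.shape))
```

So squeeze() alone only helps when the file is already mono. For stereo input the channels have to be mixed down first.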

1 reply

ruliworst Apr 30, 2023
Author

Hi, first of all thanks for your response.

I tried that solution, but when the transcribe method runs, it returns a sequence of numbers as the text result:
3, 2, 1. 3, 4, 1. 3, 4, 1. 4, 4, 4, 5. 4, 5, 5. 4, 5, 5. 4, 5, 5. 4, 5, 6. 4, 5, 6. 4, 5. 4, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5. 5, 5.
So it does not give a proper response, because the audio is not actually transcribed into text.
Anyway, thanks again for your response. I am still trying to find a solution.

RealHandy · 2024-09-10T20:47:57Z

If this is still relevant to anyone, I got this error with my use of the "large" model and got past it by specifying language = "en" in my call to model.transcribe(), i.e.
model.transcribe(audio = waveform, verbose = True, language = "en")

0 replies

KALEIDOSCOPEIP · 2025-01-17T06:00:40Z

Guys, if you are trying to pass a waveform Tensor loaded with torchaudio.load to Whisper's transcribe(), you might need to do the following:

1. Resample your waveform to 16 kHz.
2. Convert it to mono audio.

Here is a simple code sample:

import torch
import whisper
import torchaudio
import torchaudio.transforms as T

model = whisper.load_model("base", device=torch.device("cuda"))  # initialize whisper model
waveform_original, sample_rate_original = torchaudio.load("xxx.mp3")  # load audio with torchaudio

sample_rate_target = 16000
waveform_16khz = T.Resample(sample_rate_original, sample_rate_target)(waveform_original)  # resample to 16 kHz
waveform_16khz_mono = waveform_16khz.mean(dim=0, keepdim=True)  # average the channels to get mono audio

result = model.transcribe(waveform_16khz_mono.squeeze(0), language="en")  # squeeze the first dimension, and set the transcribing language to English

With the steps above, the Tensor can be used for transcription. However, I am not sure whether this works for other languages, since I arbitrarily set the transcription language to English.
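
On the language question, a possible variation is sketched below. It leaves `language` as None, on the assumption that transcribe() then falls back to its own language detection once the input is a 1-D, 16 kHz waveform. The helper name `transcribe_tensor` and its parameters are hypothetical and not part of Whisper. Resampling is deliberately left out, so the helper only checks the sample rate.

```python
import torch

WHISPER_SAMPLE_RATE = 16000  # the rate mentioned in the steps above

def transcribe_tensor(model, waveform, sample_rate, language=None):
    """Hypothetical helper: pass a torchaudio-style tensor to model.transcribe.

    waveform is expected as (channels, frames) or already 1-D.
    language=None is assumed to let the model detect the language itself.
    """
    if sample_rate != WHISPER_SAMPLE_RATE:
        raise ValueError("resample the waveform to 16 kHz first")
    if waveform.dim() == 2:
        waveform = waveform.mean(dim=0)  # mix all channels down to mono, 1-D
    return model.transcribe(waveform, language=language)
```

Usage would look like `transcribe_tensor(model, waveform_16khz, 16000)` for auto-detection, or with `language="en"` to force English as in the sample above.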

1 reply

KALEIDOSCOPEIP Jan 17, 2025

@RealHandy @ruliworst @mitchsayre You guys can give it a try to see if it works.
