How can I use pydub output as Whisper audio input? #983

MarriamSiddiqui · 2023-02-20T05:56:37Z

I obtain the audio from the function below and then need to transcribe it with the Whisper model. How can I pass the output of this function to Whisper?
Note: I cannot save the audio to a file (even temporarily) because of latency issues.

import base64
import io
import json

from pydub import AudioSegment


def extract_audio_buffer(data_arg):
    # sequence_id is a dict defined outside this function
    sequence_id['audio_id'] += 1
    # Create a buffer to hold the audio data
    audio_buffer = io.BytesIO()

    main_tuple = json.loads(data_arg)
    audio_data = main_tuple['Item1']['Data']
    print(main_tuple)

    # audio_buffer.write(data)
    audio_data = base64.b64decode(audio_data)
    # audio_data = audio_data.encode()
    audio_buffer.write(audio_data)

    # Rewind the buffer so pydub reads it from the start
    audio_buffer.seek(0)

    frame_rate = main_tuple['Item1']['Format']['SamplesPerSec'] / main_tuple['Item1']['Format']['BlockAlign']
    channels = main_tuple['Item1']['Format']['Channels']
    bytes_per_sample = main_tuple['Item1']['Format']['BitsPerSample'] // 8
    sample_width = main_tuple['Item1']['Format']['BlockAlign'] * bytes_per_sample

    # Build an AudioSegment from the raw bytes in memory
    audio_segment = AudioSegment.from_file(audio_buffer, format="raw", sample_width=sample_width,
                                           channels=channels, frame_rate=frame_rate)

    ### I need to use whisper somewhere here or wherever possible (without saving it to drive) ###

    # return audio_segment
    # audio_filename = f"{str(sequence_id['audio_id'])}.mp3"
    # audio_path = os.path.join("bot_audio", audio_filename)
    # audio_segment.export(audio_path, format="mp3")
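
A note on the format fields in the function above: the thread does not say where the payload comes from, but if the 'Format' dictionary follows the usual Windows WAVEFORMATEX conventions (an assumption), then SamplesPerSec is already the frame rate, BitsPerSample // 8 is the per-channel sample width, and BlockAlign is simply Channels * sample width. Under that assumption, the raw-PCM arguments for pydub could be derived as in this minimal sketch. The helper name pcm_params is hypothetical.

```python
def pcm_params(fmt):
    """Derive pydub's raw-PCM keyword arguments from a WAVEFORMATEX-style dict.

    Assumes fmt uses WAVEFORMATEX semantics (not confirmed in the thread).
    """
    channels = fmt["Channels"]
    sample_width = fmt["BitsPerSample"] // 8  # bytes per sample, per channel
    frame_rate = fmt["SamplesPerSec"]         # frames per second
    # Sanity check: for PCM, BlockAlign should equal channels * sample_width.
    block_align = fmt.get("BlockAlign")
    if block_align is not None and block_align != channels * sample_width:
        raise ValueError("BlockAlign does not match Channels * BitsPerSample / 8")
    return {
        "sample_width": sample_width,
        "channels": channels,
        "frame_rate": frame_rate,
    }
```

The result could then be passed along as AudioSegment.from_file(audio_buffer, format="raw", **pcm_params(main_tuple['Item1']['Format'])).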

glangford · 2023-02-20T17:50:39Z

This thread may be helpful: Using ndarray as input to transcribe method #380

maltoze · 2023-02-21T01:40:11Z

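import numpy as np
import whisper
from pydub import AudioSegment

model = whisper.load_model("base")

audio = AudioSegment.from_file("audio.wav")

# Interpret the raw bytes as 16-bit PCM and scale to float32 in [-1.0, 1.0).
# This only matches what Whisper expects if the audio is 16-bit, mono, 16 kHz.
samples = np.frombuffer(audio.raw_data, np.int16).astype(np.float32) / 32768.0
result = model.transcribe(samples)

print(result["text"])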

metal3d Jun 26, 2024

It doesn't work for me. If I save the segment and pass the file to model.transcribe, it works, but using the buffer gives a weird transcription ("......ooooooooooo" and so on).
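
One possible explanation for output like this (an assumption; the thread does not confirm it) is a format mismatch. When model.transcribe receives an array instead of a file path, it treats the samples as 16 kHz mono float32 audio, so stereo data, a different sample rate, or a sample width other than 16 bits would be misread. Loading from a file avoids this because Whisper converts the audio itself. A minimal sketch that normalizes the segment first with pydub's set_frame_rate, set_channels, and set_sample_width follows. The helper name segment_to_whisper_array is hypothetical.

```python
import numpy as np

WHISPER_SAMPLE_RATE = 16000  # Whisper models expect 16 kHz mono input


def segment_to_whisper_array(segment):
    """Convert a pydub AudioSegment to a float32 array in [-1.0, 1.0)."""
    # Normalize the format first: 16 kHz, mono, 16-bit samples.
    segment = (
        segment.set_frame_rate(WHISPER_SAMPLE_RATE)
        .set_channels(1)
        .set_sample_width(2)
    )
    # raw_data now holds little-endian int16 samples for a single channel.
    samples = np.frombuffer(segment.raw_data, dtype=np.int16)
    return samples.astype(np.float32) / 32768.0
```

With a loaded model and an AudioSegment named audio, this would be called as model.transcribe(segment_to_whisper_array(audio)).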
