Passing Whisper Bytes Instead of Filenames #1507

Codie-Petersen · 2023-07-07T19:52:39Z

I’m developing a speech recognition subsystem that uses Whisper as the transcriber, running in my local environment. I’ve written a Voice Activity Detection algorithm that picks up only voice and extracts clean voice data pretty easily. The only problem is that I am saving the audio to disk and then passing a file path to whisper.transcribe(), which loads it back from disk before transcribing.

I have an NVMe drive and I’m still getting 0.5 second transcription times, but I’d like something faster than round-tripping through disk. I know there is a way to feed the audio directly to the transcribe function, but it doesn’t seem to like the format I have given it. I am getting this error: RuntimeError: "reflection_pad1d" not implemented for 'Short'.

I was wondering what the shape of the data needs to be to pass it along.

For context, here is the code that is saving the data:

import numpy as np
import pydub

def save_audio(audio_data, settings: AudioDataSettings):
    """Saves audio data to a file."""
    # Convert the float audio data back to 16-bit integers
    audio_data_int16 = (audio_data * 32767 * 10).astype(np.int16)
    audio_segment = pydub.AudioSegment(
        data=audio_data_int16.tobytes(),
        frame_rate=settings.bitrate,
        channels=settings.channels,
        sample_width=settings.sample_width,
    )

    # Save the audio segment to a file (because we can't pass it directly to the model)
    # and transcribe it.
    filename = f"{settings.temp_filename}.{settings.format}"
    audio_segment.export(filename, format=settings.format, bitrate=settings.bitrate_code)
    return filename

The original idea was to use audio_data_int16 as the input to transcribe(). But that didn't work, so I just stuck with saving to disk.

Thanks in advance.

phineas-pta · 2023-07-08T18:44:51Z

You should check the load_audio() function in the Whisper source file whisper/audio.py to see what type of input array you need.
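
To make that pointer concrete: load_audio() decodes to mono 16-bit PCM at 16 kHz and returns a 1-D float32 NumPy array scaled by 1/32768, and transcribe() accepts such an array in place of a filename. The 'Short' in the error message is the int16 dtype. Below is a minimal sketch of the conversion, assuming the in-memory audio is already mono at 16 kHz. The helper names (pcm16_to_float32, bytes_to_float32, transcribe_from_memory) are made up for illustration and are not part of Whisper.

```python
import numpy as np

# Whisper expects 16 kHz mono audio (whisper.audio.SAMPLE_RATE).
WHISPER_SAMPLE_RATE = 16000


def pcm16_to_float32(pcm: np.ndarray) -> np.ndarray:
    """Scale int16 PCM samples to float32 in [-1.0, 1.0), as load_audio() does."""
    if pcm.dtype != np.int16:
        raise TypeError(f"expected int16 samples, got {pcm.dtype}")
    return pcm.flatten().astype(np.float32) / 32768.0


def bytes_to_float32(raw: bytes) -> np.ndarray:
    """Interpret raw bytes as native-endian 16-bit PCM and convert them."""
    return pcm16_to_float32(np.frombuffer(raw, dtype=np.int16))


def transcribe_from_memory(model, audio: np.ndarray):
    """Hypothetical wrapper: `model` is assumed to come from whisper.load_model()."""
    # Assumption: `audio` is 1-D float32, mono, sampled at WHISPER_SAMPLE_RATE.
    return model.transcribe(audio)
```

If the VAD output is already a float array in the range [-1, 1], it may be enough to skip the int16 round trip and pass audio_data.astype(np.float32) directly, provided it is mono and sampled at 16 kHz. Audio at a different sample rate would need resampling first.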
