Are there any examples on how to use the Prefix? #1329

rokaHakor · 2023-05-08T18:00:01Z

I want to do real-time audio transcription/translation and have tested doing so with 5-10 second chunks. However, the quality is much worse when processing chunks than when recording the full audio and then running it through Whisper. Even passing the previously transcribed text as the prompt was still noticeably worse.

It seems like using the prefix would help with this, but either it does not work properly or I am using it incorrectly.
I transcribe the first 5-second chunk, then use that transcription as the prefix for the next input while appending the next 5-second chunk to the previous audio, giving a 10-second input, and so on until it hits the 30-second limit. However, doing that only results in empty output or repetitions of the initial 5-second transcription with nothing new.

Are there any examples of how to use the prefix option properly?
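A rough way to see how the prompt that was tried differs from the prefix: in Whisper's decoder, `prompt` is placed in front of the start-of-transcript tokens behind `<|startofprev|>`, as context only, while `prefix` is placed after them, so the model treats it as the beginning of the current window's output and continues from it. The sketch below only mimics that token order with plain strings; `initial_tokens` and its whitespace "tokenization" are hypothetical and are not Whisper's tokenizer or code.

```python
def initial_tokens(prompt=None, prefix=None, language="en", task="transcribe",
                   without_timestamps=False):
    """Illustrative sketch of how Whisper orders the decoder's starting tokens.

    Plain strings stand in for special tokens and str.split() stands in for
    the real tokenizer, so this is NOT Whisper's actual implementation.
    """
    # Start-of-transcript sequence for a multilingual model.
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if without_timestamps:
        tokens.append("<|notimestamps|>")
    if prefix:
        # prefix goes AFTER the start-of-transcript sequence: the model takes
        # it as the beginning of this window's output and continues from it.
        tokens += prefix.split()
    if prompt:
        # prompt goes BEFORE everything, behind <|startofprev|>: it is only
        # conditioning context ("previous text"), not part of the output.
        tokens = ["<|startofprev|>"] + prompt.split() + tokens
    return tokens
```

Generation starts after the last of these tokens, and `result.text` only covers what is generated after that point, so the prefix itself is not repeated in the output.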

phineas-pta · 2023-05-09T09:35:13Z

I don't know anything about prefix, but based on the README I think it would be something like:

import whisper

model = whisper.load_model("base")
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)  # decode() works on a single 30-second window
mel = whisper.log_mel_spectrogram(audio).to(model.device)
options = whisper.DecodingOptions(prefix="<previous chunk transcription>")
result = whisper.decode(model, mel, options)
print(result.text)  # only the text generated after the prefix

The problem is how to feed the previous chunk's transcription into prefix. A loop, maybe?
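One way such a loop could keep track of the prefix, as a sketch: because `whisper.decode()` returns in `result.text` only what the model generated after the prefix, the running transcript has to be built by appending each continuation rather than replacing it. The decoder is passed in as a function so the bookkeeping can be tried without a model; `transcribe_growing_windows` and `decode_continuation` are hypothetical names.

```python
from typing import Callable, Iterable, List, TypeVar

Window = TypeVar("Window")


def transcribe_growing_windows(
    windows: Iterable[Window],
    decode_continuation: Callable[[Window, str], str],
) -> List[str]:
    """Decode windows that all start at the same point and keep growing.

    Everything transcribed so far is passed as the prefix for the next window.
    Returns the running transcript after each window.
    """
    transcript = ""
    history: List[str] = []
    for window in windows:
        new_text = decode_continuation(window, transcript)
        if new_text:
            # Assumes the decoder returns only the text generated after the
            # prefix (as whisper.decode's result.text does), so append it.
            transcript = f"{transcript} {new_text}".strip()
        history.append(transcript)
    return history
```

With Whisper, `decode_continuation` could be something like `lambda mel, prefix: whisper.decode(model, mel, whisper.DecodingOptions(prefix=prefix)).text`, where every `mel` comes from a padded window starting at the same point in the audio, since the prefix is meant to be the transcript of the start of that same window.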

5 replies

rokaHakor Jul 28, 2023
Author

I'm trying this in a loop, and for whatever reason it only decodes properly once. I've divided a 30-second sound bite into cumulative clips: the first is the first 5 seconds, the second is the first 10 seconds, the third is the first 15 seconds, and so on. However, it only ends up translating once or twice.

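from datetime import datetime

import whisper


def main(model_name="medium"):
    print("Loading model...")
    model = whisper.load_model(model_name, device="cuda")

    text = ""  # running transcript of all audio decoded so far

    # asoundbit1.wav ... asoundbit6.wav hold the first 5, 10, ..., 30 seconds
    for i in range(1, 7):
        print(i)
        audio = whisper.load_audio("asoundbit{}.wav".format(i))
        audio = whisper.pad_or_trim(audio)  # pad to the 30-second window
        mel = whisper.log_mel_spectrogram(audio, device=model.device)

        options = whisper.DecodingOptions(task="translate", prefix=text)
        result = whisper.decode(model, mel, options)

        # result.text holds only the tokens generated after the prefix, so
        # append it to the running transcript instead of overwriting it
        decoded_text = result.text
        if decoded_text:
            text = f"{text} {decoded_text}".strip()
            print(f'{datetime.now().strftime("%H:%M:%S")} {decoded_text}')


if __name__ == "__main__":
    main()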

phineas-pta Jul 28, 2023

Maybe try specifying the language?

rokaHakor Jul 29, 2023
Author

Even setting the language it still has the same issue.

phineas-pta Jul 29, 2023

Maybe the problem is with translate? Did you try transcribe only?

rokaHakor Jul 30, 2023
Author

It had the same issue with transcribe.

rokaHakor · 2023-07-28T14:27:09Z
Author

If I process only the full 30-second sound bite, it translates the entire thing just fine.

0 replies

casic · 2023-07-29T08:06:01Z

1. The Whisper decoding window is 30 seconds by default. Try with 2 or 3 minutes and see the result.
2. Why is audio = whisper.load_audio("asoundbit{}.wav".format(i)) inside the loop???

On Sat, 29.07.2023 at 7:43, Irmuun ***@***.***> wrote:

Even setting the language it still has the same issue.
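On the second point, the cumulative clips do not strictly need separate files. A minimal sketch, assuming the full recording is loaded once (for example with `whisper.load_audio`, which returns 16 kHz mono samples) and then sliced in memory; `growing_windows` is a hypothetical helper, and each slice would still go through `whisper.pad_or_trim` before `whisper.log_mel_spectrogram`.

```python
SAMPLE_RATE = 16000  # Whisper resamples audio to 16 kHz when loading it


def growing_windows(audio, step_s=5, max_s=30, sample_rate=SAMPLE_RATE):
    """Yield audio[:5 s], audio[:10 s], ... (capped at max_s) from one loaded array."""
    step = step_s * sample_rate
    limit = min(len(audio), max_s * sample_rate)
    for end in range(step, limit + step, step):
        # The last slice may be shorter than a full step if the audio ends early.
        yield audio[: min(end, limit)]
```

In the loop above, this would replace the six `load_audio` calls with something like `for window in growing_windows(whisper.load_audio("asoundbit6.wav")):` followed by `mel = whisper.log_mel_spectrogram(whisper.pad_or_trim(window), device=model.device)`.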

1 reply

rokaHakor Jul 30, 2023
Author

I'm trying to use the prefix to translate audio in real time while outputting the transcription/translation in ~5-second chunks.
It should be possible based on this:
#117
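The thread stops at the 30-second window limit. For a live stream, one possible policy (an assumption, not something taken from Whisper or #117) is to grow the window in 5-second steps until it would exceed 30 seconds, then treat that window as finished and start a new one where it ended. Only the current window's own transcript matches the start of its audio, so only that part belongs in `prefix`; text from finished windows could be passed as `prompt` instead. `window_schedule` is a hypothetical helper that computes the (start, end) times of each decode call under this policy, assuming `max_s` is a multiple of `step_s`.

```python
def window_schedule(total_s, step_s=5, max_s=30):
    """Return the (start_s, end_s) span decoded at each step of a growing stream.

    Hypothetical policy: the window grows by step_s from a fixed start until it
    would exceed max_s, then a new window starts where the previous one ended.
    Assumes max_s is a multiple of step_s.
    """
    schedule = []
    start = 0
    for end in range(step_s, total_s + 1, step_s):
        if end - start > max_s:
            start += max_s  # previous window is complete; begin a new one
        schedule.append((start, end))
    return schedule
```

Whenever `start` moves, the prefix for the new window would reset to an empty string, and the finished window's text could go into `whisper.DecodingOptions(prompt=...)`.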

phineas-pta · 2023-07-30T07:27:51Z

It seems we don't need the low-level DecodingOptions:

import whisper

model = whisper.load_model("base")
# transcribe() forwards extra keyword arguments such as prefix to DecodingOptions
result = model.transcribe("audio.mp3", task="transcribe", prefix="<previous chunk transcription>")
print(result["text"])

0 replies

Replies: 4 comments · 6 replies
