Replies: 4 comments 6 replies
-
I don't know anything about this, but something like:
import whisper
model = whisper.load_model("base")
audio = whisper.load_audio("audio.mp3")
audio = whisper.pad_or_trim(audio)  # whisper.decode() expects exactly 30 seconds of audio
mel = whisper.log_mel_spectrogram(audio).to(model.device)
options = whisper.DecodingOptions(prefix="<previous chunk transcription>")
result = whisper.decode(model, mel, options)
print(result.text)
The problem is how to feed the previous chunk's transcription into prefix. A loop, maybe?
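A minimal sketch of such a loop, under some assumptions: the chunks do not overlap, so the previous text is passed as `prompt` (context that precedes the current audio) rather than `prefix` (text the current audio's transcript is forced to start with). The helper names `transcribe_chunks` and `make_whisper_decoder` are made up for this example, and `fp16=False` is only there so the sketch also runs on CPU.

```python
from typing import Callable, Iterable, List


def transcribe_chunks(
    chunks: Iterable,
    decode_chunk: Callable[[object, str], str],
) -> List[str]:
    """Decode chunks in order, handing each call the previous chunk's text."""
    texts: List[str] = []
    previous = ""  # nothing to condition on for the first chunk
    for chunk in chunks:
        text = decode_chunk(chunk, previous).strip()
        texts.append(text)
        previous = text  # carried into the next iteration
    return texts


def make_whisper_decoder(model_name: str = "base"):
    """Hypothetical helper: build a decode_chunk callable backed by openai-whisper."""
    import whisper  # imported lazily; requires the openai-whisper package

    model = whisper.load_model(model_name)

    def decode_chunk(audio, previous: str) -> str:
        audio = whisper.pad_or_trim(audio)  # pad or cut to the 30-second window
        mel = whisper.log_mel_spectrogram(audio).to(model.device)
        # Assumption: previous text goes in as prompt, not prefix (see lead-in).
        options = whisper.DecodingOptions(prompt=previous or None, fp16=False)
        return whisper.decode(model, mel, options).text

    return decode_chunk
```

With real audio this would be called roughly as `transcribe_chunks(list_of_audio_arrays, make_whisper_decoder())`, where each array is 16 kHz mono audio such as `whisper.load_audio` returns.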
-
If I only process the full 30-second sound bite, then it translates the entire thing just fine.
-
The Whisper decoding window is 30 seconds by default.
1. Try with 2 or 3 minutes of audio and see the result.
2. Why is audio = whisper.load_audio("asoundbit{}.wav".format(i)) inside the loop?
On Sat, 29.07.2023 at 7:43, Irmuun ***@***.***> wrote:
… Even setting the language it still has the same issue.
-
It seems we don't need the low-level API:
import whisper
model = whisper.load_model("base")
result = model.transcribe("audio.mp3", task="transcribe", prefix="<previous chunk transcription>")
print(result["text"])
-
I want to do real-time audio transcription/translation and have tested doing so with 5-10 second chunks. However, the quality is much worse in chunks than when I record the full audio and then run it through Whisper. Even passing the previously transcribed text as the prompt was still noticeably worse.
It seems like the prefix option would help with this, but either it is not working properly or I am using it incorrectly.
I transcribe the first 5-second chunk, then use that transcription as the prefix for the next input, which is the new 5-second chunk appended to the previous one for a 10-second input, and so on until it hits the 30-second limit. However, doing that only results in empty output or repetitions of the initial 5-second transcription with nothing new.
Are there any examples of how to use the prefix option properly?
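For reference, the growing-window scheme described above can be sketched as follows. This is only an illustration under stated assumptions, not a confirmed fix: `stream_with_prefix` and `decode_window` are hypothetical names, audio is held as plain lists of samples, and `decode_window(window, prefix)` is assumed to return only the text that follows the prefix (which appears to be how `whisper.decode` treats `DecodingOptions(prefix=...)`, but that is worth verifying on real audio).

```python
from typing import Callable, Iterable, List, Sequence

SAMPLE_RATE = 16_000            # sample rate Whisper expects
MAX_WINDOW = 30 * SAMPLE_RATE   # samples in one 30-second decoding window


def stream_with_prefix(
    chunks: Iterable[Sequence[float]],
    decode_window: Callable[[List[float], str], str],
    max_window: int = MAX_WINDOW,
) -> str:
    """Grow an audio window chunk by chunk, reusing decoded text as the prefix."""
    finished: List[str] = []  # text of windows that have been closed
    window: List[float] = []  # audio accumulated for the current window
    prefix = ""               # text decoded so far for the current window
    for chunk in chunks:
        # Start a fresh window once the next chunk would pass the limit.
        if window and len(window) + len(chunk) > max_window:
            finished.append(prefix)
            window, prefix = [], ""
        window.extend(chunk)
        # Assumption: the decoder returns only the continuation after prefix.
        continuation = decode_window(window, prefix).strip()
        prefix = f"{prefix} {continuation}".strip()
    finished.append(prefix)
    return " ".join(text for text in finished if text)
```

A Whisper-backed `decode_window` would convert the window to a float32 array, apply `whisper.pad_or_trim` and `whisper.log_mel_spectrogram`, and call `whisper.decode` with `whisper.DecodingOptions(prefix=prefix or None)`. If the output still comes back empty or repeated, appending the returned text blindly will compound the problem, so the concatenation step is the first place to check.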