OpenAI Whisper is pretty good, but there are always a few transcription errors.
I'm thinking of doing the following (a rough sketch follows the list):

1. On the same audio file, run OpenAI Whisper X times.
2. Send all X transcripts to an LLM and tell it to use them to produce a refined version.
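One wrinkle worth noting: with greedy decoding (temperature 0) Whisper is deterministic, so running it X times on the same file would just give X identical transcripts. Varying the sampling temperature per run is one way to actually get different versions. A minimal sketch, assuming the openai-whisper package, a local `audio.mp3`, and the `medium` model size (all placeholders):

```python
import whisper

# Load the model once; "medium" is an arbitrary choice here.
model = whisper.load_model("medium")

# At temperature 0 decoding is greedy and deterministic, so each run
# samples at a different temperature to produce a different transcript.
temperatures = [0.0, 0.2, 0.4, 0.6]  # X = 4 runs
results = [
    model.transcribe("audio.mp3", temperature=t)
    for t in temperatures
]
transcripts = [r["text"] for r in results]
```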
Does the idea make sense?
Problems that I can see:
1. Does the LLM even know how to produce the "best" version when the X transcripts disagree?
2. LLM output length is usually limited, so it will be tedious to get the full transcription back in one call; chunking the transcripts (sketched below) is one way around this.
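For the length problem, refining the transcript in chunks rather than in one shot should help. Since each Whisper result carries timestamped segments, all X runs can be cut at the same fixed time boundaries so that chunk i of every run covers roughly the same audio. A sketch, where the 120-second window size is an arbitrary assumption:

```python
def chunk_by_time(result, window_s=120.0):
    """Group a Whisper result's segments into fixed time windows so each
    LLM call stays well under the output limit. window_s is arbitrary."""
    chunks = {}
    for seg in result["segments"]:
        key = int(seg["start"] // window_s)
        chunks.setdefault(key, []).append(seg["text"])
    return [" ".join(texts).strip() for _, texts in sorted(chunks.items())]

# results: the X dicts returned by model.transcribe() above.
chunked_runs = [chunk_by_time(r) for r in results]
# zip(*chunked_runs) then yields, per window, the X variants to refine together.
```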
The LLM will probably be Google Gemini, because it's free even when calling via the API.
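A minimal Gemini call for merging the X variants of one chunk might look like this, assuming the google-generativeai Python package; the model name and prompt wording are my own guesses, not a tested recipe:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # free-tier key from Google AI Studio
model = genai.GenerativeModel("gemini-1.5-flash")  # model choice is an assumption

def refine(variants):
    """Ask Gemini to merge X transcripts of the same audio chunk into one."""
    prompt = (
        "Below are several automatic transcriptions of the same audio. "
        "They agree where the transcriber was confident and differ where it "
        "erred. Output only the single most likely correct transcript.\n\n"
        + "\n\n---\n\n".join(variants)
    )
    return model.generate_content(prompt).text

refined_chunks = [refine(variants) for variants in zip(*chunked_runs)]
final_transcript = "\n".join(refined_chunks)
```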
X will probably be 4. I'm probably going to run on the Kaggle free tier with 2 GPUs, so I should be able to run 2 transcriptions in parallel at a time.
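For the two-GPU parallelism, one sketch is a spawn-based process pool that pins each run to a device. Reloading the model per task is wasteful but keeps the sketch simple; the device names and per-run temperatures carry over the same assumptions as above:

```python
import multiprocessing as mp
import whisper

def transcribe_job(args):
    temperature, device = args
    # Each worker loads its own copy of the model on its assigned GPU.
    model = whisper.load_model("medium", device=device)
    return model.transcribe("audio.mp3", temperature=temperature)

if __name__ == "__main__":
    # Kaggle's free tier exposes 2 GPUs; alternate the 4 runs between them.
    jobs = [(0.0, "cuda:0"), (0.2, "cuda:1"), (0.4, "cuda:0"), (0.6, "cuda:1")]
    # "spawn" is required because CUDA cannot be used in forked worker processes.
    with mp.get_context("spawn").Pool(processes=2) as pool:
        results = pool.map(transcribe_job, jobs)
```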