OpenAI Whisper is pretty good, but there are always a few transcription errors.
I'm thinking of doing the following (a rough sketch follows the list):

1. On the same audio file, run OpenAI Whisper X times.
2. Send all X transcripts to an LLM and tell it to use them to produce a refined version.
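One wrinkle worth noting: with greedy decoding (temperature 0) Whisper is deterministic, so running it X times on the same file would just give X identical transcripts. Varying the sampling temperature per run is one way to actually get different versions. A minimal sketch, assuming the openai-whisper package, a local `audio.mp3`, and the `medium` model size (all placeholders):

```python
import whisper

# Load the model once; "medium" is an arbitrary choice here.
model = whisper.load_model("medium")

# At temperature 0 decoding is greedy and deterministic, so each run
# samples at a different temperature to produce a different transcript.
temperatures = [0.0, 0.2, 0.4, 0.6]  # X = 4 runs
results = [
    model.transcribe("audio.mp3", temperature=t)
    for t in temperatures
]
transcripts = [r["text"] for r in results]
```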
Does the idea make sense?
Problems that I can see:
1. Does the LLM even know how to produce the "best" version when the X transcripts disagree?
2. LLM output length is usually limited, so it will be tedious to get the full transcription back in one call; chunking the transcripts (sketched below) is one way around this.
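For the length problem, refining the transcript in chunks rather than in one shot should help. Since each Whisper result carries timestamped segments, all X runs can be cut at the same fixed time boundaries so that chunk i of every run covers roughly the same audio. A sketch, where the 120-second window size is an arbitrary assumption:

```python
def chunk_by_time(result, window_s=120.0):
    """Group a Whisper result's segments into fixed time windows so each
    LLM call stays well under the output limit. window_s is arbitrary."""
    chunks = {}
    for seg in result["segments"]:
        key = int(seg["start"] // window_s)
        chunks.setdefault(key, []).append(seg["text"])
    return [" ".join(texts).strip() for _, texts in sorted(chunks.items())]

# results: the X dicts returned by model.transcribe() above.
chunked_runs = [chunk_by_time(r) for r in results]
# zip(*chunked_runs) then yields, per window, the X variants to refine together.
```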
The LLM will probably be Google Gemini, because it's free even when calling via the API.
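A minimal Gemini call for merging the X variants of one chunk might look like this, assuming the google-generativeai Python package; the model name and prompt wording are my own guesses, not a tested recipe:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # free-tier key from Google AI Studio
model = genai.GenerativeModel("gemini-1.5-flash")  # model choice is an assumption

def refine(variants):
    """Ask Gemini to merge X transcripts of the same audio chunk into one."""
    prompt = (
        "Below are several automatic transcriptions of the same audio. "
        "They agree where the transcriber was confident and differ where it "
        "erred. Output only the single most likely correct transcript.\n\n"
        + "\n\n---\n\n".join(variants)
    )
    return model.generate_content(prompt).text

refined_chunks = [refine(variants) for variants in zip(*chunked_runs)]
final_transcript = "\n".join(refined_chunks)
```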
X will probably be 4. I'm probably going to run on the Kaggle free tier with 2 GPUs, so I should be able to run 2 transcriptions in parallel at a time.
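For the two-GPU parallelism, one sketch is a spawn-based process pool that pins each run to a device. Reloading the model per task is wasteful but keeps the sketch simple; the device names and per-run temperatures carry over the same assumptions as above:

```python
import multiprocessing as mp
import whisper

def transcribe_job(args):
    temperature, device = args
    # Each worker loads its own copy of the model on its assigned GPU.
    model = whisper.load_model("medium", device=device)
    return model.transcribe("audio.mp3", temperature=temperature)

if __name__ == "__main__":
    # Kaggle's free tier exposes 2 GPUs; alternate the 4 runs between them.
    jobs = [(0.0, "cuda:0"), (0.2, "cuda:1"), (0.4, "cuda:0"), (0.6, "cuda:1")]
    # "spawn" is required because CUDA cannot be used in forked worker processes.
    with mp.get_context("spawn").Pool(processes=2) as pool:
        results = pool.map(transcribe_job, jobs)
```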