Hello everyone, I am trying to build a real-time transcription application based on Whisper. The basic idea is to repeatedly transcribe the last five seconds of speech (the actual implementation also trims and adjusts the window based on the pauses between words, which I won't detail here).
In my current strategy, to keep the transcribed content coherent, I pass all previously transcribed content as the prompt for the current transcription window. However, I found that the longer the application runs, the slower transcription becomes, even though each audio segment is still only about five seconds long. The problem does not occur if I keep only the most recently transcribed n words as the prompt (but this sometimes hurts transcription accuracy).
So I would like to know: does the length of the prompt affect transcription speed? If so, how?
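For reference, the "keep only the most recent n words" variant can be sketched roughly as below. This is a minimal illustration, not my actual code: the class name `RollingPrompt` and the parameter `max_words` are hypothetical, and the Whisper call itself is left out.

```python
from collections import deque


class RollingPrompt:
    """Keeps only the most recent `max_words` transcribed words (hypothetical helper)."""

    def __init__(self, max_words: int = 50) -> None:
        # A bounded deque silently drops the oldest words once it is full.
        self._words = deque(maxlen=max_words)

    def extend(self, text: str) -> None:
        # Append the words of the newest transcription window.
        self._words.extend(text.split())

    def text(self) -> str:
        # The bounded prompt to pass along with the next audio window.
        return " ".join(self._words)


prompt = RollingPrompt(max_words=5)
prompt.extend("the quick brown fox jumps")
prompt.extend("over the lazy dog")
print(prompt.text())  # jumps over the lazy dog
```

With the openai-whisper Python package, the string returned by `text()` would presumably be passed as the `initial_prompt` argument of `transcribe()`; other Whisper implementations may name this parameter differently.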