The program converts your input with ffmpeg (effectively ffmpeg -i <recording> -ar 16000 -ac 1 -c:a pcm_s16le <output>.wav) and pre-processes it before doing any speech recognition, so you can give it your video files directly. The exception is when that command wouldn't do what you want, for example if the file has multiple audio languages and you don't want the default track.
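If you do need to convert by hand (say, to pick a non-default audio track), here is a minimal sketch of running that same command from Python. The helper names build_ffmpeg_command and convert_to_wav and the audio_stream parameter are made up for illustration and are not part of the program. The -map 0:a:N option is standard ffmpeg stream selection, and ffmpeg is assumed to be on your PATH.

```python
import subprocess


def build_ffmpeg_command(recording, output_wav, audio_stream=None):
    """Build the ffmpeg arguments for a 16 kHz, mono, 16-bit PCM WAV file.

    audio_stream is a hypothetical convenience parameter: the 0-based index
    of the audio stream to keep. None leaves the choice to ffmpeg's default.
    """
    cmd = ["ffmpeg", "-i", recording]
    if audio_stream is not None:
        # Select the N-th audio stream of the first input instead of the default.
        cmd += ["-map", f"0:a:{audio_stream}"]
    # Same conversion flags as the command quoted above.
    cmd += ["-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", output_wav]
    return cmd


def convert_to_wav(recording, output_wav, audio_stream=None):
    """Run ffmpeg and raise CalledProcessError if the conversion fails."""
    subprocess.run(
        build_ffmpeg_command(recording, output_wav, audio_stream),
        check=True,
    )
```

For example, convert_to_wav("talk.mkv", "talk.wav", audio_stream=1) would keep the second audio track. You can check which tracks a file has with ffprobe before choosing an index.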

It is just too slow. I have [...] cpu

There's your problem: this is a colossal language model. Even on an RTX 3090 (a high-end consumer GPU), the medium model runs only 3x-4x faster than playback time. On my CPU (AMD Ryzen 7 3700X), the small model did not finish transcribing a 1-minute sample in over 10 minutes.
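To put those figures in perspective, here is a rough back-of-the-envelope sketch. It assumes transcription time scales linearly with audio length, which is a simplification, and the function name transcription_minutes is made up for illustration.

```python
def transcription_minutes(audio_minutes, speedup):
    """Estimated processing time, where speedup is relative to playback time."""
    return audio_minutes / speedup


# Medium model on the RTX 3090: 3x-4x faster than playback.
for speedup in (3, 4):
    minutes = transcription_minutes(60, speedup)
    print(f"1 hour of audio at {speedup}x: about {minutes:.0f} minutes")

# Small model on the CPU: 1 minute of audio was still unfinished after
# 10 minutes, so the speedup is below 1/10 of playback speed.
cpu_speedup_ceiling = 1 / 10
print(f"1 hour of audio on that CPU: more than "
      f"{transcription_minutes(60, cpu_speedup_ceiling):.0f} minutes")
```

The CPU number is only a lower bound on the time, since the 1-minute run never finished.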

Answer selected by jongwook