-
That is to be expected: CPU is very slow, particularly on the larger models. Something like an NVIDIA GTX 1660 would be ~10x the speed of a 6/8-core CPU. I've found both the base and small models to be very accurate (for English), with most differences between models being punctuation (the tiny model does tend to have more errors, though still not that many). On your CPU the base model may be close to 1x realtime. On my Ryzen 4500U, the small.en model transcribes at about 0.10x realtime, so if the medium model is 0.05x realtime on your Ryzen 3600, that sounds about right.

You can also try the OpenVINO version: https://github.com/zhuzilin/whisper-openvino. On the base.en model it's about twice as fast for me. Another option, depending on how much you have to transcribe and any data-security concerns, is to run Whisper in a free Google Colab GPU instance, which ran at about 8x realtime for me on the small.en model. There is also another project, https://github.com/ggerganov/whisper.cpp, which is much, much faster on CPU. I assume it is trading speed for accuracy, but I don't understand how/where accuracy is reduced.
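If you want to measure the realtime factor on your own machine, here is a minimal sketch (assuming the openai-whisper Python package; "audio.mp3" is a placeholder for your own file):

```python
import time

import whisper

model = whisper.load_model("small.en")  # or "tiny.en", "base.en", "medium", ...

audio = whisper.load_audio("audio.mp3")            # placeholder file name
duration = len(audio) / whisper.audio.SAMPLE_RATE  # samples at 16 kHz

start = time.perf_counter()
model.transcribe(audio, fp16=False)  # FP32 is the normal mode on CPU
elapsed = time.perf_counter() - start

print(f"{duration:.1f}s of audio in {elapsed:.1f}s "
      f"= {duration / elapsed:.2f}x realtime")
```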
-
An RTX 3060 is 22 times faster than my 16-thread i7-10700F @ 4.5 GHz. Draw your own conclusions :)
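To reproduce that comparison on your own hardware, a rough sketch (assuming openai-whisper, a CUDA-capable PyTorch build, and a local "audio.mp3"):

```python
import time

import torch
import whisper

devices = (["cuda"] if torch.cuda.is_available() else []) + ["cpu"]
for device in devices:
    model = whisper.load_model("base.en", device=device)
    start = time.perf_counter()
    model.transcribe("audio.mp3", fp16=(device == "cuda"))  # FP16 only on GPU
    print(f"{device}: {time.perf_counter() - start:.1f}s")
```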
-
No changes made to Whisper, and we still get great acceleration on CPU: using 11 of 16 CPUs with the "tiny.en" model, a transcription speed of 32.713x realtime. Machine: MacBook, macOS Big Sur, 2.3 GHz Intel Core i9, 16 cores, 16 GB of RAM.
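One way to cap Whisper at a subset of your CPUs is PyTorch's thread controls; a minimal sketch (this is my assumption about how core usage was limited above, not a confirmed detail):

```python
import torch
import whisper

torch.set_num_threads(11)  # cap PyTorch's intra-op thread pool at 11 of 16 CPUs

model = whisper.load_model("tiny.en")
result = model.transcribe("audio.mp3", fp16=False)  # placeholder file name
print(result["text"])
```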
-
I tried Whisper on a Jetson TX2 developer kit (Ubuntu 18.04, NVIDIA JetPack 4.6.3, Python 3.8, PyTorch 2.0, CUDA 10.2) with a 2-second mp3, and it took 12 seconds to transcribe with the --language en --model tiny parameters. I was not able to use CUDA due to compatibility issues with PyTorch 2.0, and installing the PyTorch 1.10 NVIDIA wheel for Python 3.8 was also unsuccessful. Does anyone know how to utilise CUDA on this device?
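In case it helps anyone debugging the same setup, a minimal check (not Jetson-specific) of whether the installed PyTorch build can actually see CUDA before Whisper falls back to CPU:

```python
import torch
import whisper

cuda_ok = torch.cuda.is_available()
print(f"PyTorch {torch.__version__}, CUDA available: {cuda_ok}")

device = "cuda" if cuda_ok else "cpu"
model = whisper.load_model("tiny", device=device)
result = model.transcribe("audio.mp3", language="en", fp16=cuda_ok)
print(result["text"])
```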
-
Hello,
I know that transcribing on GPU will be much faster than on CPU, but even so, I have a server with a Ryzen 5 3600 and the transcription times are abysmally long.
I am transcribing Polish with the medium model, and a file of 31 seconds takes 11 minutes and 16 seconds, with the CPU maxed out on 6 threads.
Also, every time I run it I get a warning that FP16 is not supported on CPU and FP32 is used instead; could that be the cause of the slowness?
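For reference, this is roughly what I am running (a minimal sketch of the Python API; the file name is a placeholder, and passing fp16=False just makes the FP32 fallback explicit and silences the warning):

```python
import whisper

model = whisper.load_model("medium")
# fp16=False is equivalent to the CLI's --fp16 False and suppresses the warning
result = model.transcribe("audio.mp3", language="pl", fp16=False)
print(result["text"])
```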