Replies: 2 comments 2 replies
-
That's normal/default behavior. If you want multiple GPUs you have to modify the source code, but since the checkpoint fits on one GPU, there's no advantage to using multiple. If you just want to keep both GPUs busy, the simplest workaround is to run two whisper processes, each pinned to one GPU (sketch below).
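A minimal sketch of that workaround, assuming the standard `whisper` CLI installed via pip; the file names and the manual split across the two lists are placeholders:

```python
import os
import subprocess

# Hypothetical file lists; split your actual queue however you like.
files_gpu0 = ["a.mp3", "b.mp3"]
files_gpu1 = ["c.mp3", "d.mp3"]

procs = []
for gpu_id, files in [(0, files_gpu0), (1, files_gpu1)]:
    # Each child process only sees its assigned GPU.
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu_id)}
    procs.append(subprocess.Popen(
        ["whisper", *files, "--model", "medium", "--language", "xx"],
        env=env,
    ))

for p in procs:
    p.wait()
```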
-
Why does this happen?

- Model size fits one GPU: Whisper models (like medium) typically fit comfortably on one GPU, so the code doesn't split the workload across multiple GPUs automatically.
- No multi-GPU code: The official Whisper repo and CLI don't implement multi-GPU parallelism (such as data parallelism or model parallelism) out of the box.

How to utilize multiple GPUs?

- Modify the source code: Implement multi-GPU inference by modifying the PyTorch code to distribute batches across GPUs (DataParallel or DistributedDataParallel). This is non-trivial because Whisper inference usually runs on one input at a time, limiting batch-based parallelism.
- Run multiple processes: Run multiple independent Whisper inference processes, each bound to a different GPU (CUDA_VISIBLE_DEVICES=0 and CUDA_VISIBLE_DEVICES=1), then load-balance your audio files manually or via a script (see the sketch after this list).
- Use batch processing: If you batch many audio files together, you can split the batches across GPUs manually.

Summary

Multi-GPU usage requires custom code or running multiple parallel processes. Since the model fits on a single GPU, multi-GPU inference might not give much speed-up unless you're processing many files concurrently.
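As a concrete illustration of the "multiple processes" option, here is a sketch using the Python API rather than the CLI. `whisper.load_model` does accept a `device` argument; the worker function and the manual shard split are assumptions for illustration, not part of the official repo:

```python
import multiprocessing as mp

import whisper


def transcribe_shard(gpu_id, files):
    # One model copy per GPU; device selection is supported by load_model.
    model = whisper.load_model("medium", device=f"cuda:{gpu_id}")
    for path in files:
        result = model.transcribe(path, language="xx")
        print(f"[GPU {gpu_id}] {path}: {result['text'][:80]}")


if __name__ == "__main__":
    mp.set_start_method("spawn")  # safer start method when CUDA is involved
    # Hypothetical manual split of the workload across two GPUs.
    shards = {0: ["a.mp3", "b.mp3"], 1: ["c.mp3", "d.mp3"]}
    workers = [mp.Process(target=transcribe_shard, args=(g, f)) for g, f in shards.items()]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

This only pays off when you have many files to process; for a single file, the second GPU stays idle either way.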
-
When I run whisper on the command line in Linux, it only uses my first GPU to the max and doesn't touch the second GPU at all. nvtop shows both GPUs but only one working (95% utilization with 5.6 GB of VRAM used). Both are 3060s with 12 GB of VRAM and I am running the medium model. I installed whisper with pip. The command line is `whisper "file" --model medium --language xx`.