Replies: 2 comments 2 replies
-
That's normal/default behavior. If you want multiple GPUs you have to modify the source code, but since the checkpoint fits on one GPU, there's no advantage to using multiple. If you just want to keep both GPUs busy, the simplest workaround is to run two whisper processes, each pinned to one GPU (sketch below).
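A minimal sketch of that workaround, assuming the standard `whisper` CLI installed via pip; the file names and the manual split across the two lists are placeholders:

```python
import os
import subprocess

# Hypothetical file lists; split your actual queue however you like.
files_gpu0 = ["a.mp3", "b.mp3"]
files_gpu1 = ["c.mp3", "d.mp3"]

procs = []
for gpu_id, files in [(0, files_gpu0), (1, files_gpu1)]:
    # Each child process only sees its assigned GPU.
    env = {**os.environ, "CUDA_VISIBLE_DEVICES": str(gpu_id)}
    procs.append(subprocess.Popen(
        ["whisper", *files, "--model", "medium", "--language", "xx"],
        env=env,
    ))

for p in procs:
    p.wait()
```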
-
Why does this happen?

- Model size fits one GPU: Whisper models (like medium) typically fit comfortably on one GPU, so the code doesn't split the workload across multiple GPUs automatically.
- No multi-GPU code: The official Whisper repo and CLI don't implement multi-GPU parallelism (such as data parallelism or model parallelism) out of the box.

How to utilize multiple GPUs?

- Modify the source code: Implement multi-GPU inference by modifying the PyTorch code to distribute batches across GPUs (DataParallel or DistributedDataParallel). This is non-trivial because Whisper inference usually runs on one input at a time, limiting batch-based parallelism.
- Run multiple processes: Run multiple independent Whisper inference processes, each bound to a different GPU (CUDA_VISIBLE_DEVICES=0 and CUDA_VISIBLE_DEVICES=1), then load-balance your audio files manually or via a script (see the sketch after this list).
- Use batch processing: If you batch many audio files together, you can split the batches across GPUs manually.

Summary

Multi-GPU usage requires custom code or running multiple parallel processes. Since the model fits on a single GPU, multi-GPU inference might not give much speed-up unless you're processing many files concurrently.
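As a concrete illustration of the "multiple processes" option, here is a sketch using the Python API rather than the CLI. `whisper.load_model` does accept a `device` argument; the worker function and the manual shard split are assumptions for illustration, not part of the official repo:

```python
import multiprocessing as mp

import whisper


def transcribe_shard(gpu_id, files):
    # One model copy per GPU; device selection is supported by load_model.
    model = whisper.load_model("medium", device=f"cuda:{gpu_id}")
    for path in files:
        result = model.transcribe(path, language="xx")
        print(f"[GPU {gpu_id}] {path}: {result['text'][:80]}")


if __name__ == "__main__":
    mp.set_start_method("spawn")  # safer start method when CUDA is involved
    # Hypothetical manual split of the workload across two GPUs.
    shards = {0: ["a.mp3", "b.mp3"], 1: ["c.mp3", "d.mp3"]}
    workers = [mp.Process(target=transcribe_shard, args=(g, f)) for g, f in shards.items()]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```

This only pays off when you have many files to process; for a single file, the second GPU stays idle either way.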
-
When I run whisper on the command line in Linux, it only uses my first GPU to the max and doesn't touch the second GPU at all. nvtop shows both GPUs but only one working (95% utilization with 5.6 GB of VRAM used). Both are 3060s with 12 GB of VRAM and I am running the medium model. I installed whisper with pip. The command line is `whisper "file" --model medium --language xx`.