[Multi-GPUs] Issue with Significant Discrepancy in Task Completion Time Between GPUs #2238
2 comments · 3 replies
-
Maybe the Python GIL is the cause 🤔 Try Linux to see if the problem persists.
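If the GIL really is the bottleneck, one quick way to test it without switching OS is to give each GPU its own process instead of its own thread. A minimal sketch, assuming the openai-whisper Python API (`whisper.load_model` / `model.transcribe`) and placeholder audio file names:

```python
# Minimal sketch: run one transcription per GPU in separate OS processes,
# which bypasses the GIL entirely. Model size and audio paths are placeholders.
import multiprocessing as mp

import whisper


def transcribe_on_gpu(gpu_index: int, audio_path: str) -> None:
    # Each process loads its own copy of the model on its own device.
    model = whisper.load_model("medium", device=f"cuda:{gpu_index}")
    result = model.transcribe(audio_path)
    print(f"GPU {gpu_index}: {len(result['text'])} characters transcribed")


if __name__ == "__main__":
    # "spawn" avoids CUDA initialization problems that fork can cause.
    mp.set_start_method("spawn")
    jobs = [
        mp.Process(target=transcribe_on_gpu, args=(0, "audio_0.mp3")),
        mp.Process(target=transcribe_on_gpu, args=(1, "audio_1.mp3")),
    ]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
```

If per-GPU speed evens out when every GPU has its own process, the slowdown is host-side (Python threading/GIL) rather than a hardware limit.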
-
Yes, I've run into this problem as well. I even set the PCIe lanes to 1x for all slots and got the same issue: the first GPU runs at full speed, then there's massive degradation in the 2nd and 3rd GPUs' performance. Just to really confirm this, I put all GPUs on 1x PCIe risers, and it made no difference: same behavior. It seems like there is some power-limit throttling going on, where the subsequent GPUs can't ramp up and just limp along. I also suspect this has something to do with how Whisper is interfacing with the hardware and/or PyTorch. I can run multiple Stable Diffusion instances on multiple GPUs on the same machine with no issue and it behaves normally; only Whisper does this. This is very frustrating, I hope the team can take a look at it :)
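One way to check the power-throttling suspicion would be to log clocks, power draw, and utilization per GPU while the jobs run: if the slow GPUs show high SM clocks but low utilization, the bottleneck is the host process rather than a power limit. A rough diagnostic sketch polling nvidia-smi's query interface (the polling interval and field list are just examples):

```python
# Poll per-GPU power draw, SM clock, utilization, and performance state
# once per second while the Whisper jobs run.
import subprocess
import time

QUERY = "index,power.draw,clocks.sm,utilization.gpu,pstate"

while True:
    out = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    print(out.stdout.strip())
    print("-" * 40)
    time.sleep(1)
```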
-
**Additional Context:**
Could you please assist in identifying and resolving this performance discrepancy so that both GPUs work at their optimal efficiency?
Additionally, I ran whisper.cpp, and both GPUs performed excellently, running at full efficiency. I therefore believe the issue lies in Whisper's model optimization, which results in one GPU performing well while the other does not.
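For an apples-to-apples comparison with whisper.cpp (which is typically run as one process per GPU), a hedged sketch that launches one openai-whisper CLI process per GPU, each pinned with CUDA_VISIBLE_DEVICES; file names and model size are placeholders:

```python
# Launch one whisper CLI process per GPU, isolated at the driver level via
# CUDA_VISIBLE_DEVICES, so each process sees only its assigned device.
import os
import subprocess

AUDIO = {0: "audio_0.mp3", 1: "audio_1.mp3"}

procs = []
for gpu_index, audio_path in AUDIO.items():
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_index))
    procs.append(
        subprocess.Popen(
            ["whisper", audio_path, "--model", "medium", "--device", "cuda"],
            env=env,
        )
    )

for proc in procs:
    proc.wait()
```

Unlike selecting `cuda:N` inside a single Python process, CUDA_VISIBLE_DEVICES hides the other GPUs from each process entirely, so the processes cannot contend for the first GPU's context.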