Why my gpu-util is low? #6962
Unanswered
yllgl asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
I'm training on one node with 4 GPUs, using the DALI dataloader. I don't know why my GPU utilization is low and training is slow: about 1:30 per epoch, and since I train for 200 epochs, the run takes 5 hours. That's slower than the mmclassification project, which takes only 3.5 hours. Since mmclassification only supports torch.utils.data.DataLoader, I expected the DALI dataloader to accelerate my training, but as you can see, it's the opposite. I don't know why. Could anyone give me some advice? I use the CIFAR-10 dataset and train on Slurm.
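One way to narrow this down is to measure how much of each iteration is spent waiting on the dataloader versus running the training step: if the data fraction dominates, low GPU utilization points at the input pipeline rather than the model. The sketch below is a hypothetical, framework-agnostic helper (`profile_loader` is not part of DALI or this project's code); it works with any iterable of batches and any callable training step.

```python
import time

def profile_loader(loader, train_step, num_batches=None):
    """Split wall time per iteration into 'waiting for data' vs.
    'training step'. A high data fraction suggests the dataloader
    (not the GPU) is the bottleneck. Hypothetical helper for
    illustration only."""
    data_time = 0.0
    step_time = 0.0
    it = iter(loader)
    n = 0
    while num_batches is None or n < num_batches:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent blocked on the dataloader
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)     # forward/backward/optimizer step
        t2 = time.perf_counter()
        data_time += t1 - t0
        step_time += t2 - t1
        n += 1
    total = data_time + step_time
    return {"data_frac": data_time / total,
            "step_frac": step_time / total,
            "batches": n}
```

If `data_frac` stays well above, say, 0.3, the pipeline is starving the GPUs; for a small dataset like CIFAR-10 the per-sample decode work is tiny, so launch overhead in the loader can easily outweigh any DALI speedup.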

Here is my code.
main.py
net.py
dataloader.py