DDP GPU utilization problem #10670
Unanswered
dayeux asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment · 4 replies
-
Dear @dragondx, any chance you could provide a reproducible code snippet with mocked data? Best,
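For reference, a minimal self-contained repro along those lines might look like the sketch below. It is not from the original thread: the model size, mocked tensor shapes, batch size, and hyperparameters are all placeholders standing in for the real MLP and memmap-backed data.

```python
# Hypothetical minimal DDP repro with mocked data; all sizes are made up for illustration.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler


def run(rank, world_size):
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group("nccl" if torch.cuda.is_available() else "gloo",
                            rank=rank, world_size=world_size)
    device = torch.device(f"cuda:{rank}" if torch.cuda.is_available() else "cpu")

    # Mocked data standing in for the real memmap-backed dataset.
    dataset = TensorDataset(torch.randn(100_000, 256), torch.randn(100_000, 1))
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=512, sampler=sampler,
                        num_workers=4, pin_memory=True, persistent_workers=True)

    model = DDP(torch.nn.Sequential(
        torch.nn.Linear(256, 512), torch.nn.ReLU(), torch.nn.Linear(512, 1)
    ).to(device), device_ids=[rank] if torch.cuda.is_available() else None)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for epoch in range(2):
        sampler.set_epoch(epoch)
        for x, y in loader:
            x = x.to(device, non_blocking=True)
            y = y.to(device, non_blocking=True)
            loss = torch.nn.functional.mse_loss(model(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    world_size = torch.cuda.device_count() or 2
    mp.spawn(run, args=(world_size,), nprocs=world_size, join=True)
```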
-
Training an MLP using DDP with 2 GPUs. Pretty standard code.
We observe the training speed slowing down over time. At the initial iterations we get 1 s/it: both GPUs are at 100% utilization, with occasional dips, presumably during gathering/backprop ops. After a few thousand iterations we are at 2 s/it: one GPU stays at 100% utilization consistently (the master?), while the other waits a long time at low utilization (1-2%) before getting a short burst of 100%. We also noticed that CPU utilization (around 80% across all cores) tends to be much higher at the earlier iterations; after a few thousand iterations it is low (around 20%, with occasional 90% spikes on some cores). Is this some sort of timing problem with data loading? There is no CPU preprocessing other than reading data from disk.
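One way to check whether the slowdown comes from the input pipeline rather than from DDP itself is to time, on each rank, how long the loop waits for the next batch versus how long the step itself takes. A minimal sketch follows; `train_step` is a hypothetical stand-in for the real forward/backward/optimizer call, not code from this post.

```python
# Rough per-iteration timing to separate DataLoader wait time from compute time.
# `train_step` is a placeholder for the actual forward/backward/optimizer step.
import time
import torch


def timed_loop(loader, train_step, log_every=100):
    data_time = step_time = 0.0
    t0 = time.perf_counter()
    for i, batch in enumerate(loader):
        t1 = time.perf_counter()
        data_time += t1 - t0              # time spent waiting on the DataLoader

        train_step(batch)                 # forward/backward/step on this rank
        if torch.cuda.is_available():
            torch.cuda.synchronize()      # make GPU time visible to the host timer

        t0 = time.perf_counter()
        step_time += t0 - t1              # time spent in compute/communication

        if (i + 1) % log_every == 0:
            print(f"iter {i + 1}: data {data_time / log_every:.3f}s/it, "
                  f"step {step_time / log_every:.3f}s/it")
            data_time = step_time = 0.0
```

If the "data" component grows over the run while the "step" component stays flat, the workers are falling behind rather than DDP itself slowing down.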
We use a custom dataloader for our use case, since we have a lot of data; it uses a numpy memmap to read each datapoint.
DataLoader params:
torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, persistent_workers=True, pin_memory=True, num_workers=16, prefetch_factor=128)
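For context, a memmap-backed dataset of the kind described might look like the sketch below; the path, shapes, and dtype are assumptions, not details from the post. Opening the memmap lazily inside each worker process, rather than in `__init__`, is one common pattern when combining `np.memmap` with multiple DataLoader workers.

```python
# Hypothetical sketch of a memmap-backed Dataset; path, shape, and dtype are made up.
import numpy as np
import torch
from torch.utils.data import Dataset


class MemmapDataset(Dataset):
    def __init__(self, path, num_samples, feature_dim):
        self.path = path
        self.num_samples = num_samples
        self.feature_dim = feature_dim
        self._data = None  # opened lazily so each worker gets its own handle

    def _lazy_init(self):
        if self._data is None:
            self._data = np.memmap(self.path, dtype=np.float32, mode="r",
                                   shape=(self.num_samples, self.feature_dim))

    def __len__(self):
        return self.num_samples

    def __getitem__(self, idx):
        self._lazy_init()
        # Copy the row out of the memmap so the returned tensor owns its memory.
        return torch.from_numpy(np.array(self._data[idx]))
```

As a side note on the config above: with `num_workers=16` and `prefetch_factor=128`, each rank can hold up to 16 × 128 = 2048 prefetched batches in host memory, which may be worth keeping in mind when watching memory and CPU behaviour over long runs.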
We wonder what is causing this. Any help would be greatly appreciated.