Understanding the replicated DataLoaders in DDP #9251
Unanswered
jatentaki asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment
I have two questions regarding the behavior of `DataLoader`s when training on multiple GPUs with `ddp`/`ddp_spawn`. Let me first define my terms: I use "GPU worker" to mean the process driving each of the N GPUs for the model's forward/backward passes, and "data worker" to mean a process created by `torch.utils.data.DataLoader` to load and preprocess batches.

With `ddp`, the dataset is recreated for each GPU worker, but the total number of data workers seems to stay constant: does each GPU worker get its share of N_total_data_workers / N_gpu_workers? Is this documented somewhere?
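For reference, a minimal sketch of the kind of setup being asked about, assuming PyTorch Lightning with a `Trainer` that accepts `gpus` and `accelerator="ddp"` (flag names vary across Lightning versions); the `ToyModule` class and all values in it are hypothetical, not from the original post. The comments reflect standard PyTorch/Lightning behavior: `num_workers` is a per-`DataLoader` setting, and under `ddp` each GPU worker calls `train_dataloader()` and builds its own `DataLoader`.

```python
# Hypothetical sketch, not the original poster's code.
import torch
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class ToyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)

    def train_dataloader(self):
        dataset = TensorDataset(torch.randn(256, 8), torch.randn(256, 1))
        # Under ddp/ddp_spawn this method runs in every GPU worker process,
        # so each process creates its own DataLoader and its own
        # `num_workers` data workers; Lightning injects a DistributedSampler
        # so the processes draw disjoint shards of the dataset.
        return DataLoader(dataset, batch_size=32, num_workers=4)


if __name__ == "__main__":
    # Exact Trainer arguments depend on the Lightning version installed.
    trainer = pl.Trainer(gpus=2, accelerator="ddp", max_epochs=1)
    trainer.fit(ToyModule())
```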