DDP use/access entire effective batch in callback #12076
Unanswered
gustavhartz asked this question in DDP / multi-GPU / multi-node
Replies: 0 comments
Hi, I'm training a model on 1 node with 2-4 GPUs using the DDP strategy. My goal is to log something once from a callback, with access to the entire effective batch across all GPUs. What is a good way of doing this, if there is one?
I have looked at #9259 and #6501, but can't get it to work in my setting, since `all_gather` is only available from the `pl.LightningModule`, not from the `Callback`. Even if that did work, it would only give me the correct size for `outputs`, not for `batch` and `batch_idx`.
I have tried using `trainer.is_global_zero` in the callback so the values are only logged once, but that only gives me 1/num_gpus of the total effective batch. I was also thinking of something along the lines of the code below, linking the output of `training_step_end` to the callback and combining that with `trainer.is_global_zero` in the callback, but it faces the same dimension issues for the remaining callback arguments.

Hope someone can help :)
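To make the idea concrete, here is a rough sketch of what I mean (not my actual code: `compute_loss` is a placeholder, the batch is assumed to be an `(inputs, targets)` tuple of tensors, and the exact hook signatures depend on the Lightning version):

```python
import pytorch_lightning as pl


class MyModel(pl.LightningModule):
    def training_step(self, batch, batch_idx):
        x, y = batch                      # assumed (inputs, targets) structure
        loss = self.compute_loss(x, y)    # placeholder for the real loss
        # Route the local batch to the callbacks via the step output.
        return {"loss": loss, "batch": batch, "batch_idx": batch_idx}

    def training_step_end(self, step_output):
        # Under DDP this hook still only sees the output of the local process.
        return step_output


class LogEffectiveBatch(pl.Callback):
    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        # A collective call like all_gather has to run on every rank, so it
        # cannot sit inside the is_global_zero branch.
        gathered = pl_module.all_gather(batch[0])  # (world_size, local_batch, ...)

        if trainer.is_global_zero:
            # Without the gather, `batch` / outputs["batch"] is only this
            # rank's 1/num_gpus share of the effective batch.
            effective = gathered.reshape(-1, *gathered.shape[2:])
            # ... log `effective`, outputs["batch_idx"], etc. once here ...
```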