validation_step causes deadlock? #17344
Replies: 1 comment
-
Now I understand what happens. For some reason, DDP doesn't handle unstructured data well; it seems to accept only a tensor, a dict of tensors, a list of tensors, or something similar. Just to mention it here as well: the conclusion is that we need to avoid unstructured data.
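One way to follow that conclusion, as a minimal sketch: split each step's output into a tensor-only dict and a dict of everything else (strings and so on), so that only the tensor part is returned from `validation_step`. The helper name `split_outputs` and its `is_tensor` predicate parameter are hypothetical and not part of Lightning; with PyTorch you would pass `torch.is_tensor`. It assumes a flat (non-nested) output dict.

```python
from typing import Any, Callable, Dict, Tuple


def split_outputs(
    outputs: Dict[str, Any],
    is_tensor: Callable[[Any], bool],
) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Split a flat output dict into tensor and non-tensor parts.

    Hypothetical helper, not a Lightning API. Pass `torch.is_tensor`
    as `is_tensor` when working with PyTorch.
    """
    tensors: Dict[str, Any] = {}
    extras: Dict[str, Any] = {}
    for key, value in outputs.items():
        if is_tensor(value):
            tensors[key] = value  # structured part: tensors only
        else:
            extras[key] = value  # strings etc.: keep local to this rank
    return tensors, extras
```

Inside `validation_step`, one could call `tensors, extras = split_outputs(out, torch.is_tensor)`, append `extras` to a plain Python list on the module for the CPU-side evaluation, and return only `tensors`.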
-
Hi there,
My model requires a somewhat "unstructured evaluation": the model returns a complex dict in which some values are non-tensors, such as strings. The evaluation function converts tensors to NumPy arrays and does some computation on the CPU. After introducing this code in my PTL model, multi-GPU training hangs after validation but before the first training step.
Does anyone have any idea about this? Thanks!