-
Hi @YerePhy , the error message is logged from |
-
Hi,
I'm training a monai.networks.nets.DynUNet with deep supervision on a SLURM cluster; my dataset is a large folder of files. After 5 epochs, my training crashes with an error related to shared memory. In short, the errors I get are the following:
RuntimeError: unable to open shared memory object </torch_3368351_1991753163> in read-write mode
RuntimeError: Caught RuntimeError in DataLoader worker process 0.
Up to this point, this question might be better suited to the PyTorch forum. However, I'm also getting a lot of messages like this one:
dev_collate - CRITICAL - >>> collate/stack a list of x
where x is, for example, numpy arrays, a list of tensors, or "xyzt_units" and "cal_max" out of 42 keys.
I think that this part is related to MONAI. You can see the whole error trace here.
Has anyone faced something similar?
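For anyone trying to reproduce or narrow this down, here is a minimal diagnostic sketch. It is not a confirmed fix for the error above. It assumes a Linux node where shared memory is mounted at /dev/shm, and the helper names (shm_report, open_file_limit, use_file_system_sharing) are hypothetical, introduced only for illustration. The only PyTorch calls used are torch.multiprocessing.get_all_sharing_strategies, set_sharing_strategy and get_sharing_strategy; switching to the "file_system" strategy is a commonly suggested thing to try for shared-memory errors in DataLoader workers, but whether it helps here is an assumption.

```python
import os
import shutil


def shm_report(path="/dev/shm"):
    """Return (total, used, free) in MiB for the shared-memory mount, or None if it is absent."""
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    mib = 1024 * 1024
    return usage.total // mib, usage.used // mib, usage.free // mib


def open_file_limit():
    """Return the (soft, hard) open-file limit, or None where the resource module is unavailable."""
    try:
        import resource  # Unix-only standard library module
    except ImportError:
        return None
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def use_file_system_sharing():
    """Try to switch PyTorch's tensor sharing strategy to "file_system".

    Returns True if the strategy is active afterwards, False if PyTorch is
    not installed or the strategy is not offered on this platform.
    """
    try:
        import torch.multiprocessing as mp
    except ImportError:
        return False
    if "file_system" not in mp.get_all_sharing_strategies():
        return False
    mp.set_sharing_strategy("file_system")
    return mp.get_sharing_strategy() == "file_system"


if __name__ == "__main__":
    # Run this inside the SLURM job, before the DataLoader is created.
    print("shared memory (MiB total, used, free):", shm_report())
    print("open-file limit (soft, hard):", open_file_limit())
    print("file_system sharing active:", use_file_system_sharing())
```

Running this at the start of the job, and again shortly before the epoch where the crash occurs, shows whether /dev/shm is filling up or whether the open-file limit is low on the compute node.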