Exception: process 0 terminated with exit code 1 when DDP #151
-
@wdy06 Distributed data parallel (DDP) starts one process per GPU, but there's no guarantee how that behaves inside Docker. You probably have to configure Docker so that it allows nb_gpus processes.
-
@williamFalcon Thank you for your response! I will try various things with that.
-
@wdy06 I'm facing the same issue with DDP in a JupyterHub + Docker environment, but without using the PyTorch Lightning package. My DDP script runs smoothly in the Docker container; it only throws the error in the notebook environment. Let me know if you have found a solution to this problem. Thanks!
-
Try this:
if __name__ == '__main__':
    trainer.fit(model)
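For anyone wondering why the guard matters, here is a minimal sketch using only the standard library. It assumes the DDP backend launches its workers with the "spawn" start method (as torch.multiprocessing.spawn does), where each child is a fresh interpreter that re-imports the script. The names shard, worker and launch are hypothetical stand-ins for illustration, not Lightning or PyTorch APIs.

```python
import multiprocessing as mp


def shard(rank, world_size, n_items):
    """Return the item indices that process `rank` would handle."""
    return list(range(rank, n_items, world_size))


def worker(rank, world_size, n_items):
    # Stand-in for the per-GPU training work done by each DDP process.
    print(f"rank {rank} handles {len(shard(rank, world_size, n_items))} items")


def launch(world_size=2, n_items=10):
    # "spawn" starts fresh interpreters that re-import this file, so any
    # unguarded module-level code would run again in every child.
    ctx = mp.get_context("spawn")
    procs = [
        ctx.Process(target=worker, args=(rank, world_size, n_items))
        for rank in range(world_size)
    ]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    # A non-zero exit code here is the kind of failure reported in the title.
    return [p.exitcode for p in procs]


if __name__ == "__main__":
    # Only the parent process enters this branch; spawned children import
    # the module under a different name and skip it.
    print(launch())
```

If launch() were called at module level instead, each child would try to start its own children while importing the file, and Python aborts that with a RuntimeError. This may also be related to the notebook failures mentioned above, since functions defined in a notebook cell cannot be re-imported by a spawned child, but that part is a guess.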
-
What is your question?
I get the error below when running trainer.fit(model) with DDP. How can I fix this?
Code
Please paste a code snippet if your question requires it!
What have you tried?
I've tried running the script above in a JupyterLab terminal on Docker. With distributed_backend='dp', it works well.
What's your environment?
conda 4.5.11
1.2.0
0.4.6
0.6.9