Multi-GPU training fails using strategy='ddp' #1201
dudeperf3ct started this conversation in General
Replies: 2 comments 8 replies
-
@rohitgr7 I was advised to ask this question in the Lightning forum, since the issue comes from the PL library rather than from Flash.
Originally posted by @ethanwharris in #1188 (comment)
5 replies
-
what's the output for
3 replies
-
Running video classification using flash
Linked issue: #1188
Code Sample
I ran
python -c 'import torch; print(torch.cuda.is_available())'
and PyTorch is able to detect all GPUs; CUDA is available at the start of the script. I only get the above error when running the Flash trainer with the ddp strategy. The script runs fine when using 1 GPU.
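Since the training script itself is not reproduced here, below is a minimal sketch of this kind of Flash video classification run. The dataset paths, backbone, and batch size are placeholders, not the original script; only the gpus/strategy arguments reflect the setup described above.

```python
import flash
from flash.video import VideoClassificationData, VideoClassifier

# Placeholder data layout -- the real folders and labels come from the original script.
datamodule = VideoClassificationData.from_folders(
    train_folder="data/train",
    val_folder="data/val",
    batch_size=8,
)

# Assumed backbone; any video backbone supported by Flash would do here.
model = VideoClassifier(backbone="x3d_xs", num_classes=datamodule.num_classes)

# Single-GPU runs work; the failure appears once strategy="ddp" is requested.
trainer = flash.Trainer(max_epochs=1, gpus=8, strategy="ddp")
trainer.finetune(model, datamodule=datamodule, strategy="freeze")
```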
Configurations:
I am running the PyTorch Lightning NGC container with the --gpus all and --shm-size=1g flags.
pytorch / lightning / flash: 1.9.0a0 / 1.5.10 / 0.8.0dev
I have 8x V100 GPUs with driver version 418.67 and CUDA version 10.1.
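As a quick sanity check of the environment described above (a hypothetical diagnostic, not part of the original report), the single-GPU check can be extended to print the per-device view inside the container:

```python
import torch

print(torch.__version__)          # 1.9.0a0 from the NGC container
print(torch.version.cuda)         # CUDA version the build was compiled against
print(torch.cuda.is_available())  # True at the start of the script
print(torch.cuda.device_count())  # expected to report 8 on the 8x V100 node
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))
```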