one machine, two gpus. #16295
Unanswered
WindSmileValley
asked this question in
DDP / multi-GPU / multi-node
Replies: 0 comments
I'm seeing some strange behavior. Two cases illustrate it:
Case 1:
sei_trainer = Trainer(
    gpus=2,
    strategy='dp',
)
self.model = torch.nn.DataParallel(self.model, device_ids=[0, 1])  # this line is inside a LightningModule class
This raises an error: "RuntimeError: Expected tensor for argument #1 'input' to have the same device as tensor for argument #2 'weight'; but device 1 does not equal 0 (while checking arguments for cudnn_convolution)"
However, if I change it to the following:
Case 2:
sei_trainer = Trainer(
    gpus=1,
    strategy='dp',
)
self.model = torch.nn.DataParallel(self.model, device_ids=[0, 1])
then the code runs on 2 GPUs.
So, what is happening, and how do I fix it?
Thank you all.
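A possible explanation, offered as an assumption rather than something confirmed in this thread: with strategy='dp' and gpus=2, the Trainer already wraps the LightningModule in torch.nn.DataParallel, so wrapping self.model again nests one DataParallel inside another, and a replica on device 1 can end up pairing its input with weights on device 0. With gpus=1 only the manual wrapper splits the batch, which would explain why case 2 runs. The sketch below uses plain PyTorch to show the "wrap exactly once" idea; `TinyNet` and `wrap_for_dp` are hypothetical names for illustration, not part of Lightning or of the original code.

```python
import torch
from torch import nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for the real model; one conv layer, as in the error message."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, kernel_size=3, padding=1)

    def forward(self, x):
        return self.conv(x)


def wrap_for_dp(model):
    """Wrap in DataParallel at most once, and only when 2+ GPUs are visible."""
    if isinstance(model, nn.DataParallel):
        return model  # already wrapped: do not nest a second DataParallel
    if torch.cuda.device_count() >= 2:
        # DataParallel expects the parameters to live on device_ids[0]
        return nn.DataParallel(model.cuda(0), device_ids=[0, 1])
    return model  # CPU or single GPU: leave the module untouched


model = wrap_for_dp(TinyNet())
model = wrap_for_dp(model)  # second call is a no-op, so no nested wrapper

# Put the input on the same device as the parameters before the forward pass
device = next(model.parameters()).device
out = model(torch.randn(2, 3, 8, 8, device=device))
print(out.shape)
```

In the Lightning setup from the question, the equivalent change would presumably be to keep gpus=2 with strategy='dp' and drop the manual torch.nn.DataParallel line, letting the Trainer do the only wrapping.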