RuntimeError: Expected all tensors to be on the same device
#13667
-
I am working on my first Lightning project and having an issue when I attempt to train on GPUs. When I train on the CPU using the accelerator='cpu' argument, training and validation run with no problem. My workstation has two GPUs, so I set accelerator='gpu', devices=2, and strategy='dp' (I've also tried 'ddp' with the same result). The data is provided by a LightningDataModule that wraps a custom Torch Dataset, and the Dataset reads from a Pandas DataFrame. The DataFrame contains file paths to NumPy files, which are being loaded as follows:
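(The original loading snippet was not preserved in this thread. Below is a minimal sketch of the setup described above; the class, column, and file names are hypothetical.)

```python
import numpy as np
import pandas as pd
import torch
from torch.utils.data import Dataset

class NumpyFileDataset(Dataset):
    """Loads samples from .npy files whose paths are listed in a DataFrame."""

    def __init__(self, df: pd.DataFrame):
        # Assumed columns: 'path' (to a .npy file) and 'label'.
        self.df = df

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        x = torch.from_numpy(np.load(row["path"])).float()
        y = torch.tensor(row["label"])
        return x, y
```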
However, when I try to switch to GPUs for training, I receive the RuntimeError from the title ("Expected all tensors to be on the same device") with strategy='dp', and a similar error with strategy='ddp'. Any help would be appreciated.
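For reference, the Trainer configurations described above look like this (Lightning 1.x API, where strategy='dp' was still available):

```python
import pytorch_lightning as pl

# Works: model and data both stay on the CPU.
trainer = pl.Trainer(accelerator="cpu")

# Both of these fail with the device-mismatch RuntimeError:
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="dp")   # DataParallel
trainer = pl.Trainer(accelerator="gpu", devices=2, strategy="ddp")  # DistributedDataParallel
```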
Replies: 1 comment
-
I figured it out. I had a mistake in my model construction: I was storing submodules in a plain Python list rather than in a PyTorch container module.
I changed:

```python
self.fc = []
```

to

```python
self.fc = nn.Sequential()
```
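This fix works because nn.Module only tracks submodules assigned through module containers (nn.Sequential, nn.ModuleList, or direct attributes). Layers stored in a plain Python list are invisible to .to(device), so when Lightning moves the model to the GPU those layers' weights stay on the CPU, producing exactly this device-mismatch error. A minimal sketch illustrating the difference (class names are illustrative):

```python
import torch
import torch.nn as nn

class Broken(nn.Module):
    def __init__(self):
        super().__init__()
        # Plain Python list: nn.Module does NOT register these layers,
        # so .to(device) leaves their weights on the CPU.
        self.fc = [nn.Linear(8, 8), nn.Linear(8, 1)]

class Fixed(nn.Module):
    def __init__(self):
        super().__init__()
        # nn.Sequential registers the layers as submodules,
        # so .to(device) moves their weights along with the model.
        self.fc = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 1))

    def forward(self, x):
        return self.fc(x)

if torch.cuda.is_available():
    broken, fixed = Broken().to("cuda"), Fixed().to("cuda")
    print(broken.fc[0].weight.device)       # cpu  -> device mismatch at runtime
    print(fixed.fc[0].weight.device)        # cuda:0
    print(list(broken.parameters()))        # []   -> optimizer would also miss them
```

Note the second symptom shown in the sketch: unregistered layers are also missing from model.parameters(), so even on CPU the optimizer would silently skip training them.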