LightningDatamodule.prepare_data() & setup() outside of Trainer #6199
Unanswered · naraugialusru asked this question in: Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Am I right in assuming that the LightningDataModule methods prepare_data() and setup() should not create torch.Tensors (or, more generally, perform any action that would assign data to a device)?
For example, if I wanted to apply a transform that included a torch.as_tensor() call, then this should happen in the train/val/test_dataloader() methods, not in the prepare_data() or setup() methods, correct?
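To make the question concrete, here is a minimal sketch of what I mean. All class and variable names are hypothetical; the point is that the torch.as_tensor() call only happens lazily, inside the dataloader pipeline, rather than in prepare_data() or setup():

```python
import torch
from torch.utils.data import DataLoader, Dataset


class MyDataset(Dataset):
    """Hypothetical dataset that applies the transform lazily, per sample."""

    def __init__(self, raw_samples, transform=None):
        self.raw_samples = raw_samples
        self.transform = transform

    def __len__(self):
        return len(self.raw_samples)

    def __getitem__(self, idx):
        sample = self.raw_samples[idx]
        if self.transform is not None:
            sample = self.transform(sample)
        return sample


# The transform creates a torch.Tensor only when a batch is drawn,
# i.e. inside the train/val/test_dataloader() path.
to_tensor = lambda x: torch.as_tensor(x, dtype=torch.float32)

ds = MyDataset([[1.0, 2.0], [3.0, 4.0]], transform=to_tensor)
loader = DataLoader(ds, batch_size=2)
batch = next(iter(loader))  # a (2, 2) float32 tensor
```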
I ask this because my understanding is that the code shouldn't know anything about the engineering/hardware unless it is run by a Trainer. Yet the LightningDataModule doc page suggests that, "when information about the dataset is needed to build the model", prepare_data() and setup() can be called manually, outside the Trainer.
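As I read the docs, the manual pattern looks roughly like the sketch below. In real code the class would subclass pytorch_lightning.LightningDataModule; I use a plain class here to keep the sketch self-contained, and all names (MyDataModule, num_classes) are hypothetical:

```python
class MyDataModule:
    """Stand-in for a pytorch_lightning.LightningDataModule subclass."""

    def prepare_data(self):
        # download / tokenize / write to disk; no state assignment here
        pass

    def setup(self, stage=None):
        # runs on every process; setting state here is fine
        self.num_classes = 10  # e.g. inferred from the dataset on disk


# Called manually, outside any Trainer, just to get dataset info:
dm = MyDataModule()
dm.prepare_data()
dm.setup(stage="fit")
print(dm.num_classes)  # now available to construct the model
```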
Finally, if this is right, is there a rule of thumb to make sure I don’t accidentally call a method that uses engineering/hardware knowledge behind the scenes?
P.S. The docs state
prepare_data is called from a single GPU. Do not use it to assign state (self.x = y)
and
setup is called from every GPU. Setting state here is okay
but these comments weren't enough for me to feel confident about what is and isn't allowed.
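My current reading of the rule, simulated in plain Python (class and attribute names hypothetical): with multiple GPUs, prepare_data() runs on a single process, so any state assigned there never exists on the other processes, whereas setup() runs everywhere. Simulating two ranks as two instances:

```python
class BadDataModule:
    """Sketch of the anti-pattern: assigning state in prepare_data()."""

    def prepare_data(self):
        self.vocab = {"a": 0, "b": 1}  # WRONG place: runs on one process only

    def setup(self, stage=None):
        pass  # the safe place to assign state; runs on every process


# Two "ranks": only rank 0 runs prepare_data(), both run setup().
rank0, rank1 = BadDataModule(), BadDataModule()
rank0.prepare_data()
for dm in (rank0, rank1):
    dm.setup(stage="fit")

hasattr(rank0, "vocab")  # True
hasattr(rank1, "vocab")  # False -> AttributeError later on that process
```

If that mental model is right, the rule of thumb would be: prepare_data() may only touch shared resources (disk, downloads), never self.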