Best way to bypass requirement of a DataLoader when inputs are generated stochastically on-demand #8081
-
Hello. Please forgive the very basic question! I really like the perks and freebies that come with letting the Lightning Trainer handle training. However, for some applications I do not need or want a dataloader, and so far I've not been able to figure out how to use the Trainer without fooling it by defining a dummy DataLoader that does nothing. 🙄 I would like to know if there is a 'nice' way to use the 'standard' Lightning setup of a LightningModule + Trainer that bypasses any dependence on a data loader.

The context is that the model's inputs (it is a normalizing flow) are generated on demand by sampling from a known distribution, and the loss function is the KL divergence with respect to another distribution whose un-normalised density function is known.

Grateful for any suggestions! Cheers,
-
@justusschock is it worth us updating the type hints on the dataloader APIs to reflect that users only need to provide an iterable? Or should we continue to rely on duck typing?
-
Hi Joe,
You don't need a DataLoader. You just need something we can iterate over that produces batches.
That being said, some features (like automatically adapting the sampler for distributed training) won't work in that case, but it sounds like you don't need them anyway.
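For concreteness, here is a minimal sketch of that first suggestion. `SampledBatches`, `ToyFlow`, and the shifted-Gaussian target are illustrative placeholders rather than anything from Lightning, and the exact `fit` signature may differ slightly between Lightning versions; the point is just that any object with `__iter__` that yields batches can go where a DataLoader normally would:

```python
import torch
from torch import nn
import pytorch_lightning as pl


class SampledBatches:
    """Any iterable that yields batches can stand in for a DataLoader."""

    def __init__(self, batch_size, dim, batches_per_epoch=100):
        self.batch_size = batch_size
        self.dim = dim
        self.batches_per_epoch = batches_per_epoch

    def __iter__(self):
        base = torch.distributions.Normal(0.0, 1.0)
        for _ in range(self.batches_per_epoch):
            # draw a fresh batch from the base distribution on demand
            yield base.sample((self.batch_size, self.dim))


class ToyFlow(pl.LightningModule):
    """Stand-in for a normalizing flow: a single affine map x = z * scale + shift."""

    def __init__(self, dim):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def training_step(self, batch, batch_idx):
        base = torch.distributions.Normal(0.0, 1.0)
        x = batch * self.log_scale.exp() + self.shift
        # log-density of x under the flow (change of variables for the affine map)
        log_q = (base.log_prob(batch) - self.log_scale).sum(dim=-1)
        # un-normalised target log-density (illustrative: a Gaussian centred at 2)
        log_p = -0.5 * ((x - 2.0) ** 2).sum(dim=-1)
        # Monte Carlo estimate of KL(q || p), up to the unknown normalising constant
        loss = (log_q - log_p).mean()
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-2)


trainer = pl.Trainer(max_epochs=5)
# the iterable goes where a DataLoader would normally be passed
trainer.fit(ToyFlow(dim=2), SampledBatches(batch_size=256, dim=2))
```

The iterable above is finite, so an "epoch" ends when it is exhausted; an endless generator should work too if you cap training with `max_steps` or `limit_train_batches` on the Trainer.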
Another possibility would be to use the IterableDataset from PyTorch and wrap it in a DataLoader for convenience (a sketch of that follows below).

Best,
Justus
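For comparison, a sketch of the IterableDataset route (`BaseSamples` is an illustrative name, not part of the original discussion). The DataLoader takes care of batching and collation; note that with `num_workers > 0` each worker would replay the same stream unless you shard it with `torch.utils.data.get_worker_info()`:

```python
import torch
from torch.utils.data import DataLoader, IterableDataset


class BaseSamples(IterableDataset):
    """Yields individual samples drawn on demand from the base distribution."""

    def __init__(self, dim, samples_per_epoch=25_000):
        super().__init__()
        self.dim = dim
        self.samples_per_epoch = samples_per_epoch

    def __iter__(self):
        base = torch.distributions.Normal(0.0, 1.0)
        for _ in range(self.samples_per_epoch):
            yield base.sample((self.dim,))


# the DataLoader handles batching; pass it to trainer.fit like any other loader
loader = DataLoader(BaseSamples(dim=2), batch_size=256)
```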