Best way to bypass requirement of a DataLoader when inputs are generated stochastically on-demand #8081
-
Hello. Please forgive the very basic question! I really like the perks and freebies that come with letting the Lightning Trainer handle training. However, for some applications I do not need or want a dataloader, and so far I've not been able to figure out how to use the Trainer without fooling it by defining a dummy DataLoader that does nothing. 🙄 I would like to know if there is a 'nice' way to use the 'standard' Lightning setup of a LightningModule + Trainer that bypasses any dependence on a data loader.

The context is that the model's inputs (it is a normalizing flow) are generated on demand by sampling from a known distribution, and the loss function is the KL divergence with respect to another distribution whose un-normalised density function is known.

Grateful for any suggestions! Cheers,
-
@justusschock is it worth us updating the type hints on the dataloader APIs to reflect that users only need to provide an iterable? Or should we continue to rely on duck typing?
-
Hi Joe,
You don't need a DataLoader. You just need something we can iterate over that produces batches.
That being said, some features (like automatically adapting the sampler for distributed training) won't work in that case, but it sounds like you don't need them anyway.
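For concreteness, here is a minimal sketch of that first suggestion. `SampledBatches`, `ToyFlow`, and the shifted-Gaussian target are illustrative placeholders rather than anything from Lightning, and the exact `fit` signature may differ slightly between Lightning versions; the point is just that any object with `__iter__` that yields batches can go where a DataLoader normally would:

```python
import torch
from torch import nn
import pytorch_lightning as pl


class SampledBatches:
    """Any iterable that yields batches can stand in for a DataLoader."""

    def __init__(self, batch_size, dim, batches_per_epoch=100):
        self.batch_size = batch_size
        self.dim = dim
        self.batches_per_epoch = batches_per_epoch

    def __iter__(self):
        base = torch.distributions.Normal(0.0, 1.0)
        for _ in range(self.batches_per_epoch):
            # draw a fresh batch from the base distribution on demand
            yield base.sample((self.batch_size, self.dim))


class ToyFlow(pl.LightningModule):
    """Stand-in for a normalizing flow: a single affine map x = z * scale + shift."""

    def __init__(self, dim):
        super().__init__()
        self.log_scale = nn.Parameter(torch.zeros(dim))
        self.shift = nn.Parameter(torch.zeros(dim))

    def training_step(self, batch, batch_idx):
        base = torch.distributions.Normal(0.0, 1.0)
        x = batch * self.log_scale.exp() + self.shift
        # log-density of x under the flow (change of variables for the affine map)
        log_q = (base.log_prob(batch) - self.log_scale).sum(dim=-1)
        # un-normalised target log-density (illustrative: a Gaussian centred at 2)
        log_p = -0.5 * ((x - 2.0) ** 2).sum(dim=-1)
        # Monte Carlo estimate of KL(q || p), up to the unknown normalising constant
        loss = (log_q - log_p).mean()
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-2)


trainer = pl.Trainer(max_epochs=5)
# the iterable goes where a DataLoader would normally be passed
trainer.fit(ToyFlow(dim=2), SampledBatches(batch_size=256, dim=2))
```

The iterable above is finite, so an "epoch" ends when it is exhausted; an endless generator should work too if you cap training with `max_steps` or `limit_train_batches` on the Trainer.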
Another possibility would be to use the IterableDataset from PyTorch and wrap it in a DataLoader for convenience (a sketch of that follows below).

Best,
Justus
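For comparison, a sketch of the IterableDataset route (`BaseSamples` is an illustrative name, not part of the original discussion). The DataLoader takes care of batching and collation; note that with `num_workers > 0` each worker would replay the same stream unless you shard it with `torch.utils.data.get_worker_info()`:

```python
import torch
from torch.utils.data import DataLoader, IterableDataset


class BaseSamples(IterableDataset):
    """Yields individual samples drawn on demand from the base distribution."""

    def __init__(self, dim, samples_per_epoch=25_000):
        super().__init__()
        self.dim = dim
        self.samples_per_epoch = samples_per_epoch

    def __iter__(self):
        base = torch.distributions.Normal(0.0, 1.0)
        for _ in range(self.samples_per_epoch):
            yield base.sample((self.dim,))


# the DataLoader handles batching; pass it to trainer.fit like any other loader
loader = DataLoader(BaseSamples(dim=2), batch_size=256)
```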