Loading large datafiles #11613
Unanswered
labfiscog1 asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 0 comments
Hi all,
I have a large training dataset of ~100 GB (4 million samples), which I've split into smaller files of ~20 GB (800k samples) each. How can I use the DataLoader for this? The examples I've seen usually load one sample per item (e.g., a file path per image), but accessing the disk for every single item might make the data loader very slow.
I would like to load one file chunk (20 GB) at a time and, once it has been consumed, proceed to the next one. Something like the sketch below is what I have in mind.
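This is a minimal sketch using `torch.utils.data.IterableDataset` that loads one chunk file fully into memory and yields its samples, sharding the chunk list across DataLoader workers. The file names and the assumption that each chunk was saved with `torch.save` are hypothetical placeholders for my actual format:

```python
import torch
from torch.utils.data import IterableDataset, DataLoader, get_worker_info

class ChunkedDataset(IterableDataset):
    """Streams samples from a list of large chunk files, loading one
    chunk into memory at a time instead of hitting the disk per sample."""

    def __init__(self, chunk_paths):
        self.chunk_paths = chunk_paths

    def __iter__(self):
        paths = self.chunk_paths
        worker = get_worker_info()
        if worker is not None:
            # Shard the chunk files across DataLoader workers so each
            # worker loads a disjoint subset of chunks.
            paths = paths[worker.id::worker.num_workers]
        for path in paths:
            # Hypothetical: assumes each chunk was saved with torch.save
            # as a sequence of (x, y) samples.
            chunk = torch.load(path)
            for sample in chunk:
                yield sample

dataset = ChunkedDataset([f"train_chunk_{i}.pt" for i in range(5)])
loader = DataLoader(dataset, batch_size=64, num_workers=2)
```

One thing I'm unsure about is shuffling: with this approach, shuffling would have to happen within each chunk (and over the chunk order), since `shuffle=True` on the DataLoader doesn't apply to an `IterableDataset`.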
Any suggestions?
Thanks in advance!