Multiple training steps on sub batches from single batch #9610
Unanswered
ssharpe42 asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment 1 reply
-
I have a dataset of very large files which I want to iterate over to read in, and then within each file I would like to create sub-batches that make up the batch used for each train_step. Is there any way to implement this efficiently using PL? Would the only way be to iterate within a training step and do manual optimization, as in the sketch below?
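For reference, a minimal sketch of what that manual-optimization route could look like (not the asker's actual code): it assumes each batch is an (x, y) pair large enough to split along dim 0 with torch.split, and the names SubBatchModule, sub_batch_size, and the MSE loss are illustrative.

import torch
import pytorch_lightning as pl

class SubBatchModule(pl.LightningModule):
    def __init__(self, model, sub_batch_size=32):
        super().__init__()
        self.model = model
        self.sub_batch_size = sub_batch_size
        # Take control of the optimization loop.
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        x, y = batch
        # One optimizer step per sub-batch of the large batch.
        for x_sub, y_sub in zip(torch.split(x, self.sub_batch_size),
                                torch.split(y, self.sub_batch_size)):
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(self.model(x_sub), y_sub)
            self.manual_backward(loss)
            opt.step()

    def configure_optimizers(self):
        return torch.optim.Adam(self.model.parameters(), lr=1e-3)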
-
Hey @ssakhavi, the disadvantage of your current implementation is that the entire file will be moved to the device, which might lead to OOM or force a smaller batch_size. Another approach is the following:

import torch
from torch.utils.data import IterableDataset

class FileDataset(IterableDataset):
    def __init__(self, file_list):
        super().__init__()
        self.file_list = file_list

    def __iter__(self):
        # Stream one large file at a time, so only a single
        # file is ever held in memory.
        for path in self.file_list:
            data = torch.load(path)
            # create_chunks is user-defined: split the loaded file
            # into the sub-batches that should be yielded.
            chunks = create_chunks(data)
            for chunk in chunks:
                yield chunk
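A sketch of how this dataset could be wired up, assuming the yielded chunks are (x, y) tensor pairs, LitModel stands in for your own LightningModule, and shard_0.pt / shard_1.pt are placeholder file names:

from torch.utils.data import DataLoader
import pytorch_lightning as pl

dataset = FileDataset(file_list=["shard_0.pt", "shard_1.pt"])
# batch_size collates several yielded chunks into one training batch.
# Note: with num_workers > 0 every worker iterates the full file_list,
# so file_list would need to be sharded per worker to avoid duplicates.
loader = DataLoader(dataset, batch_size=4)

trainer = pl.Trainer(max_epochs=1)
trainer.fit(LitModel(), loader)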