Multiple training steps on sub batches from single batch #9610
Unanswered
ssharpe42 asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule
Replies: 1 comment 1 reply
-
I have a dataset of very large files which I want to iterate over to read in, and then within each file I would like to create sub-batches that make up the batch used for each train_step. Is there any way to implement this efficiently using PL? Would the only way be to iterate within a training step and do manual optimization, as in the sketch below?
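For reference, a minimal sketch of what that manual-optimization route could look like (not the asker's actual code): it assumes each batch is an (x, y) pair large enough to split along dim 0 with torch.split, and the names SubBatchModule, sub_batch_size, and the MSE loss are illustrative.

import torch
import pytorch_lightning as pl

class SubBatchModule(pl.LightningModule):
    def __init__(self, model, sub_batch_size=32):
        super().__init__()
        self.model = model
        self.sub_batch_size = sub_batch_size
        # Take control of the optimization loop.
        self.automatic_optimization = False

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        x, y = batch
        # One optimizer step per sub-batch of the large batch.
        for x_sub, y_sub in zip(torch.split(x, self.sub_batch_size),
                                torch.split(y, self.sub_batch_size)):
            opt.zero_grad()
            loss = torch.nn.functional.mse_loss(self.model(x_sub), y_sub)
            self.manual_backward(loss)
            opt.step()

    def configure_optimizers(self):
        return torch.optim.Adam(self.model.parameters(), lr=1e-3)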
-
Hey @ssakhavi, the disadvantage of your current implementation is that the entire file will be moved to the device, which might lead to OOM or force a smaller batch_size. Another approach is the following:

import torch
from torch.utils.data import IterableDataset

class FileDataset(IterableDataset):
    def __init__(self, file_list):
        super().__init__()
        self.file_list = file_list

    def __iter__(self):
        # Stream one large file at a time, so only a single
        # file is ever held in memory.
        for path in self.file_list:
            data = torch.load(path)
            # create_chunks is user-defined: split the loaded file
            # into the sub-batches that should be yielded.
            chunks = create_chunks(data)
            for chunk in chunks:
                yield chunk
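A sketch of how this dataset could be wired up, assuming the yielded chunks are (x, y) tensor pairs, LitModel stands in for your own LightningModule, and shard_0.pt / shard_1.pt are placeholder file names:

from torch.utils.data import DataLoader
import pytorch_lightning as pl

dataset = FileDataset(file_list=["shard_0.pt", "shard_1.pt"])
# batch_size collates several yielded chunks into one training batch.
# Note: with num_workers > 0 every worker iterates the full file_list,
# so file_list would need to be sharded per worker to avoid duplicates.
loader = DataLoader(dataset, batch_size=4)

trainer = pl.Trainer(max_epochs=1)
trainer.fit(LitModel(), loader)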