
Allow shuffling when overfit_batches is active #9850

@low5545

Description

Proposed refactoring or deprecation

Instead of disabling shuffling (i.e. replacing RandomSampler with SequentialSampler) in the train dataloader, replace the train dataset with a fixed subset of it using torch.utils.data.Subset, e.g. the first N samples of the dataset, where N is the number of samples covered by overfit_batches. This selects the same dataset samples as the current implementation.
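
A minimal sketch of the proposal in plain PyTorch, assuming a map-style dataset and an integer overfit_batches, so that N = overfit_batches × batch_size. The helper make_overfit_loader is hypothetical and not part of Lightning's API. It only uses torch.utils.data.Subset and DataLoader, and it leaves the shuffle decision to the caller instead of forcing it off.

```python
import torch
from torch.utils.data import DataLoader, Dataset, Subset, TensorDataset


def make_overfit_loader(dataset: Dataset, overfit_batches: int, batch_size: int,
                        shuffle: bool = True) -> DataLoader:
    """Hypothetical helper: restrict training to the first N samples of `dataset`.

    Assumes N = overfit_batches * batch_size, capped at len(dataset).
    """
    num_samples = min(overfit_batches * batch_size, len(dataset))
    # Fixed subset: always the first N samples, the same ones the
    # SequentialSampler approach would visit.
    subset = Subset(dataset, list(range(num_samples)))
    # Shuffling now only permutes samples *within* the subset.
    return DataLoader(subset, batch_size=batch_size, shuffle=shuffle)


dataset = TensorDataset(torch.arange(100))
loader = make_overfit_loader(dataset, overfit_batches=2, batch_size=4)
print(len(loader.dataset), len(loader))  # 8 2
```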

Motivation

This prevents the training batches from being identical in every epoch.

Pitch

Added on 12 Oct 2021:
The current implementation of overfit_batches disables shuffling by replacing RandomSampler with SequentialSampler in the train dataloader, so that training (overfitting) is restricted to the first N samples of the train dataset in every epoch. However, as a side effect every epoch sees exactly the same batches in exactly the same order, which is undesirable.
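
To make the problem concrete, the sketch below approximates the current behavior with plain PyTorch. This is an assumption about the mechanics, not Lightning's actual code: a SequentialSampler over the whole dataset, with only the first overfit_batches batches consumed per epoch. The dataset size and batch size are arbitrary.

```python
import itertools

import torch
from torch.utils.data import DataLoader, SequentialSampler, TensorDataset

dataset = TensorDataset(torch.arange(100))
overfit_batches, batch_size = 2, 4

# Approximation of the current approach: sequential sampling, and only the
# first `overfit_batches` batches are consumed in each epoch.
loader = DataLoader(dataset, batch_size=batch_size, sampler=SequentialSampler(dataset))

epochs = []
for _ in range(3):
    batches = [b[0].tolist() for b in itertools.islice(loader, overfit_batches)]
    epochs.append(batches)

print(epochs[0])  # [[0, 1, 2, 3], [4, 5, 6, 7]]
assert epochs[0] == epochs[1] == epochs[2]  # identical batches every epoch
```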

We should instead allow shuffling within these N samples, following the shuffle option of the train dataloader, so that each epoch visits the batches in a different order and the batches are mostly unique over the course of training.
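
A sketch of the proposed behavior under the same assumptions (plain PyTorch, integer overfit_batches, N = overfit_batches × batch_size). The set of samples per epoch stays fixed to the first N, while shuffling changes the order and the batch composition between epochs. The seeded torch.Generator is only there to make the demo reproducible.

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

dataset = TensorDataset(torch.arange(100))
overfit_batches, batch_size = 2, 4
num_samples = overfit_batches * batch_size  # N = 8 under these assumptions

# Proposed approach: fix the subset, but let the DataLoader reshuffle it each epoch.
subset = Subset(dataset, list(range(num_samples)))
generator = torch.Generator().manual_seed(0)  # seeded only for reproducibility
loader = DataLoader(subset, batch_size=batch_size, shuffle=True, generator=generator)

epochs = []
for epoch in range(3):
    batches = [b[0].tolist() for b in loader]
    epochs.append(batches)
    # Every epoch still covers exactly the first N samples...
    assert sorted(x for batch in batches for x in batch) == list(range(num_samples))
    # ...but the order (and which samples share a batch) varies per epoch.
    print(epoch, batches)
```

Because the subset is fixed, the samples used for overfitting are the same as with the current implementation; only their order within each epoch changes.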

cc @Borda @justusschock @awaelchli @akihironitta @rohitgr7

Metadata

Assignees: No one assigned
Labels: feature (Is an improvement or enhancement), help wanted (Open to be worked on), refactor
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests