How to shuffle training data in every epoch? #7332
-
How can one shuffle the training dataloader (using a DataModule) on each epoch?
Replies: 3 comments 4 replies
-
I think it is as simple as setting shuffle=True in the train dataloader, as shown in the code snippet below:

```python
import pytorch_lightning as pl
from torch.utils.data import DataLoader


class MyDataModule(pl.LightningDataModule):
    def __init__(self, params):
        super().__init__()
        self.params = params

    def setup(self, stage=None):
        # Assign train/val datasets for use in dataloaders
        if stage == 'fit' or stage is None:
            self.train_dataset = CodeSearchDataset(
                data_path=self.params.dir + "train.jsonl")

    def train_dataloader(self):
        return DataLoader(
            self.train_dataset,
            batch_size=self.params.batch_size,
            shuffle=True,  # draws a fresh permutation at the start of every epoch
            drop_last=True,
            num_workers=self.params.num_workers,
        )
```
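To see why shuffle=True alone is enough: each time a new epoch begins, iterating the dataloader draws a fresh permutation of the indices. Here is a minimal pure-Python sketch of that behaviour (no PyTorch; epoch_order is a hypothetical helper standing in for the internal random sampler):

```python
import random

def epoch_order(dataset_len, seed):
    # Mimics what a shuffling sampler does: draw a fresh
    # permutation of all indices at the start of each epoch.
    rng = random.Random(seed)
    indices = list(range(dataset_len))
    rng.shuffle(indices)
    return indices

# Two "epochs" visit the samples in different orders,
# but each epoch still covers every sample exactly once.
first = epoch_order(10, seed=1)
second = epoch_order(10, seed=2)
assert sorted(first) == sorted(second) == list(range(10))
assert first != second
```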
-
You can set Trainer(reload_dataloaders_every_epoch=True), and if you also have shuffle=True in your dataloader, it will do that by creating a new dataloader every epoch. That's my understanding.
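A minimal sketch of that Trainer flag in context. Note this is only needed if your train_dataloader rebuilds the dataset itself (for example, re-reading or re-sampling files each epoch); for plain per-epoch shuffling, shuffle=True alone suffices. In newer Lightning releases this argument was replaced by reload_dataloaders_every_n_epochs:

```python
import pytorch_lightning as pl

trainer = pl.Trainer(
    max_epochs=10,
    # Forces Lightning to call train_dataloader() again at each epoch.
    # Deprecated in newer versions: use reload_dataloaders_every_n_epochs=1.
    reload_dataloaders_every_epoch=True,
)
```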
-
It seems the default behavior is that the data is shuffled only once, at the beginning of training.