[BUG] Data leakage in TorchExperiment #212

@SimonBlanke

Description

The TorchExperiment reuses the same LightningDataModule instance across all optimization trials. If the datamodule maintains internal state (e.g., data augmentation settings, random state, a preprocessing cache), that state persists across trials.
In this line, the datamodule is the same for every iteration: the lightning_module is instantiated fresh for each trial, but the datamodule is reused. Here is an example of a datamodule that would behave in an undesired way:

import lightning as L
from torch.utils.data import DataLoader


class ImageDataModule(L.LightningDataModule):
    def __init__(self):
        super().__init__()
        self.augmentation_strength = 0.5  # Internal state

    def train_dataloader(self):
        # Augmentation strength might be modified during training
        transforms = self._get_transforms(self.augmentation_strength)
        return DataLoader(self.train_dataset, ...)

    def on_train_epoch_end(self):
        # Some datamodules adjust augmentation over time
        self.augmentation_strength *= 1.1

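A minimal sketch of the consequence (the trial loop is hypothetical and the hook is called directly, standing in for what a real training run would do): because every trial receives the same instance, each trial starts from whatever augmentation strength the previous trial left behind instead of the initial 0.5.

# Hypothetical sketch: one shared instance, as TorchExperiment currently passes it.
shared_dm = ImageDataModule()
for trial in range(3):
    shared_dm.on_train_epoch_end()  # training mutates internal state
    print(trial, shared_dm.augmentation_strength)
# augmentation_strength compounds across trials (~0.55, 0.61, 0.67)
# instead of resetting to 0.5

# Creating a fresh instance per trial would keep trials independent.
for trial in range(3):
    fresh_dm = ImageDataModule()  # state resets to 0.5 for every trial
    fresh_dm.on_train_epoch_end()
    print(trial, fresh_dm.augmentation_strength)
# ~0.55 every trial
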
Labels

bug (Something isn't working), module:integrations (Integrations for applying optimization to other libraries)
