Using callbacks for datamodule setup preprocessing logic? #8650
brijow asked this question in Lightning Trainer API: Trainer, LightningModule, LightningDataModule (answered by tchaton)
I have some preprocessing logic to run during the datamodule setup process, but only in certain situations (mostly to try out different preprocessing steps while experimenting, but at other times it's due to the model I am using with the datamodule). Is there a way to specify a set of data preprocessing steps to perform using callbacks? Reading the documentation, I could not find the correct hook to use.
Answered by tchaton on Aug 3, 2021
Dear @brijow, I wondered if something like this could fit your use case?

```python
from pytorch_lightning import Callback, LightningDataModule, Trainer
from torch.utils.data import DataLoader


class MyDataModule(LightningDataModule):
    def __init__(self):
        super().__init__()
        self._processed_train_dataset = None

    def setup(self, stage=None):
        self.train_dataset = ...

    @property
    def processed_train_dataset(self):
        # fall back to the raw dataset until a callback sets a processed one
        if self._processed_train_dataset is not None:
            return self._processed_train_dataset
        return self.train_dataset

    @processed_train_dataset.setter
    def processed_train_dataset(self, processed_train_dataset):
        self._processed_train_dataset = processed_train_dataset

    def train_dataloader(self):
        return DataLoader(self.processed_train_dataset)


class Preprocessing1(Callback):
    def preprocess_function(self, dataset):
        # your preprocessing logic (placeholder: returns the dataset unchanged)
        return dataset

    def on_train_start(self, trainer, pl_module):
        # apply processing
        dm = trainer.datamodule
        dm.processed_train_dataset = self.preprocess_function(dm.train_dataset)
        # force dataloader reload
        # trainer.reset_train_dataloader(pl_module)


# `model` and `dm` are your LightningModule and MyDataModule instances
trainer = Trainer(callbacks=[Preprocessing1()])
trainer.fit(model, dm)
```
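To see the fallback property in isolation, here is a minimal sketch that runs without Lightning. `FakeDataModule`, `drop_negatives`, `double`, and `apply_steps` are hypothetical names made up for illustration, not part of the Lightning API; the lists stand in for real datasets. It assumes you want to chain several preprocessing steps and have the datamodule serve the processed result once one is set.

```python
from typing import Callable, List, Optional

Step = Callable[[List[float]], List[float]]


class FakeDataModule:
    """Hypothetical stand-in for a LightningDataModule (plain Python only)."""

    def __init__(self) -> None:
        self.train_dataset: List[float] = []
        self._processed_train_dataset: Optional[List[float]] = None

    def setup(self) -> None:
        # Toy data standing in for a real dataset.
        self.train_dataset = [-1.0, 2.0, 3.0]

    @property
    def processed_train_dataset(self) -> List[float]:
        # Explicit None check: an empty processed dataset is still a valid
        # result and must not silently fall back to the raw data.
        if self._processed_train_dataset is not None:
            return self._processed_train_dataset
        return self.train_dataset

    @processed_train_dataset.setter
    def processed_train_dataset(self, dataset: List[float]) -> None:
        self._processed_train_dataset = dataset


def drop_negatives(data: List[float]) -> List[float]:
    return [x for x in data if x >= 0]


def double(data: List[float]) -> List[float]:
    return [x * 2 for x in data]


def apply_steps(dm: FakeDataModule, steps: List[Step]) -> None:
    """Run the chosen steps in order on the raw data and store the result."""
    data = dm.train_dataset
    for step in steps:
        data = step(data)
    dm.processed_train_dataset = data


dm = FakeDataModule()
dm.setup()
raw_view = dm.processed_train_dataset        # nothing set yet: raw data
apply_steps(dm, [drop_negatives, double])
processed_view = dm.processed_train_dataset  # now the processed data
```

In a real setup, the body of `apply_steps` would live in a callback hook, and the list of steps is what you would vary between experiments.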
Answer selected by brijow