Iterating over tasks for Continual Learning #11724
-
Hi everyone, I am new to PyTorch Lightning and I am currently trying to implement a continual learning model with it. I have multiple dataloaders for different tasks and I want to train on all of them. After training on task 1 with dataloader 1, I want to change which parameters of the model are trained for task 2. To do this, my datamodule has a `current_task` attribute that decides which dataset the samples for the current task are drawn from. My datamodule looks something like this:

```python
from typing import Optional

import numpy as np
from torch.utils.data import DataLoader
from torchvision import datasets
from pytorch_lightning import LightningDataModule


class RandSplitCIFAR100DataModule(LightningDataModule):
    def __init__(self):
        ...

    def setup(self, stage: Optional[str] = None):
        # load datasets only if they're not loaded already
        if not self.data_train and not self.data_val and not self.data_test:
            self.data_train = datasets.CIFAR100(self.hparams.data_dir, train=True, transform=self.train_transforms)
            self.data_val = datasets.CIFAR100(self.hparams.data_dir, train=False, transform=self.val_transforms)

            np.random.seed(self.hparams.seed)
            perm = np.random.permutation(self.num_classes)
            print(perm)

            # split the permuted classes into groups of 5, one group per task
            splits = [
                (self.partition_datasetv4(self.data_train, perm[5 * i:5 * (i + 1)]),
                 self.partition_datasetv4(self.data_val, perm[5 * i:5 * (i + 1)]),)
                for i in range(self.hparams.num_tasks)
            ]
            kwargs = {"num_workers": self.hparams.workers, "pin_memory": self.hparams.pin_memory}
            self.loaders = [
                (DataLoader(x[0], batch_size=self.hparams.batch_size, shuffle=True, **kwargs),
                 DataLoader(x[1], batch_size=self.hparams.test_batch_size, shuffle=False, **kwargs),)
                for x in splits
            ]

    def update_task(self, i):
        self.current_task = i

    def train_dataloader(self):
        return self.loaders[self.current_task][0]

    def val_dataloader(self):
        return self.loaders[self.current_task][1]
```

Now I want to have a training loop that does something like this:

```python
for task in range(num_tasks):
    datamodule.update_task(task)
    for n, p in model.named_parameters():
        ...  # change which parameters are trained for this task
    for epoch in range(max_epochs):
        for batch in dataloader:
            ...
```

I am currently not able to figure out how to do this with Lightning. I feel confident that Lightning should be able to handle such cases, but I am just not sure how. Any help is greatly appreciated!
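For context, by "change parameters to update" I mean something like the sketch below, i.e. freezing everything except the parameters belonging to the current task (the `task_{i}` name prefix is just a placeholder for whatever naming scheme the model actually uses):

```python
def select_task_parameters(model, task_id):
    # Freeze all task-specific parameters except those of the current task;
    # parameters without a task prefix (the shared backbone) stay trainable.
    prefix = f"task_{task_id}."
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(prefix) or not name.startswith("task_")
```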
-
well, there are multiple ways:

(1) Keep a single `Trainer` run, let `reload_dataloaders_every_n_epochs` switch the dataloaders, and update the model parameters at the task boundaries:

```python
class LitModel(LightningModule):
    def on_train_epoch_start(self):
        if self.current_epoch == 0 or (self.current_epoch + 1) % self.trainer.reload_dataloaders_every_n_epochs == 0:
            # update model parameters
            ...


max_epochs_n_tasks = max_epochs * n_tasks
trainer = Trainer(max_epochs=max_epochs_n_tasks, reload_dataloaders_every_n_epochs=max_epochs)
model = LitModel()
# inject the update task counter logic inside the datamodule
dm = RandSplitCIFAR100DataModule(...)
trainer.fit(model, datamodule=dm)
```
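In (1), "inject the update task counter logic inside the datamodule" could, for example, mean deriving the task index from the trainer's epoch whenever the dataloaders are reloaded. A minimal sketch, assuming every task runs for the same number of epochs and that `epochs_per_task` is a (hypothetical) hparam:

```python
from pytorch_lightning import LightningDataModule


class RandSplitCIFAR100DataModule(LightningDataModule):
    ...

    def train_dataloader(self):
        # Lightning attaches the trainer to the datamodule during fit,
        # so the current epoch can be mapped to a task index here.
        self.current_task = self.trainer.current_epoch // self.hparams.epochs_per_task
        return self.loaders[self.current_task][0]

    def val_dataloader(self):
        return self.loaders[self.current_task][1]
```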
(2) Create a fresh `Trainer` for every task and call `fit` once per task:

```python
def init_trainer(...):
    trainer = Trainer(max_epochs=max_epochs, ...)
    return trainer


datamodule = ...
model = ...

for task in range(num_tasks):
    # update params
    datamodule.update_task(task)
    trainer = init_trainer(...)
    trainer.fit(model, datamodule=datamodule)
```

Although I'd suggest (1). Even if your max_epochs differs for each task, it can easily be extended to support that too.
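For the last point, a rough sketch of how (1) could be extended when max_epochs differs per task: precompute the cumulative epoch boundaries and look the task up from the current epoch (the epoch counts below are made up):

```python
import bisect

epochs_per_task = [10, 20, 15]   # hypothetical per-task epoch budgets
boundaries = []                  # cumulative sums, e.g. [10, 30, 45]
total = 0
for n in epochs_per_task:
    total += n
    boundaries.append(total)


def task_for_epoch(epoch: int) -> int:
    # the first boundary strictly greater than `epoch` gives the task index
    return bisect.bisect_right(boundaries, epoch)
```

The trainer would then run with `Trainer(max_epochs=sum(epochs_per_task), reload_dataloaders_every_n_epochs=1)`, and both the datamodule and `on_train_epoch_start` would use `task_for_epoch(...)` on the current epoch instead of a fixed divisor.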