Different results when the Fisher in EWC is computed multiple times? #808

DragonRed18 · 2021-11-08T15:35:12Z


Good morning.
I have a problem that I am not able to solve.
When using the EWC strategy, the `compute_importances` method is called to obtain the FIM (Fisher Information Matrix).
This method should behave like a metric computation, i.e. it shouldn't leave any trace behind.
However, I found that if I call this method multiple times, the accuracy and the FIM of subsequent tasks change with respect to calling `compute_importances` a single time.
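For reference, the per-parameter importance accumulated by `compute_importances` (its loop is reproduced in `compute_importances_sub` further down) is the squared gradient of each parameter $\theta_i$ on the minibatch loss $\mathcal{L}_b$, averaged over the $N_b$ minibatches of the dataloader. The FIM is approximated as diagonal, so the "trace of the FIM" used later is simply the sum of these entries:

```math
F_i = \frac{1}{N_b} \sum_{b=1}^{N_b} \left( \frac{\partial \mathcal{L}_b}{\partial \theta_i} \right)^2,
\qquad
\operatorname{tr}(F) = \sum_i F_i
```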

Below is the code I used to obtain the results.

```python
import argparse

import torch
import torch.nn as nn
from torch.nn import CrossEntropyLoss
from torch.nn import functional as F
from torch.optim import Adam, SGD
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
import pytorch_lightning as pl

from avalanche.benchmarks.classic import PermutedMNIST, SplitMNIST, RotatedMNIST
from avalanche.models import MTSimpleMLP
from avalanche.training.strategies import Naive, EWC
from avalanche.evaluation.metrics import forgetting_metrics, accuracy_metrics
from avalanche.logging import InteractiveLogger
from avalanche.training.plugins import EvaluationPlugin

# Config
device = torch.device("cuda:0")
```

I set the seeds for reproducibility:

```python
import random

import numpy as np
import torch

torch.manual_seed(0)
random.seed(0)
np.random.seed(0)
```
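The three seeds can also be set in one place. A minimal sketch, assuming a hypothetical `seed_everything` helper; the CUDA and cuDNN lines are additions not present in the original script, included because it runs on `cuda:0`. Seeding only fixes the starting state of each generator: any extra random draw made after seeding shifts every draw that comes after it.

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 0) -> None:
    """Seed every RNG the training script may draw from (hypothetical helper)."""
    random.seed(seed)                 # Python's built-in RNG
    np.random.seed(seed)              # NumPy's global RNG
    torch.manual_seed(seed)           # PyTorch RNG on all devices
    torch.cuda.manual_seed_all(seed)  # explicit; silently ignored without CUDA
    # Assumption: prefer deterministic cuDNN kernels when running on GPU.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


seed_everything(0)
a = torch.rand(3)
seed_everything(0)
b = torch.rand(3)
print(torch.equal(a, b))  # True
```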


I define a simple MLP model:

```python
class MLP_Model(nn.Module):

    def __init__(self, num_classes=10, input_size=28*28):
        super(MLP_Model, self).__init__()

        # Two hidden layers of 1000 units each
        self.features = nn.Sequential(
            nn.Linear(input_size, 1000),
            nn.ReLU(inplace=True),
            nn.Linear(1000, 1000),
            nn.ReLU(inplace=True),
        )

        self.classifier = nn.Linear(1000, num_classes)
        self._input_size = input_size

        # Not used in forward(); pl.metrics was deprecated and later removed
        # from PyTorch Lightning in favour of torchmetrics
        self.train_accuracy = pl.metrics.Accuracy()

    def forward(self, x):
        x = x.contiguous()
        # Flatten (N, 1, 28, 28) images to (N, 784)
        x = x.view(x.size(0), self._input_size).float()
        x = self.features(x).float()
        x = self.classifier(x).float()
        # Returns log-probabilities, not raw logits
        x = F.log_softmax(x, dim=1)
        return x
```
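`forward` returns log-probabilities, while the training criterion below is `CrossEntropyLoss`, which applies `log_softmax` internally, so `log_softmax` ends up applied twice. A small check on random tensors (not MNIST data) that this is harmless: `log_softmax` is idempotent, so `CrossEntropyLoss` on the model output equals `NLLLoss` on it, and also equals `CrossEntropyLoss` on the raw logits. By itself this should therefore not change results between runs.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(8, 10)               # a batch of raw scores for 10 classes
targets = torch.randint(0, 10, (8,))
log_probs = F.log_softmax(logits, dim=1)  # what MLP_Model.forward returns

ce_on_logits = F.cross_entropy(logits, targets)
ce_on_log_probs = F.cross_entropy(log_probs, targets)  # CrossEntropyLoss on the model output
nll_on_log_probs = F.nll_loss(log_probs, targets)

# log_softmax is idempotent, so all three losses coincide up to float error.
print(torch.allclose(ce_on_logits, ce_on_log_probs, atol=1e-6))      # True
print(torch.allclose(ce_on_log_probs, nll_on_log_probs, atol=1e-6))  # True
```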

The main code is the following:

```python
model = MLP_Model()

from torchsummary import summary
input_size = (1, 28, 28)
print(f"input_size: {input_size}")
summary(model, input_size, device="cpu")

# CL Benchmark Creation
scenario = RotatedMNIST(n_experiences=2, seed=0)
train_stream = scenario.train_stream
test_stream = scenario.test_stream

# Prepare for training & testing
optimizer = Adam(model.parameters(), lr=0.001)
criterion = CrossEntropyLoss()

# choose some metrics and evaluation method
interactive_logger = InteractiveLogger()

eval_plugin = EvaluationPlugin(
    accuracy_metrics(
        minibatch=False, epoch=True, experience=True, stream=True),
    forgetting_metrics(experience=True),
    loggers=[interactive_logger])

# Choose a CL strategy
strategy = EWC(
    model=model, optimizer=optimizer, criterion=criterion,
    train_mb_size=64, train_epochs=2, eval_mb_size=64, device=device,
    evaluator=eval_plugin, ewc_lambda=100)

# train and test loop
for train_task in train_stream:
    strategy.train(train_task)
    strategy.eval(test_stream)
```

I observed that computing the Fisher multiple times changes the final results.
This can be reproduced by editing the Avalanche source code.

In ewc.py, in the after_training_exp method, instead of compute_importances we call a multiple_calls_compute_importances method.
This method only calls compute_importances multiple times:

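```python
def multiple_calls_compute_importances(self, model, criterion, optimizer,
                                       dataset, device, batch_size):
    # Number of Fisher computations; n=1 matches standard EWC
    n = 6
    # The first n-1 results are discarded...
    for _ in range(n - 1):
        self.compute_importances(model, criterion, optimizer,
                                 dataset, device, batch_size)

    # ...and only the last one is returned
    return self.compute_importances(model, criterion, optimizer,
                                    dataset, device, batch_size)
```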

The surprising thing is that changing n=1 (the Fisher is computed only once, as in standard EWC) to n=6 (the Fisher is computed six times) changes the final results.

This can be observed not only in the final accuracy but also in the trace of the FIM.
To see this, a small edit to the compute_importances method is sufficient:

```python
def compute_importances_sub(self, model, criterion, optimizer,
                            dataset, device, batch_size):
    """
    Compute EWC importance matrix for each parameter
    """

    model.eval()

    # list of (parameter name, zero tensor) pairs
    importances = zerolike_params_dict(model)
    dataloader = DataLoader(dataset, batch_size=batch_size)
    for i, (x, y, task_labels) in enumerate(dataloader):
        x, y = x.to(device), y.to(device)

        optimizer.zero_grad()
        out = avalanche_forward(model, x, task_labels)
        loss = criterion(out, y)
        loss.backward()

        # Accumulate the squared gradient of each parameter
        for (k1, p), (k2, imp) in zip(model.named_parameters(),
                                      importances):
            assert (k1 == k2)
            if p.grad is not None:
                imp += p.grad.data.clone().pow(2)

    # average over mini batch length
    for _, imp in importances:
        imp /= float(len(dataloader))

    # Added: trace of the diagonal FIM, i.e. the sum of all importances
    sum_imp = 0
    for _, imp in importances:
        sum_imp += imp.sum().cpu()

    print(f"sum_imp: {sum_imp}")

    return importances
```

It is identical to the original one, except that it also computes the trace of the FIM in the sum_imp variable.
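Outside Avalanche, the same computation fits in a few lines. A minimal sketch, assuming a plain dataset that yields `(x, y)` pairs (no task labels) and calling `model(x)` directly instead of `avalanche_forward`; `diag_fisher_and_trace` is a hypothetical name, and the tiny linear model on random data only stands in for the MLP and RotatedMNIST:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def diag_fisher_and_trace(model, criterion, loader):
    """Squared minibatch gradients averaged over batches, plus their total sum."""
    model.eval()
    importances = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for x, y in loader:
        model.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        for n, p in model.named_parameters():
            if p.grad is not None:
                importances[n] += p.grad.detach().pow(2)
    for imp in importances.values():
        imp /= float(len(loader))  # average over the number of minibatches
    # The FIM is approximated as diagonal, so its trace is the sum of all entries.
    trace = sum(imp.sum().item() for imp in importances.values())
    return importances, trace


torch.manual_seed(0)
model = nn.Linear(4, 3)
data = TensorDataset(torch.randn(32, 4), torch.randint(0, 3, (32,)))
loader = DataLoader(data, batch_size=8)  # shuffle defaults to False
imps, trace = diag_fisher_and_trace(model, nn.CrossEntropyLoss(), loader)
print(f"trace: {trace:.4f}")
```

Since nothing in this sketch is random (no shuffling, no dropout, no optimizer step), calling it twice on the same model returns the same importances, which is the behaviour one would expect from `compute_importances` as well.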

Let's call:

1) the run where the Fisher is computed once;
2) the run where the Fisher is computed multiple times.

The traces of the Fisher for the second task differ: 1) has a trace of 0.699, while 2) has 1.199.
Moreover, the final accuracy in 1) is 0.3209 for Task 0 and 0.9750 for Task 1, whereas in 2) it is 0.3055 for Task 0 and 0.9671 for Task 1.

It would be very interesting and useful to understand why calling a method that shouldn't leave any trace behind causes such a large difference.
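One way to check whether a call that only computes statistics still leaves a trace is to compare the global random-number-generator state before and after it: if the state advances, everything drawn afterwards (for example the minibatch order of the next experience) is shifted. A minimal sketch, assuming only PyTorch's default CPU generator is of interest; `consumes_global_rng`, `pure_stat` and `noisy_stat` are hypothetical helpers, and the CUDA generators (`torch.cuda.get_rng_state`) as well as Python's `random` and NumPy would need the same treatment:

```python
import torch


def consumes_global_rng(fn, *args, **kwargs) -> bool:
    """Return True if calling fn(*args, **kwargs) advances PyTorch's global CPU RNG."""
    before = torch.get_rng_state()  # snapshot of the default CPU generator
    fn(*args, **kwargs)
    after = torch.get_rng_state()
    return not torch.equal(before, after)


def pure_stat(t: torch.Tensor) -> torch.Tensor:
    # Deterministic computation: draws no random numbers.
    return t.pow(2).sum()


def noisy_stat(t: torch.Tensor) -> torch.Tensor:
    # Draws from the global RNG, as dropout or a random transform would.
    return (t + torch.randn_like(t)).pow(2).sum()


x = torch.ones(5)
print(consumes_global_rng(pure_stat, x))   # False
print(consumes_global_rng(noisy_stat, x))  # True
```

Wrapping the extra `compute_importances` calls in `multiple_calls_compute_importances` with such a check would show whether they move the random stream seen by the rest of training.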

AndreaCossu · 2021-11-08T15:59:34Z

Maintainer

While we look into this further, can you try adding shuffle=False to the DataLoader? Just as a sanity check.


DragonRed18 Nov 9, 2021
Author

Hi @AndreaCossu, thank you.

I tried setting shuffle=False in the DataLoader in the compute_importances method,
but I didn't observe any change.

vlomonaco Nov 11, 2021
Maintainer

This is very strange, thanks @DragonRed18 for reporting this (funny) issue. Can you do a sanity check and verify that the main loop of compute_importances_sub processes the same data (x, y, task_labels) on every call? I suspect the data loader is still providing different data each time (maybe transformations are at play here?)
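A minimal sketch of this sanity check, assuming the dataset yields `(x, y)` pairs of CPU tensors (Avalanche datasets also yield task labels, which could be hashed the same way); `batch_fingerprints` is a hypothetical helper, and the random `TensorDataset` only stands in for the real experience dataset:

```python
import hashlib

import torch
from torch.utils.data import DataLoader, TensorDataset


def batch_fingerprints(loader):
    """Hash the raw bytes of every (x, y) minibatch, in iteration order."""
    digests = []
    for x, y in loader:
        h = hashlib.sha256()
        h.update(x.contiguous().numpy().tobytes())
        h.update(y.contiguous().numpy().tobytes())
        digests.append(h.hexdigest())
    return digests


torch.manual_seed(0)
data = TensorDataset(torch.randn(20, 1, 28, 28), torch.randint(0, 10, (20,)))

fixed = DataLoader(data, batch_size=8, shuffle=False)
print(batch_fingerprints(fixed) == batch_fingerprints(fixed))  # True

# With shuffle=True the batch order (and hence the fingerprints) changes between passes.
shuffled = DataLoader(data, batch_size=8, shuffle=True)
```

Collecting the fingerprints on every call of `compute_importances_sub` and comparing them across calls would tell whether the loader, or a random transform applied to the data, feeds different inputs each time.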
