I am working on a node classification model with

```python
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True)
```

The accuracy function is:

```python
def compute_correct(pred, expected, size, batch):
    def check(pred_i, exp_i):
        for p, e in zip(pred_i, exp_i):
            if p != e:
                return False
        return True

    correct = 0
    wrong = 0
    for i in range(size):
        pred_i = pred[batch == i]
        exp_i = expected[batch == i]
        if check(pred_i, exp_i):
            correct += 1
        else:
            wrong += 1
    return correct, wrong
```

A result is counted as correct only if all nodes of a specific graph are classified correctly (this is important in my problem). I noticed that with `shuffle=True` the test loss and accuracy vary between runs.
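(As an aside, the same per-graph check can be done without the Python loop. A minimal vectorized sketch, with an assumed function name, where `pred`, `expected`, and `batch` are 1-D tensors on the same device:)

```python
import torch

def compute_correct_vectorized(pred, expected, size, batch):
    node_wrong = (pred != expected).long()              # 1 per misclassified node
    wrong_per_graph = torch.zeros(size, dtype=torch.long, device=pred.device)
    wrong_per_graph.scatter_add_(0, batch, node_wrong)  # wrong-node count per graph
    correct = int((wrong_per_graph == 0).sum())         # graphs with all nodes correct
    return correct, size - correct
```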
My model:

```python
from typing import Optional

import torch
from torch import Tensor, nn


class ChebNetLinear(nn.Module):
    def __init__(
            self,
            in_channels: int,
            hidden_channels: int,
            out_channels: int,
            K: int,
            cheb_depth: int,
            linear_depth: int,
            linear_channels: int,
            normalization: Optional[str] = None,
            bias: Optional[bool] = True,
            droprate: Optional[float] = 0.0):
        super(ChebNetLinear, self).__init__()
        # create ChebConv layers (helper defined elsewhere)
        self._convs = create_chebnet_convolutions(
            in_channels,
            hidden_channels,
            hidden_channels,
            K,
            cheb_depth,
            normalization,
            bias,
            droprate)
        # create Linear layers (helper defined elsewhere)
        self._linear = create_linear_layers(
            hidden_channels,
            linear_channels,
            out_channels,
            linear_depth)
        self._relu = nn.ReLU()
        self._dropout = nn.Dropout(p=droprate)
        self._logsoftmax = nn.LogSoftmax(dim=1)

    def forward(self,
                x,
                edge_index,
                edge_weight,
                batch: Optional[Tensor] = None):
        for i in range(len(self._convs)):
            x = self._convs[i](x, edge_index, edge_weight, batch)
            x = self._relu(x)
            x = self._dropout(x)
        for i in range(len(self._linear) - 1):
            x = self._linear[i](x)
            x = self._relu(x)
        x = self._linear[-1](x)
        x = self._logsoftmax(x)
        return x
```

Test function:

```python
def test(loader):
    total_loss = 0
    total_correct = 0
    total_wrong = 0
    with torch.no_grad():
        model.eval()
        for data in loader:
            data = data.to(device)
            y = model(
                data.x,
                data.edge_index,
                data.edge_weight,
                data.batch)
            loss = criterion(y, data.y.argmax(dim=1))
            total_loss += loss.item() * data.num_graphs
            pred = y.argmax(dim=1)
            expected = data.y.argmax(dim=1)
            c, w = compute_correct(pred, expected, data.num_graphs, data.batch)
            total_correct += c
            total_wrong += w
    return total_loss / len(test_data), total_correct / (total_correct + total_wrong)
```

Is this normal? I cannot find any error in my code. As far as I know, test accuracy and loss should be the same across different executions, with or without shuffling.
How many
Your total loss is not correctly calculated. It should be `total_loss += loss.item() * data.num_nodes` in case your graphs are differently sized. This fixes one of the issues for me.
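(A minimal sketch of this order-independent aggregation, assuming `criterion` uses the default mean reduction over nodes; the names match the post above:)

```python
total_loss = 0.0
total_nodes = 0
with torch.no_grad():
    model.eval()
    for data in loader:
        data = data.to(device)
        y = model(data.x, data.edge_index, data.edge_weight, data.batch)
        loss = criterion(y, data.y.argmax(dim=1))   # mean over this batch's nodes
        total_loss += loss.item() * data.num_nodes  # weight by node count, not graph count
        total_nodes += data.num_nodes
mean_loss = total_loss / total_nodes  # identical no matter how graphs are batched
```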
The other issue is related to `lambda_max`. Since you are using `None` normalization, automatic `lambda_max` inference may differ depending on the batch you have sampled, since by default we compute it via `2 * edge_weight.max()`. Passing `lambda_max` explicitly fixes this for me.
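(A possible way to implement that suggestion, as a sketch under the assumption that `test_data` is a plain list of `Data` objects: PyG's `LaplacianLambdaMax` transform stores each graph's largest Laplacian eigenvalue in `data.lambda_max`, and `ChebConv.forward()` accepts a per-graph `lambda_max` tensor alongside `batch`:)

```python
from torch_geometric.loader import DataLoader
from torch_geometric.transforms import LaplacianLambdaMax

# Pre-compute data.lambda_max once per graph, with the same normalization
# the ChebConv layers use, so it no longer depends on batch composition.
transform = LaplacianLambdaMax(normalization=None)
test_data = [transform(data) for data in test_data]
test_loader = DataLoader(test_data, batch_size=batch_size, shuffle=True)

# The model's forward() would then need to thread the value through, e.g.:
#     x = self._convs[i](x, edge_index, edge_weight, batch, lambda_max=lambda_max)
# where lambda_max comes from data.lambda_max of the current batch.
```

Alternatively, with `normalization='sym'` a constant `lambda_max=2.0` would do, since the eigenvalues of the symmetric normalized Laplacian are bounded by 2; with `None` normalization there is no such fixed bound, hence the per-graph pre-computation.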