GCN embeddings different on subsequent queries with GPUs #5539

animeshbchowdhury · 2022-09-26T16:32:47Z


Hi,

I observed a strange phenomenon while studying the graph embeddings generated by a GCN model. After training the model for graph classification, I collect the learned graph embeddings and observe that, for the same data point, the embeddings from repeated inference runs differ by a substantial margin (differences on the order of 0.1).

Surprisingly, this does not happen when I collect the embeddings while running inference on the CPU: there, the embeddings are exactly the same across runs. On the GPU, however, they differ. To reproduce this, I adapted the code from one of the PyG tutorial notebooks and observed the same behavior:

```python
#!/usr/bin/env python
# coding: utf-8

# In[1]:

# Install required packages.
import os
import torch
os.environ['TORCH'] = torch.__version__
print(torch.__version__)
```

```python
# In[2]:

import torch
from torch_geometric.datasets import TUDataset

dataset = TUDataset(root='data/TUDataset', name='MUTAG')

print()
print(f'Dataset: {dataset}:')
print('====================')
print(f'Number of graphs: {len(dataset)}')
print(f'Number of features: {dataset.num_features}')
print(f'Number of classes: {dataset.num_classes}')

data = dataset[0]  # Get the first graph object.

print()
print(data)
print('=============================================================')

# Gather some statistics about the first graph.
print(f'Number of nodes: {data.num_nodes}')
print(f'Number of edges: {data.num_edges}')
print(f'Average node degree: {data.num_edges / data.num_nodes:.2f}')
print(f'Has isolated nodes: {data.has_isolated_nodes()}')
print(f'Has self-loops: {data.has_self_loops()}')
print(f'Is undirected: {data.is_undirected()}')
```

```python
# In[3]:

torch.manual_seed(12345)
dataset = dataset.shuffle()

train_dataset = dataset[:150]
test_dataset = dataset[150:]

print(f'Number of training graphs: {len(train_dataset)}')
print(f'Number of test graphs: {len(test_dataset)}')
```

```python
# In[4]:

from torch_geometric.loader import DataLoader

train_loader = DataLoader(train_dataset, batch_size=64, shuffle=False)
test_loader = DataLoader(test_dataset, batch_size=64, shuffle=False)

for step, data in enumerate(train_loader):
    print(f'Step {step + 1}:')
    print('=======')
    print(f'Number of graphs in the current batch: {data.num_graphs}')
    print(data)
    print()
```

```python
# In[5]:

from torch.nn import Linear
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, BatchNorm
from torch_geometric.nn import global_mean_pool

class GCN(torch.nn.Module):
    def __init__(self, hidden_channels):
        super(GCN, self).__init__()
        torch.manual_seed(12345)
        self.conv1 = GCNConv(dataset.num_node_features, hidden_channels)
        self.norm1 = BatchNorm(hidden_channels)
        self.conv2 = GCNConv(hidden_channels, hidden_channels)
        self.norm2 = BatchNorm(hidden_channels)
        self.conv3 = GCNConv(hidden_channels, hidden_channels)
        self.norm3 = BatchNorm(hidden_channels)
        self.lin = Linear(hidden_channels, dataset.num_classes)

    def forward(self, x, edge_index, batch):
        # 1. Obtain node embeddings
        x = self.conv1(x, edge_index)
        x = self.norm1(x)
        x = x.relu()
        x = self.conv2(x, edge_index)
        x = self.norm2(x)
        x = x.relu()
        x = self.conv3(x, edge_index)
        x = self.norm3(x)

        # 2. Readout layer
        x = global_mean_pool(x, batch)  # [batch_size, hidden_channels]
        embedding = x

        # 3. Apply a final classifier
        # x = F.dropout(x, p=0.5, training=self.training)
        x = self.lin(x)

        return x, embedding

model = GCN(hidden_channels=64)
print(model)
```

```python
# In[6]:

# from IPython.display import Javascript
# display(Javascript('''google.colab.output.setIframeHeight(0, true, {maxHeight: 300})'''))

# model = GCN(hidden_channels=64)
def train(model):
    model.train()

    for data in train_loader:  # Iterate in batches over the training dataset.
        data = data.to('cuda')
        out, _ = model(data.x, data.edge_index, data.batch)  # Perform a single forward pass.
        loss = criterion(out, data.y)  # Compute the loss.
        loss.backward()  # Derive gradients.
        optimizer.step()  # Update parameters based on gradients.
        optimizer.zero_grad()  # Clear gradients.

def test_cpu(model, loader):
    model.eval()
    model = model.to('cpu')
    correct = 0
    for data in loader:  # Iterate in batches over the training/test dataset.
        out, embedding = model(data.x, data.edge_index, data.batch)
        pred = out.argmax(dim=1)  # Use the class with highest probability.
        correct += int((pred == data.y).sum())  # Check against ground-truth labels.
    acc = correct / len(loader.dataset)  # Derive ratio of correct predictions.
    return acc, embedding  # `embedding` holds the last batch only.

def test_gpu(model, loader):
    model.eval()
    correct = 0
    for data in loader:  # Iterate in batches over the training/test dataset.
        data = data.to('cuda')
        out, embedding = model(data.x, data.edge_index, data.batch)
        pred = out.argmax(dim=1)  # Use the class with highest probability.
        correct += int((pred == data.y).sum())  # Check against ground-truth labels.
    acc = correct / len(loader.dataset)  # Derive ratio of correct predictions.
    return acc, embedding  # `embedding` holds the last batch only.

model = model.to('cuda')
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(1, 50):
    train(model)
    train_acc, _ = test_gpu(model, train_loader)
    test_acc, _ = test_gpu(model, test_loader)
    print(f'Epoch: {epoch:03d}, Train Acc: {train_acc:.4f}, Test Acc: {test_acc:.4f}')
```

```python
# In[7]:

_, embedding_gpu1 = test_gpu(model, test_loader)
_, embedding_gpu2 = test_gpu(model, test_loader)
_, embedding_cpu1 = test_cpu(model, test_loader)
_, embedding_cpu2 = test_cpu(model, test_loader)
```

```python
# In[14]:

embedding_gpu2 == embedding_gpu1
```

```
tensor([[ True, True, False, ..., True, False, False],
        [ True, True, True, ..., True, True, True],
        [ True, True, True, ..., False, False, True],
        ...,
        [ True, True, True, ..., True, True, True],
        [ True, True, False, ..., True, False, False],
        [False, True, True, ..., True, False, True]], device='cuda:0')
```

```python
# In[16]:

embedding_cpu1 == embedding_cpu2
```

```
tensor([[True, True, True, ..., True, True, True],
        [True, True, True, ..., True, True, True],
        [True, True, True, ..., True, True, True],
        ...,
        [True, True, True, ..., True, True, True],
        [True, True, True, ..., True, True, True],
        [True, True, True, ..., True, True, True]])
```
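Elementwise `==` only shows that some entries differ, not by how much. A minimal sketch of a tolerance-based comparison using `torch.equal` and `torch.allclose`; the helper name `compare_embeddings` and its default tolerances are assumptions chosen for illustration, not values recommended by PyG or PyTorch:

```python
import torch


def compare_embeddings(a: torch.Tensor, b: torch.Tensor,
                       rtol: float = 1e-4, atol: float = 1e-5) -> dict:
    """Summarize how far apart two embedding tensors are.

    Hypothetical helper; the tolerances are illustrative defaults only.
    """
    # Move both tensors to the CPU and drop autograd history before comparing.
    a = a.detach().cpu()
    b = b.detach().cpu()
    diff = (a - b).abs()
    return {
        # True only if every element matches bit for bit.
        "bitwise_equal": torch.equal(a, b),
        # Fraction of entries that are not exactly equal.
        "fraction_mismatched": (a != b).float().mean().item(),
        # Largest absolute deviation between corresponding entries.
        "max_abs_diff": diff.max().item(),
        # True if |a - b| <= atol + rtol * |b| holds for every element.
        "allclose": torch.allclose(a, b, rtol=rtol, atol=atol),
    }


# Small self-contained demo: one entry off by a rounding-sized amount.
a = torch.tensor([[0.5, -1.25], [2.0, 3.0]])
b = a.clone()
b[0, 0] += 1e-6
print(compare_embeddings(a, b))
```

In the repro above, it would be called as `compare_embeddings(embedding_gpu1, embedding_gpu2)`; `max_abs_diff` then reports the actual size of the mismatch, which is what matters when deciding whether it is rounding noise.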

Can you let me know what is going wrong? I also tried `seed_everything`, but it made no difference.


EdisonLeeeee (Collaborator) · 2022-09-27T03:57:29Z · accepted answer

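Scatter-based aggregation over `edge_index` is a non-deterministic operation on CUDA: the floating-point additions are typically done with atomic adds whose order can change from run to run, and because floating-point addition is not associative, the results can differ numerically between runs. For message passing layers, deterministic aggregation is only guaranteed when using `SparseTensor`.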
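To see why summation order alone can change the result, here is a simplified plain-Python sketch. `scatter_sum` mimics unordered atomic accumulation by applying edges in an explicitly given order, while `csr_sum` groups edges by target node once and reduces each row sequentially, which is roughly why a CSR-based (`SparseTensor`) aggregation can be deterministic. Both helpers are hypothetical illustrations, not PyG or `torch_scatter` APIs, and they use Python's float64 rather than the float32 used on the GPU.

```python
from typing import List, Sequence


def scatter_sum(values: Sequence[float], dst: Sequence[int], num_nodes: int,
                order: Sequence[int]) -> List[float]:
    """Add each edge message to its target node, visiting edges in `order`.

    Hypothetical stand-in for atomic scatter-add on a GPU, where the order in
    which threads commit their additions is not fixed from run to run.
    """
    out = [0.0] * num_nodes
    for e in order:
        out[dst[e]] += values[e]
    return out


def csr_sum(values: Sequence[float], dst: Sequence[int], num_nodes: int) -> List[float]:
    """Group edges by target node (a CSR-like layout), then reduce each row
    sequentially, so every call adds the same numbers in the same order."""
    perm = sorted(range(len(dst)), key=lambda e: dst[e])  # stable sort: ties keep edge order
    rowptr = [0] * (num_nodes + 1)
    for e in perm:
        rowptr[dst[e] + 1] += 1  # count edges per target node
    for node in range(num_nodes):
        rowptr[node + 1] += rowptr[node]  # prefix sum -> row start offsets
    out = [0.0] * num_nodes
    for node in range(num_nodes):
        acc = 0.0
        for k in range(rowptr[node], rowptr[node + 1]):
            acc += values[perm[k]]
        out[node] = acc
    return out


# Three messages go into node 0, one into node 1.
values = [0.1, 0.2, 0.3, 1.0]
dst = [0, 0, 0, 1]

run_a = scatter_sum(values, dst, 2, order=[0, 1, 2, 3])  # node 0: (0.1 + 0.2) + 0.3
run_b = scatter_sum(values, dst, 2, order=[2, 1, 0, 3])  # node 0: (0.3 + 0.2) + 0.1
print(run_a, run_b, run_a == run_b)
print(csr_sum(values, dst, 2) == csr_sum(values, dst, 2))
```

Node 0 ends up as `0.6000000000000001` in one order and `0.6` in the other, while `csr_sum` returns identical results on every call. In a multi-layer float32 GCN, such last-bit differences can be carried into later layers. In PyG, the `SparseTensor` route typically means converting the dataset with `torch_geometric.transforms.ToSparseTensor()` and passing `data.adj_t` instead of `data.edge_index` (assuming `torch_sparse` is installed).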
