Hi,

I am having a problem when using the `TemporalData` class. The data cannot be batched with the PyG `DataLoader` if the `edge_index` tensors of the objects do not have the same size. This does not happen if the object class is replaced with `Data`. The idea behind the work is that each data object is not a singular graph but a snapshot of graphs whose edge indices vary across time.
Here is an example to reproduce the problem.
```python
import torch
from torch_geometric.data import Data, Batch, TemporalData
from torch_geometric.data import Dataset as PyGDataset
from torch_geometric.loader import DataLoader

def create_specific_temporal_data(num_nodes, num_features, num_edges):
    t = torch.rand(num_nodes)                 # random timestamps for nodes, shape [num_nodes]
    x = torch.randn(num_nodes, num_features)  # node features, shape [num_nodes, num_features]
    edge_index = torch.randint(0, num_nodes, (2, num_edges), dtype=torch.long)  # shape [2, num_edges]
    edge_attr = torch.randn(num_edges)        # edge attributes, shape [num_edges]
    edge_time = torch.randn(num_edges)        # additional edge attributes, shape [num_edges]
    y = torch.tensor([1])                     # label, shape [1]
    return TemporalData(t=t, x=x, edge_index=edge_index, edge_attr=edge_attr,
                        edge_time=edge_time, y=y)

# Create four TemporalData objects with specified shapes.
temporal_data_list = [
    create_specific_temporal_data(190, 100, 910),
    create_specific_temporal_data(228, 100, 1092),
    create_specific_temporal_data(228, 100, 1092),
    create_specific_temporal_data(228, 100, 1092),
]

# Print the details of each TemporalData object.
for i, data in enumerate(temporal_data_list):
    print(f"t={data.t.shape}, x={data.x.shape}, edge_index={data.edge_index.shape}, "
          f"edge_attr={data.edge_attr.shape}, edge_time={data.edge_time.shape}, y={data.y.shape}")

class TemporalDataset(PyGDataset):
    def __init__(self, data_list):
        super().__init__()
        self.data_list = data_list

    def len(self):
        return len(self.data_list)

    def get(self, idx):
        return self.data_list[idx]

custom_dataset = TemporalDataset(temporal_data_list)
for data in custom_dataset:
    print(data)

loader = DataLoader(custom_dataset, batch_size=2, shuffle=False,
                    follow_batch=['edge_index', 'edge_attr', 'edge_time'])
for batch in loader:
    print(batch)
```
I get the error `RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 910 but got size 1092 for tensor number 1 in the list.`
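The error is consistent with how the two classes collate: `Data` concatenates `edge_index` along the last dimension when batching, while `TemporalData` appears to concatenate every attribute along dimension 0, which cannot work when the other dimensions differ. A minimal plain-torch sketch of the difference:

```python
import torch

# Two edge_index tensors with the edge counts from the example above.
a = torch.randint(0, 190, (2, 910), dtype=torch.long)
b = torch.randint(0, 228, (2, 1092), dtype=torch.long)

# Concatenating along the last dimension (Data's behavior for edge_index)
# works even though the edge counts differ:
cat_last = torch.cat([a, b], dim=-1)
print(cat_last.shape)  # torch.Size([2, 2002])

# Concatenating along dim 0 instead fails with exactly the reported error:
try:
    torch.cat([a, b], dim=0)
    failed = False
except RuntimeError:
    failed = True
print(failed)  # True
```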
In this example, each individual graph has 19 nodes and 91 edges. A full snapshot contains 12 graphs, giving 228 nodes and 1092 edges in total, where each graph can be accessed separately via the `t` and `edge_time` attributes.
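That per-snapshot access can be sketched with a boolean mask over edges. This is an assumption-laden illustration: it supposes `edge_time` holds an integer snapshot index per edge, unlike the random floats in the repro above.

```python
import torch

num_edges = 1092
edge_index = torch.randint(0, 228, (2, num_edges), dtype=torch.long)
# Hypothetical: an integer snapshot index per edge (the repro above uses
# random floats; real data would carry the actual snapshot times).
edge_time = torch.randint(0, 12, (num_edges,))

snapshot_id = 3
mask = edge_time == snapshot_id   # boolean mask selecting one snapshot's edges
edges_at_t = edge_index[:, mask]  # edge_index restricted to snapshot 3
print(edges_at_t.shape[0])  # 2
```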
The problem is the discrepancy in sizes: some sequences may be shorter than others (fewer graphs in a snapshot due to lack of data).
How can this problem be mitigated? By using the `Data` object instead of `TemporalData`? Or by always setting `batch_size=1`, which is inefficient during model training?
Is it assumed that all `TemporalData` objects in a batch must have the same number of graphs per snapshot?
Any help will be appreciated.
Thank you!