Hi,

I am having a problem when using the `TemporalData` class. The data cannot be batched with the PyG `DataLoader` if the `edge_index` tensors of the objects do not have the same size. This does not happen if the object class is replaced with `Data`. The idea behind the work is that each data object is not a singular graph but a snapshot of graphs whose edge indices vary across time.
Here is an example to reproduce the problem.
```python
import torch
from torch_geometric.data import Data, Batch, TemporalData
from torch_geometric.data import Dataset as PyGDataset
from torch_geometric.loader import DataLoader

def create_specific_temporal_data(num_nodes, num_features, num_edges):
    t = torch.rand(num_nodes)                 # random timestamps for nodes, shape [num_nodes]
    x = torch.randn(num_nodes, num_features)  # node features, shape [num_nodes, num_features]
    edge_index = torch.randint(0, num_nodes, (2, num_edges), dtype=torch.long)  # shape [2, num_edges]
    edge_attr = torch.randn(num_edges)        # edge attributes, shape [num_edges]
    edge_time = torch.randn(num_edges)        # additional edge attributes, shape [num_edges]
    y = torch.tensor([1])                     # label, shape [1]
    return TemporalData(t=t, x=x, edge_index=edge_index, edge_attr=edge_attr,
                        edge_time=edge_time, y=y)

# Create four TemporalData objects with specified shapes.
temporal_data_list = [
    create_specific_temporal_data(190, 100, 910),
    create_specific_temporal_data(228, 100, 1092),
    create_specific_temporal_data(228, 100, 1092),
    create_specific_temporal_data(228, 100, 1092),
]

# Print the details of each TemporalData object.
for i, data in enumerate(temporal_data_list):
    print(f"t={data.t.shape}, x={data.x.shape}, edge_index={data.edge_index.shape}, "
          f"edge_attr={data.edge_attr.shape}, edge_time={data.edge_time.shape}, y={data.y.shape}")

class TemporalDataset(PyGDataset):
    def __init__(self, data_list):
        super().__init__()
        self.data_list = data_list

    def len(self):
        return len(self.data_list)

    def get(self, idx):
        return self.data_list[idx]

custom_dataset = TemporalDataset(temporal_data_list)
for data in custom_dataset:
    print(data)

loader = DataLoader(custom_dataset, batch_size=2, shuffle=False,
                    follow_batch=['edge_index', 'edge_attr', 'edge_time'])
for batch in loader:
    print(batch)
```
I get the error `RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 910 but got size 1092 for tensor number 1 in the list.`
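The error is consistent with how the two classes collate: `Data` concatenates `edge_index` along the last dimension when batching, while `TemporalData` appears to concatenate every attribute along dimension 0, which cannot work when the other dimensions differ. A minimal plain-torch sketch of the difference:

```python
import torch

# Two edge_index tensors with the edge counts from the example above.
a = torch.randint(0, 190, (2, 910), dtype=torch.long)
b = torch.randint(0, 228, (2, 1092), dtype=torch.long)

# Concatenating along the last dimension (Data's behavior for edge_index)
# works even though the edge counts differ:
cat_last = torch.cat([a, b], dim=-1)
print(cat_last.shape)  # torch.Size([2, 2002])

# Concatenating along dim 0 instead fails with exactly the reported error:
try:
    torch.cat([a, b], dim=0)
    failed = False
except RuntimeError:
    failed = True
print(failed)  # True
```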
In this example, each individual graph has 19 nodes and 91 edges. A full snapshot contains 12 graphs, giving 228 nodes and 1092 edges in total, where each graph can be accessed separately via the `t` and `edge_time` attributes.
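That per-snapshot access can be sketched with a boolean mask over edges. This is an assumption-laden illustration: it supposes `edge_time` holds an integer snapshot index per edge, unlike the random floats in the repro above.

```python
import torch

num_edges = 1092
edge_index = torch.randint(0, 228, (2, num_edges), dtype=torch.long)
# Hypothetical: an integer snapshot index per edge (the repro above uses
# random floats; real data would carry the actual snapshot times).
edge_time = torch.randint(0, 12, (num_edges,))

snapshot_id = 3
mask = edge_time == snapshot_id   # boolean mask selecting one snapshot's edges
edges_at_t = edge_index[:, mask]  # edge_index restricted to snapshot 3
print(edges_at_t.shape[0])  # 2
```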
The problem is the discrepancy in sizes: some sequences may be shorter than others (fewer graphs in a snapshot due to lack of data).
How can this problem be mitigated? By using the `Data` object instead of `TemporalData`? Or by always setting `batch_size=1`, which is inefficient during model training?
Is it assumed that all `TemporalData` objects in a batch must have the same number of graphs per snapshot?
Any help will be appreciated.
Thank you!