Most efficient way to store spatio-temporal graph data? #3927

radandreicristian · 2022-01-24T10:34:50Z

Hello!

I am looking to contribute to PyG by adding a spatio-temporal graph dataset which I will use in one of my projects. The task is a node-wise time series regression (i.e. given P previous states of the feature vector X in every node, predict the next N states of the feature vector in every node).

The data tensors can be described as follows:

X (n_intervals, n_previous_steps, n_nodes, n_features), essentially (..., 207, 1).
Y (n_intervals, n_next, n_nodes, n_features), essentially (..., 207, 1).

What worries me is the first dimension. I can construct ~30K intervals, and if the numbers of previous and next steps are both >10, this already takes GBs of memory. My first thought was to use "InMemoryDataset", but I then subclassed the "Dataset" class instead and stored the samples on disk.

When storing with "Dataset", however, the edge indices and edge attributes must be stored along with each graph, even though they are identical for every sample. I would end up storing the same (weighted) adjacency matrix 30K times, which is something I would like to avoid (even with COO sparse tensors).

Is there a better practice for storing such data?

Thanks, have a great day!

radandreicristian · 2022-01-25T10:16:28Z

radandreicristian
Jan 25, 2022
Author

I thought about the following approach:

In the process function, save the edge index and edge attributes once, and then save each data point (out of the 30K) as Data(x=..., y=...).
In the get function, use torch.load for the edge index, edge attributes and the data object, then do:

data.edge_index = edge_index
data.edge_attr = edge_attr

Would that have any unexpected side effects on the composition or functionality of the data object, or should it work as intended?

1 reply

rusty1s Jan 26, 2022
Maintainer

I'm sorry for the late reply. This approach is exactly what I would suggest as well.

As far as I can tell from your description, X and Y may hold duplicated data as well, so it might also be a good idea to investigate how to get rid of that duplication. Eventually, one can create the final Data item completely on-the-fly.
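
One way to read the on-the-fly suggestion, as a rough sketch: assuming the intervals are overlapping sliding windows (stride 1) over a single time series of shape (n_timesteps, n_nodes, n_features), only that series needs to be stored, and each X/Y pair can be sliced out when requested. The class name `SlidingWindows` is hypothetical, and the stride-1 assumption is not confirmed by the thread. The helper works on any sequence that supports slicing; a plain list is used here so the example runs without extra dependencies.

```python
class SlidingWindows:
    """Hypothetical helper: serves (X, Y) windows from one stored time series."""

    def __init__(self, series, n_previous_steps, n_next):
        # series is indexed by time step along its first dimension.
        self.series = series
        self.n_previous_steps = n_previous_steps
        self.n_next = n_next

    def __len__(self):
        # Count the start positions that leave room for a full X and Y window.
        span = self.n_previous_steps + self.n_next
        return max(0, len(self.series) - span + 1)

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        split = idx + self.n_previous_steps
        x = self.series[idx:split]                  # the P previous states
        y = self.series[split:split + self.n_next]  # the next N states
        return x, y


# Usage: 10 time steps, 3 previous steps, 2 next steps.
series = list(range(10))
windows = SlidingWindows(series, n_previous_steps=3, n_next=2)
first_x, first_y = windows[0]
last_x, last_y = windows[len(windows) - 1]
```

With a torch tensor as `series`, basic slicing returns views, so no window is copied until it is modified or batched. The slices could then be wrapped in a Data object together with the shared edge_index and edge_attr.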
