Most efficient way to store spatio-temporal graph data? #3927
Unanswered
radandreicristian
asked this question in
Q&A
Replies: 1 comment 1 reply
-
I thought about the following approach: in the `process` function, save the edge index and edge attributes once, and then save each data point (out of the 30K) as Data(x=..., y=...).
Would that have any unexpected side effects on the composition or functionality of the data object, or should it work as intended?
-
Hello!
I am looking to contribute to PyG by adding a spatio-temporal graph dataset that I will use in one of my projects. The task is node-wise time series regression (i.e., given the P previous states of the feature vector X at every node, predict the next N states of the feature vector at every node).
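To make the task concrete, here is a rough sketch of how one (input, target) pair could be cut from a raw series. It assumes the series is stored as a single array of shape `(T, num_nodes, F)`; that layout, and the helper names `num_windows` and `get_window`, are assumptions made for illustration only.

```python
import numpy as np


def num_windows(T, P, N):
    """Number of (P previous, N next) windows in a series of length T."""
    return max(T - P - N + 1, 0)


def get_window(series, i, P, N):
    """Return the i-th window: P input steps followed by N target steps.

    Slicing returns views, so no data is copied until the caller
    converts or modifies the result.
    """
    x = series[i:i + P]          # shape (P, num_nodes, F)
    y = series[i + P:i + P + N]  # shape (N, num_nodes, F)
    return x, y
```

Cutting windows lazily like this, rather than materializing all of them up front, is one way to avoid storing each time step roughly P + N times.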
The data tensors can be described as follows:
What worries me is the first dimension. I can construct ~30K intervals, and if the numbers of previous and next steps are >10, GBs of memory are already in use. My first thought was to use "InMemoryDataset", but I then subclassed the "Dataset" class instead and stored the samples on disk.
When storing with "Dataset", however, the edge indices and edge attributes must be stored along with each graph, even though they are essentially the same for every sample. I would end up storing the same (weighted) adjacency matrix 30K times, which is something I would like to avoid (even with COO sparse tensors).
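A back-of-the-envelope estimate of that duplication, as a sketch only. The edge count below is a hypothetical value (the question does not state it); the byte sizes assume an int64 `edge_index` of shape `[2, num_edges]` and float32 edge attributes.

```python
def edge_storage_bytes(num_edges, edge_attr_dim=1, index_bytes=8, attr_bytes=4):
    """Bytes for one COO copy: two int64 indices plus float32 attributes per edge."""
    return num_edges * (2 * index_bytes + edge_attr_dim * attr_bytes)


num_samples = 30_000  # from the question
num_edges = 5_000     # hypothetical value, for illustration only

per_copy = edge_storage_bytes(num_edges)  # one copy of the graph
duplicated = num_samples * per_copy       # graph stored with every sample
shared = per_copy                         # graph stored once

print(f"per copy:   {per_copy / 1e6:.1f} MB")
print(f"duplicated: {duplicated / 1e9:.1f} GB")
print(f"shared:     {shared / 1e6:.1f} MB")
```

Under these assumed numbers the duplicated layout spends gigabytes on identical edges, while the shared layout spends a fraction of a megabyte.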
Is there a better practice for storing such data?
Thanks, have a great day!