I have 6700 graphs of different sizes. The full set does not fit in memory, so I save each graph as a .pt file on disk and then create a Dataset class. The problem is that the data loads extremely slowly (~10 min per epoch) when using a data loader with 10 workers and a batch size of 150. Loading is quite fast with a subset of 1000 graphs (~15 s per epoch), but with the full dataset of 6700 .pt files it becomes very slow. I suspect some overhead, perhaps from storing the graphs in 6700 individual files. Does anyone know a more efficient way to store the data when it consists of Data objects holding graphs of different sizes? Or am I misunderstanding something? The dataset code can be seen below. Thanks a lot, and I appreciate all the great work done on PyTorch Geometric!
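A per-file Dataset of the kind described might look like the sketch below; the class name, the `graph_{idx}.pt` naming scheme, and the directory layout are illustrative assumptions, not the poster's actual code. Each access performs one disk read and one unpickle, which is the likely source of the per-epoch overhead:

```python
import os.path as osp

import torch
from torch.utils.data import Dataset
from torch_geometric.loader import DataLoader


class GraphFileDataset(Dataset):
    """Serves one pre-saved Data object from its own .pt file per access."""

    def __init__(self, root, num_graphs):
        self.root = root
        self.num_graphs = num_graphs

    def __len__(self):
        return self.num_graphs

    def __getitem__(self, idx):
        # One file open + unpickle per sample: with thousands of small
        # files, this per-item overhead can dominate epoch time.
        return torch.load(osp.join(self.root, f'graph_{idx}.pt'))


# PyG's DataLoader collates the returned Data objects into a single Batch.
loader = DataLoader(GraphFileDataset('data/graphs', 6700),
                    batch_size=150, num_workers=10, shuffle=True)
```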
Replies: 1 comment

I actually fixed it by modifying my data so that I can store it all in memory now; training is very fast. The question is no longer relevant, unless someone still wants to answer how to optimally store a lot of graphs :)
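For anyone who finds this later: the usual way to keep an entire graph collection in memory with PyTorch Geometric is the InMemoryDataset pattern, which collates all graphs into one large Data object plus slice indices and saves them as a single file. A minimal sketch, assuming the graphs already exist as a Python list of Data objects (the class name and file name are placeholders):

```python
import torch
from torch_geometric.data import InMemoryDataset


class AllGraphsInMemory(InMemoryDataset):
    """Keeps every graph in RAM, backed by one collated .pt file."""

    def __init__(self, root, data_list=None, transform=None):
        # data_list is only needed the first time, when process() runs.
        self.data_list = data_list
        super().__init__(root, transform)
        self.data, self.slices = torch.load(self.processed_paths[0])

    @property
    def raw_file_names(self):
        return []  # graphs are built elsewhere; nothing to download

    @property
    def processed_file_names(self):
        return ['all_graphs.pt']

    def download(self):
        pass

    def process(self):
        # Collate the list of Data objects into one big set of tensors
        # plus slice indices, then persist everything in a single file.
        data, slices = self.collate(self.data_list)
        torch.save((data, slices), self.processed_paths[0])


# First run processes and caches; later runs just load the single file.
# dataset = AllGraphsInMemory('data', data_list=my_graphs)
```

Reading one collated file once per run avoids the per-sample file-system overhead of thousands of small .pt files.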