Optimum strategy to train on very large datasets #4277
diamondspark started this conversation in General
Replies: 1 comment
Depending on your graph sizes, you can also think about storing your data in batches (rather than individual graphs). This should speed up I/O by a huge amount.
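A minimal sketch of what batched storage could look like, assuming the graphs are already available as `torch_geometric.data.Data` objects. The chunk size and file naming below are illustrative choices, not something prescribed by PyG:

```python
# Hypothetical helpers: group precomputed Data objects into chunks and save each
# chunk with torch.save, so a single file read brings back many graphs at once.
import torch

def save_in_chunks(data_list, out_dir, chunk_size=1024):
    # One .pt file per chunk instead of one per graph cuts the number of
    # filesystem operations by roughly a factor of `chunk_size`.
    for i in range(0, len(data_list), chunk_size):
        chunk = data_list[i:i + chunk_size]
        torch.save(chunk, f"{out_dir}/chunk_{i // chunk_size:06d}.pt")

def load_chunk(path):
    # A single torch.load restores the whole chunk (a plain Python list of Data objects).
    return torch.load(path)
```

A dataset's `get()` could then load a chunk, cache it, and index into it, so consecutive accesses hit the cache rather than the disk.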
Hi all,
I am trying to train on roughly 2 million molecular graph data points. What is the best strategy to load the data in such a case? I am following the tutorial for large datasets here: https://pytorch-geometric.readthedocs.io/en/latest/notes/create_dataset.html
In particular, I have the following question: creating a MyOwnDataset() object when process() is being skipped takes forever, even though all the graphs are precomputed and saved to memory. What happens under the hood on this call? Is there any way to make this faster? Kindly advise.
Thank you!
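For context, a minimal sketch of the tutorial-style on-disk Dataset is shown below, assuming one processed file per graph (the `data_{idx}.pt` naming and `num_graphs` parameter are illustrative). Before `process()` can be skipped, the base class checks that every file listed in `processed_file_names` already exists, so with ~2 million per-graph files that existence check alone can dominate the time spent constructing the object.

```python
import os.path as osp
import torch
from torch_geometric.data import Dataset

class MyOwnDataset(Dataset):
    def __init__(self, root, num_graphs, transform=None, pre_transform=None):
        self.num_graphs = num_graphs  # illustrative: total number of precomputed graphs
        super().__init__(root, transform, pre_transform)

    @property
    def processed_file_names(self):
        # The base class checks that each of these files exists before deciding
        # to skip process(); with millions of entries this check is expensive.
        return [f"data_{i}.pt" for i in range(self.num_graphs)]

    def process(self):
        # Skipped when all processed files already exist; the graphs are
        # assumed to have been precomputed and written out elsewhere.
        pass

    def len(self):
        return self.num_graphs

    def get(self, idx):
        # Loads one graph per call, i.e. one file read per sample.
        return torch.load(osp.join(self.processed_dir, f"data_{idx}.pt"))
```

Combining this with the chunked storage suggested in the reply above (fewer, larger processed files) would shorten both the existence check and the per-sample I/O.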