-
I think in general you want to avoid saving too many small files, as I/O will become a bottleneck. However, you may already find success by increasing the number of workers. The splitting also depends on your model. If your model makes use of temporal information, I think it's best to store all timesteps into a single |
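If several time steps end up in one file, it may also help to keep the most recently loaded files in memory, so that repeated accesses to the same file cost a single read. Below is a minimal sketch of that idea, not something PyTorch Geometric provides: `FileCache` and its `load_fn` argument are hypothetical names, and `load_fn` stands in for whatever deserializer is actually used (for example `torch.load`).

```python
from collections import OrderedDict
from typing import Any, Callable


class FileCache:
    """Tiny LRU cache keyed by file path (hypothetical helper)."""

    def __init__(self, load_fn: Callable[[str], Any], max_files: int = 4):
        self.load_fn = load_fn        # assumed deserializer, e.g. torch.load
        self.max_files = max_files    # upper bound on files held in memory
        self._cache: "OrderedDict[str, Any]" = OrderedDict()

    def get(self, path: str) -> Any:
        if path in self._cache:
            # Cache hit: mark as most recently used and skip the disk read.
            self._cache.move_to_end(path)
            return self._cache[path]
        # Cache miss: read from disk, then evict the least recently used file
        # if the cache has grown past its limit.
        obj = self.load_fn(path)
        self._cache[path] = obj
        if len(self._cache) > self.max_files:
            self._cache.popitem(last=False)
        return obj
```

Two caveats: with multiple data-loading workers, each worker process would hold its own cache, so memory use grows with the number of workers; and with fully shuffled sampling, cache hits will be rare unless batches group indices that live in the same file.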
-
Let me first thank you for all the great work behind `pytorch_geometric`.

I'm dealing with several temporal graphs that represent numerical simulations of physical systems. I have between tens and thousands of graph topologies (which are actually meshes). For each topology, I have hundreds of time steps. Altogether, this yields a dataset of several GB that cannot fit fully in memory.

The documentation indicates how to handle such large datasets with the `torch_geometric.data.Dataset` class. I decided to process and save a `.pt` file for each topology. Given an index, the `get` method figures out which topology and time step it corresponds to. It then reads the relevant `.pt` file and accesses the appropriate time step. Finally, it returns a `Data` object with all necessary inputs and targets.

Training on such a dataset is naturally slower than with `InMemoryDataset`. In particular, data loading becomes much slower as the batch size increases. I was wondering what the best practices are to make data access as fast as possible. What are the general rules of thumb in this case? What would be the best split in terms of saved files, topologies, and time steps? One processed file per time step would lead to smaller but more numerous files. Grouping several topologies and time steps would lead to fewer but bigger files. Are there also considerations regarding the number of workers, or is it simply the more the faster? How much processed information should be saved in the `.pt` files versus computed on the fly?
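For concreteness, the index lookup described above (flat sample index to topology and time step) could look roughly like the sketch below. It assumes one file per topology and allows a different number of time steps per topology; `build_offsets` and `locate` are hypothetical names for illustration, not part of `torch_geometric`.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def build_offsets(steps_per_topology: List[int]) -> List[int]:
    """Cumulative number of samples after each topology, e.g. [3, 2] -> [3, 5]."""
    return list(accumulate(steps_per_topology))


def locate(idx: int, offsets: List[int]) -> Tuple[int, int]:
    """Map a flat sample index to (topology index, time step within it)."""
    if not 0 <= idx < offsets[-1]:
        raise IndexError(idx)
    # First topology whose cumulative count exceeds idx.
    topo = bisect_right(offsets, idx)
    # Number of samples belonging to all earlier topologies.
    start = offsets[topo - 1] if topo > 0 else 0
    return topo, idx - start
```

The offsets are computed once (for example when the dataset is constructed), so each lookup is a binary search rather than a scan over all topologies.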