On the fly graph generation from tabular data #4300
Hello all! I have a question regarding data generation and data loading. Basically, my problem is that I want to create graphs from tabular data on the fly. Currently, PyTorch Geometric provides two types of datasets, InMemory and standard datasets, but both of them generate the whole dataset up front. What I want is to generate the graphs on the fly, only when they are needed for a batch. I've seen that there's also an IterableDataset class in PyTorch, but I think that's not exactly what I'm looking for. I want to loop sequentially over the rows of my tabular data, generate a new graph each time one is needed to fill up a batch, and feed the batch to some GNN. I don't know if I'm explaining myself properly. I don't want to store thousands of graphs either in memory or on disk; I just want to keep in memory the graphs needed to fill the current batch. Does anyone have an idea of how I should do this? I can't find a way. The only approach I've found so far is to generate a new Dataset for each batch, feed that entire dataset to the model, and then generate a new one for the next batch. Thank you!
Replies: 1 comment 1 reply
In general, you do not need to store your graphs on disk and can easily perform graph generation on-the-fly. For this, you can implement your own version of `torch.utils.data.Dataset` and perform conversion to PyG's graph `data` in `__getitem__` on the fly. For streaming data, `IterableDataset` should work as well. Another option is to look into the newly released `torch-data` project, which implements flexible and customizable data loading functionality via data pipelines. I am currently working on an example to showcase its use directly in PyG :)