On the fly graph generation from tabular data #4300

AlejandroTL · 2022-03-18T13:34:01Z

Hello all!

I have a question regarding the generation of data, data loading and so on. Basically, my problem is that I want to create graphs from tabular data on the fly.

Currently, PyTorch Geometric provides two types of datasets, InMemoryDataset and the standard Dataset, but both of them generate the whole dataset up front. What I want is to generate the graphs on the fly, when they are needed for a batch. I've seen that there's also an IterableDataset class in PyTorch, but I think that's not exactly what I'm looking for. I want to loop sequentially over the rows of my tabular data, generate a new graph each time one is needed to fill up a batch, and feed that batch to some GNN. I don't know if I'm explaining myself properly.

I don't want to store thousands of graphs, either in memory or on disk. I just want to keep in memory the graphs needed to fill up the current batch.

Does anyone have an idea of how I should do this? I can't find a way. The only approach I've found is to generate a new Dataset for each batch, feed that entire dataset to the model, and then generate a new one for the next batch.

Thank you!

Answered by rusty1s

rusty1s · 2022-03-19T15:08:08Z

Maintainer

In general, you do not need to store your graphs on disk and can easily perform graph generation on the fly. For this, you can implement your own version of torch.utils.data.Dataset and perform the conversion to PyG's graph data in __getitem__. For streaming data, IterableDataset should work as well. Another option is to look into the newly released torch-data project, which implements flexible and customizable data loading functionality via data pipelines. I am currently working on an example to showcase its use directly in PyG :)

1 reply

AlejandroTL Mar 22, 2022
Author

Hello!

Thank you for your answer. It works! I have built a torch.utils.data.Dataset and I'm using the PyG DataLoader (torch_geometric.loader.DataLoader) to load the data. If I use torch.utils.data.DataLoader instead, it raises an error because it cannot handle PyG's graph objects. Thanks a lot!
