GraphSAINT dataloader in Pytorch Lightning #2499
-
Hi all, first of all thanks for the amazing graph learning framework. I am trying to wrap a GraphSAINT random walk sampler + the Flickr dataset in PyTorch Lightning to enable smooth multi-GPU support. I followed the PyTorch Lightning convention with separate train/val/test dataloaders (GraphSAINT random walk), but unlike the NeighborSampler used in the PyTorch Lightning examples, this dataloader does not support node_idx to specify the mask and thus the split. My intuition would be to split the data object into train, validation, and test data objects. Would this approach be correct, and if so, are there any utils I have overlooked that streamline this splitting?

Additionally, I was wondering if there is a specific reason the NeighborSampler is favoured over the GraphSAINT sampler in the node property prediction examples in the OGB repo. Thanks a lot!
Replies: 1 comment 1 reply
-
You can split your `data` into inductive training, validation, and test (sub)graphs via `torch_geometric.utils.subgraph` directly in the `prepare_data` method and before initializing the `GraphSAINTSampler`. However, please note that GraphSAINT can only make use of sampling during training, in particular because nodes may be sampled more than once during a single epoch. For validation and testing, it's therefore best to operate on the complete graph.
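A minimal sketch of what this could look like, not an official recipe: the `FlickrDataModule` name and all hyperparameters below are made up for illustration, and the split lives in `setup` rather than `prepare_data`, since Lightning only propagates state assigned in `setup` to every process during multi-GPU training. Also note that `GraphSAINTRandomWalkSampler` moved from `torch_geometric.data` to `torch_geometric.loader` in later PyG releases, so adjust the import to your version.

```python
import pytorch_lightning as pl
from torch_geometric.datasets import Flickr
from torch_geometric.data import Data, DataLoader, GraphSAINTRandomWalkSampler
from torch_geometric.utils import subgraph


class FlickrDataModule(pl.LightningDataModule):  # hypothetical name
    def __init__(self, root='data/Flickr'):
        super().__init__()
        self.root = root

    def prepare_data(self):
        # Download only; state assigned here is not shared across GPUs.
        Flickr(self.root)

    def setup(self, stage=None):
        data = Flickr(self.root)[0]

        # Inductive training graph: keep only training nodes and the edges
        # between them, relabelling nodes so indices stay contiguous.
        train_idx = data.train_mask.nonzero(as_tuple=False).view(-1)
        edge_index, _ = subgraph(train_idx, data.edge_index,
                                 relabel_nodes=True,
                                 num_nodes=data.num_nodes)
        self.train_data = Data(x=data.x[train_idx], y=data.y[train_idx],
                               edge_index=edge_index)

        # Validation/test operate on the complete graph, using the masks.
        self.full_data = data

    def train_dataloader(self):
        # GraphSAINT sampling is only used for training.
        return GraphSAINTRandomWalkSampler(self.train_data, batch_size=6000,
                                           walk_length=2, num_steps=5,
                                           num_workers=2)

    def val_dataloader(self):
        # Full-batch evaluation: one "batch" is the entire graph.
        return DataLoader([self.full_data], batch_size=1)

    def test_dataloader(self):
        return DataLoader([self.full_data], batch_size=1)
```

In the Lightning module, the validation/test steps would then index the model output with `batch.val_mask` / `batch.test_mask`, while the training step consumes the GraphSAINT mini-batches directly.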