How to generate datasets for inductive learning? #3415
Answered by rusty1s
scottshufe asked this question in Q&A
Replies: 2 comments 2 replies
-
For datasets like Cora I have mostly seen examples of transductive learning, where nodes of the same graph are split into train/test (although this is not completely transductive, as nothing stops you from predicting on an unseen node), as done here. The approach you are suggesting sounds interesting, but because you are generating subgraphs, the neighborhood of a node might change, which might affect your prediction.
1 reply
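The caveat above, that inducing a subgraph on the training nodes changes a node's neighborhood, can be seen on a toy graph. This is a pure-Python sketch; the node ids, edge list, and train split are invented for illustration:

```python
# Toy graph: node 0 has three neighbors in the full graph,
# but only one of them survives the train split.
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]
train_nodes = {0, 1}  # nodes 2 and 3 are held out for testing

def neighbors(node, edge_list):
    """Undirected neighborhood of `node` in `edge_list`."""
    out = set()
    for u, v in edge_list:
        if u == node:
            out.add(v)
        elif v == node:
            out.add(u)
    return out

# Induced train subgraph: keep edges with BOTH endpoints in the train set.
train_edges = [(u, v) for u, v in edges if u in train_nodes and v in train_nodes]

full_nbrs = neighbors(0, edges)         # {1, 2, 3}
train_nbrs = neighbors(0, train_edges)  # {1}
```

A GNN aggregating messages for node 0 would see three neighbors transductively but only one in the induced train subgraph, which is exactly why the prediction can change.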
-
import copy
import torch
from torch_geometric.utils import subgraph

# Randomly assign roughly half of the nodes to training, the rest to testing:
train_mask = torch.rand(data.num_nodes) < 0.5
test_mask = ~train_mask

train_data = copy.copy(data)
train_data.edge_index, _ = subgraph(train_mask, data.edge_index, relabel_nodes=True)
train_data.x = data.x[train_mask]
train_data.y = data.y[train_mask]  # labels must be subset alongside the features

test_data = copy.copy(data)
test_data.edge_index, _ = subgraph(test_mask, data.edge_index, relabel_nodes=True)
test_data.x = data.x[test_mask]
test_data.y = data.y[test_mask]
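What `subgraph(mask, edge_index, relabel_nodes=True)` computes can be sketched in plain Python on a small invented graph: keep only edges whose two endpoints survive the mask, then relabel the surviving nodes to a contiguous range so they line up with the rows of `x[mask]`:

```python
mask = [True, False, True, True]          # node-level boolean mask (node 1 dropped)
edges = [(0, 1), (0, 2), (2, 3), (1, 3)]  # edge list (columns of edge_index)

# Keep only edges whose BOTH endpoints survive the mask ...
kept = [(u, v) for u, v in edges if mask[u] and mask[v]]

# ... then relabel surviving nodes to 0..k-1, matching the row order of x[mask].
new_id = {}
for old, keep in enumerate(mask):
    if keep:
        new_id[old] = len(new_id)
relabelled = [(new_id[u], new_id[v]) for u, v in kept]
```

Here `kept` is `[(0, 2), (2, 3)]` and `relabelled` is `[(0, 1), (1, 2)]`, since nodes 0, 2, 3 are renumbered to 0, 1, 2.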
1 reply
Answer selected by
scottshufe
-
Hi, everyone. My question is: how do I generate datasets for inductive learning?
Using the Cora dataset as an example, I want to use half of its nodes and their links for training and the rest for testing. I am not sure if this approach is correct:
Does anyone have better ideas?
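The 50/50 random node split described above can be sketched in plain Python (this mirrors the `torch.rand(num_nodes) < 0.5` idea; 2708 is Cora's node count, and the seed is arbitrary):

```python
import random

num_nodes = 2708  # Cora has 2708 nodes
rng = random.Random(0)  # seed chosen arbitrarily for reproducibility

# Each node lands in train with probability 0.5; test is the complement.
train_mask = [rng.random() < 0.5 for _ in range(num_nodes)]
test_mask = [not m for m in train_mask]
```

Note that this gives a split of roughly, not exactly, half the nodes; if an exact 50/50 split matters, shuffle the node ids and cut the list in the middle instead.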