Heterogeneous data random split for GNN #5749

1AngelUS · 2022-10-17T11:43:12Z

Hi,
I am implementing a heterogeneous GNN by following this example:
https://github.com/pyg-team/pytorch_geometric/blob/master/examples/hetero/hetero_conv_dblp.py

However, some nodes in my data do not have a label. Could you please help me randomly split the data into train and test masks, and the corresponding train and test sets?

Currently I am splitting it this way:
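import random

n = merc_x.shape[0]
idx0 = list(y0[:, 0])  # label 0 samples
idx1 = list(y1[:, 0])  # label 1 samples

# Shuffle each class separately (in place)
random.shuffle(idx0)
random.shuffle(idx1)

# Take the first k shuffled samples of each class for training
train_mask_idx1 = idx1[:4000]
train_mask_idx0 = idx0[:600000]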

However, the model performs very poorly, and I think this splitting strategy is the cause.

Thanks very much!

rusty1s · 2022-10-17T18:31:45Z

Maintainer

The poor performance might be due to label imbalance: with 4,000 label-1 and 600,000 label-0 training samples, the ratio is about 1:150.
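One common way to counter such an imbalance is to weight the loss by inverse class frequency. The sketch below is only an illustration in plain Python, not something proposed in this thread: the helper name `inverse_frequency_weights` and the toy label list are assumptions, and the toy counts are scaled down from the 4,000 and 600,000 samples in the question.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Return {class: weight}, with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * k) for c, k in counts.items()}


# Toy labels (hypothetical): 4 positives vs. 600 negatives, the same
# proportion as 4,000 vs. 600,000 in the question.
labels = [1] * 4 + [0] * 600
weights = inverse_frequency_weights(labels)
print(weights)
```

Weights like these could then be passed, ordered by class index, as the `weight` argument of `torch.nn.functional.cross_entropy`. Whether that actually helps here would have to be checked against a validation set.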

To understand your code better, how are y0 and y1 defined?
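For the original question of splitting when some nodes are unlabeled, one option is a stratified random split restricted to the labeled nodes. The following is a minimal sketch in plain Python under stated assumptions: unlabeled nodes are marked with `None`, and the function name `stratified_masks`, the `test_ratio` default, and the toy labels are all hypothetical.

```python
import random


def stratified_masks(labels, test_ratio=0.2, seed=0):
    """Build boolean train/test masks; unlabeled nodes (None) stay False in both."""
    rng = random.Random(seed)
    n = len(labels)
    train_mask = [False] * n
    test_mask = [False] * n

    # Group the indices of labeled nodes by class.
    by_class = {}
    for idx, y in enumerate(labels):
        if y is not None:
            by_class.setdefault(y, []).append(idx)

    # Shuffle each class separately so both masks keep the class proportions.
    for idxs in by_class.values():
        rng.shuffle(idxs)
        n_test = max(1, round(len(idxs) * test_ratio))  # at least one test node per class
        for i in idxs[:n_test]:
            test_mask[i] = True
        for i in idxs[n_test:]:
            train_mask[i] = True
    return train_mask, test_mask


# Toy data (hypothetical): 50 nodes of class 0, 10 of class 1, 40 unlabeled.
labels = [0] * 50 + [1] * 10 + [None] * 40
train_mask, test_mask = stratified_masks(labels)
```

The resulting lists can be converted with `torch.tensor(...)` and stored on the labeled node type, for example as `data['<node_type>'].train_mask`, where the node type name is a placeholder. The loss and the metrics are then computed only on the masked nodes, so unlabeled nodes still take part in message passing but never contribute to the loss.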

7 replies

1AngelUS Oct 21, 2022
Author

Thanks once again for your kind reply!

I ran my model after replacing the SAGEConv layers with an MLP, and the performance was similar to the LightGBM model, as you also suggested.
What can I do to improve the GNN model's performance?

I was thinking of training the GNN in an unsupervised setting. Could you point me to a resource for that?

rusty1s Oct 22, 2022
Maintainer

In that case, I assume there is something wrong with your GNN or your given graph structure. How are your nodes connected to each other? Does the same hold true when using other GNN layers such as GCNConv or GATConv?

1AngelUS Oct 23, 2022
Author

It is a bipartite graph, so there is only one type of edge, and it is a dense network.

I will certainly try other GNN layers as you suggested and will let you know the results. I would also like to try an unsupervised GNN. Could you please help me with a code sample for that?

rusty1s Oct 24, 2022
Maintainer

If you want to perform message passing in a bipartite graph, I assume you need both directions. Otherwise, one set of nodes will never receive any updates.
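To make the point about both directions concrete, here is a minimal sketch in plain Python that adds a reversed copy of every edge type in a dictionary of edge lists. The node type names (`user`, `merchant`), the relation name, and the helper `add_reverse_edges` are hypothetical and are not taken from the poster's data.

```python
def add_reverse_edges(edge_index_dict):
    """Return a copy with a (dst, 'rev_<rel>', src) entry for every (src, rel, dst)."""
    out = dict(edge_index_dict)
    for (src, rel, dst), (rows, cols) in edge_index_dict.items():
        # Swap the source and target lists to flip the edge direction.
        out[(dst, f"rev_{rel}", src)] = (list(cols), list(rows))
    return out


# Toy bipartite graph (hypothetical): three user -> merchant edges.
edges = {("user", "buys_from", "merchant"): ([0, 0, 1], [0, 2, 1])}
both = add_reverse_edges(edges)
print(both)
```

In PyG itself, the `torch_geometric.transforms.ToUndirected` transform should achieve the same for a `HeteroData` object by adding reverse edge types, so a hand-written helper like this is mainly useful for seeing what happens. With both directions present, a heterogeneous convolution can update both sets of nodes.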

1AngelUS Oct 24, 2022
Author

Thank you for pointing this out. That could be the issue; I will change my code accordingly.
