-
You can avoid such a bias by not adding graph structural features (such as the in-degree) to the nodes, and by not making use of |
-
I have a dataset of graphs with varying numbers of edges, ranging from 0 (no graph structure) up to 200. Similarly, the number of nodes ranges from 0 to 100. It is a binary label prediction task, and the label distribution is as follows:
- fewer than 80 edges: {1: 121, 0: 74}
- more than 80 edges: {1: 182, 0: 53}
I think this indicates that the dataset has an inherent bias toward predicting label 1 when the number of edges is larger.
Looking at the same labels split by node count:
- fewer than 40 nodes: {1: 182, 0: 53}
- more than 40 nodes: {1: 60, 0: 6}
Thus, a similar trend holds for the number of nodes.
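For reference, the positive-label rate per bucket can be computed directly from the counts above (a quick sanity check in plain Python; the bucket names are just illustrative labels):

```python
# Counts copied from the distributions quoted above.
edge_buckets = {
    "<80 edges": {1: 121, 0: 74},
    ">80 edges": {1: 182, 0: 53},
}
node_buckets = {
    "<40 nodes": {1: 182, 0: 53},
    ">40 nodes": {1: 60, 0: 6},
}

def pos_rate(counts):
    """Fraction of graphs in the bucket whose label is 1."""
    return counts[1] / (counts[0] + counts[1])

for name, counts in {**edge_buckets, **node_buckets}.items():
    print(f"{name}: P(label=1) = {pos_rate(counts):.2f}")
```

The rate climbs from about 0.62 in the small-edge bucket to about 0.91 in the large-node bucket, so a model can indeed gain accuracy from graph size alone.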
As a result, I observe that readout operations such as global_mean_pool and global_sum_pool work better than global_max_pool. Each node represents an image, and I use a pre-trained CNN to extract its features.
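One reason the readouts can behave differently here: a sum readout scales with the node count, so it leaks graph size into the graph embedding, while a mean readout does not. A toy NumPy illustration (not tied to any particular GNN library):

```python
import numpy as np

# Two graphs whose nodes have identical features but very different sizes.
x_small = np.ones((10, 4))    # 10 nodes, 4-dim features
x_large = np.ones((100, 4))   # 100 nodes, same per-node features

# Sum readout: the embedding magnitude grows with the number of nodes.
sum_small, sum_large = x_small.sum(axis=0), x_large.sum(axis=0)

# Mean readout: identical for both graphs, so graph size is not exposed.
mean_small, mean_large = x_small.mean(axis=0), x_large.mean(axis=0)

print(sum_small[0], sum_large[0])    # 10.0 vs 100.0
print(mean_small[0], mean_large[0])  # 1.0 vs 1.0
```

If a sum readout "works better" on a size-biased dataset, that improvement may partly come from the size signal itself rather than from the node features.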
My question is: what would be the preferred way to ensure that my model is not biased by the number of nodes and edges?
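One quick diagnostic (a sketch, not something from this thread): compare the GNN's accuracy against size-only baselines built from the aggregate counts above. If the GNN barely beats a rule that ignores node features entirely, it is probably exploiting graph size or the label prior:

```python
# (label-1 count, label-0 count) per edge bucket, from the post above.
buckets = {"small": (121, 74), "large": (182, 53)}

total = sum(ones + zeros for ones, zeros in buckets.values())

# Baseline 1: always predict the overall majority label (1).
always_one_acc = sum(ones for ones, _ in buckets.values()) / total

# Baseline 2: predict the majority label within each size bucket.
per_bucket_acc = sum(max(ones, zeros) for ones, zeros in buckets.values()) / total

print(f"always-1 baseline:    {always_one_acc:.3f}")
print(f"size-bucket baseline: {per_bucket_acc:.3f}")
```

Here both baselines coincide at roughly 0.70 because label 1 is the majority in every bucket, so any model should clearly exceed that bar before you conclude it has learned from the node features rather than from size or the prior.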