-
You can avoid such a bias by not adding graph structural features (such as the in-degree) to the nodes, and by not making use of |
-
I have a dataset of graphs with varying numbers of edges, ranging from 0 (no graph structure) up to 200. Similarly, the number of nodes ranges from 0 to 100. It is a binary label prediction task, and the label distribution is as follows:
- fewer than 80 edges: {1: 121, 0: 74}
- more than 80 edges: {1: 182, 0: 53}
I think this indicates that the dataset has an inherent bias toward predicting label 1 when the number of edges is larger.
Looking at the same labels split by node count:
- fewer than 40 nodes: {1: 182, 0: 53}
- more than 40 nodes: {1: 60, 0: 6}
Thus, a similar trend holds for the number of nodes.
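For reference, the positive-label rate per bucket can be computed directly from the counts above (a quick sanity check in plain Python; the bucket names are just illustrative labels):

```python
# Counts copied from the distributions quoted above.
edge_buckets = {
    "<80 edges": {1: 121, 0: 74},
    ">80 edges": {1: 182, 0: 53},
}
node_buckets = {
    "<40 nodes": {1: 182, 0: 53},
    ">40 nodes": {1: 60, 0: 6},
}

def pos_rate(counts):
    """Fraction of graphs in the bucket whose label is 1."""
    return counts[1] / (counts[0] + counts[1])

for name, counts in {**edge_buckets, **node_buckets}.items():
    print(f"{name}: P(label=1) = {pos_rate(counts):.2f}")
```

The rate climbs from about 0.62 in the small-edge bucket to about 0.91 in the large-node bucket, so a model can indeed gain accuracy from graph size alone.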
As a result, I observe that readout operations such as global_mean_pool and global_sum_pool work better than global_max_pool. Each node represents an image, and I use a pre-trained CNN to extract its features.
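One reason the readouts can behave differently here: a sum readout scales with the node count, so it leaks graph size into the graph embedding, while a mean readout does not. A toy NumPy illustration (not tied to any particular GNN library):

```python
import numpy as np

# Two graphs whose nodes have identical features but very different sizes.
x_small = np.ones((10, 4))    # 10 nodes, 4-dim features
x_large = np.ones((100, 4))   # 100 nodes, same per-node features

# Sum readout: the embedding magnitude grows with the number of nodes.
sum_small, sum_large = x_small.sum(axis=0), x_large.sum(axis=0)

# Mean readout: identical for both graphs, so graph size is not exposed.
mean_small, mean_large = x_small.mean(axis=0), x_large.mean(axis=0)

print(sum_small[0], sum_large[0])    # 10.0 vs 100.0
print(mean_small[0], mean_large[0])  # 1.0 vs 1.0
```

If a sum readout "works better" on a size-biased dataset, that improvement may partly come from the size signal itself rather than from the node features.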
My question is: what would be the preferred way to ensure that my model is not biased by the number of nodes and edges?
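One quick diagnostic (a sketch, not something from this thread): compare the GNN's accuracy against size-only baselines built from the aggregate counts above. If the GNN barely beats a rule that ignores node features entirely, it is probably exploiting graph size or the label prior:

```python
# (label-1 count, label-0 count) per edge bucket, from the post above.
buckets = {"small": (121, 74), "large": (182, 53)}

total = sum(ones + zeros for ones, zeros in buckets.values())

# Baseline 1: always predict the overall majority label (1).
always_one_acc = sum(ones for ones, _ in buckets.values()) / total

# Baseline 2: predict the majority label within each size bucket.
per_bucket_acc = sum(max(ones, zeros) for ones, zeros in buckets.values()) / total

print(f"always-1 baseline:    {always_one_acc:.3f}")
print(f"size-bucket baseline: {per_bucket_acc:.3f}")
```

Here both baselines coincide at roughly 0.70 because label 1 is the majority in every bucket, so any model should clearly exceed that bar before you conclude it has learned from the node features rather than from size or the prior.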