-
This also aligns with my impression that GNNs perform poorly on imbalanced labels, although I feel this is a problem with neural networks in general. As an alternative to loss re-weighting, you can also over- or under-sample the respective labels, although this requires you to apply GNNs in mini-batch mode, e.g., via [...]. Sadly, I do not have any better advice for you at this point in time.
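To make the over-sampling idea concrete, here is a minimal, library-independent sketch of how the seed nodes of each mini-batch could be drawn so that both classes appear about equally often. The function name `oversampled_seed_nodes` and the toy labels are assumptions for illustration only; the mini-batch loader that would consume these seed nodes is not shown. In PyTorch, the same per-node weights could be handed to `torch.utils.data.WeightedRandomSampler` instead of `random.choices`.

```python
import random
from collections import Counter


def oversampled_seed_nodes(labels, num_samples, seed=0):
    """Draw node indices with replacement so that every class is picked
    with roughly equal probability, regardless of how frequent it is."""
    counts = Counter(labels)
    # Weight each node by the inverse frequency of its class, so the
    # total weight of every class is the same.
    weights = [1.0 / counts[y] for y in labels]
    rng = random.Random(seed)
    return rng.choices(range(len(labels)), weights=weights, k=num_samples)


# Toy labels with a 400:1 class ratio: 4000 negatives, 10 positives.
labels = [0] * 4000 + [1] * 10
seeds = oversampled_seed_nodes(labels, num_samples=2000)
pos_fraction = sum(labels[i] for i in seeds) / len(seeds)
print(f"positive fraction among sampled seed nodes: {pos_fraction:.2f}")
```

Because the minority nodes are drawn many times over, the model sees the same few positives repeatedly, so over-fitting to them is a risk worth monitoring on the validation set.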
-
This is an old topic, but I recently had this exact problem and I think I found a solution that I hope can help someone. The transform torch_geometric.transforms.RandomNodeSplit has arguments that let you choose the number of training samples for each class (https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.transforms.RandomNodeSplit.html?highlight=RandomNodeSplit#torch_geometric.transforms.RandomNodeSplit). I don't know when it was updated, so these parameters may not have existed when this question was first asked. With split = 'random' or split = 'test_rest', you can use the num_train_per_class parameter (with split = 'train_rest' it is not used). Obviously it's not perfect: if you have a really unbalanced dataset, you end up using only a small fraction of your dataset for training.
Choose s training nodes per class, v validation nodes and t test nodes such that s * c + v + t = total_number_of_nodes, where c is the number of classes. Hope this can help someone. If someone else has managed to do this more effectively with a dedicated data loader, I'd be interested to know how.
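In PyG this would be a call along the lines of `RandomNodeSplit(split="random", num_train_per_class=s, num_val=v, num_test=t)`. The sketch below is not the library's implementation; it is a plain-Python approximation of what I understand that split to do, written only to make the arithmetic s * c + v + t visible. The function name `per_class_split` and the toy labels are assumptions.

```python
import random


def per_class_split(labels, num_train_per_class, num_val, num_test, seed=0):
    """Pick the same number of training nodes from every class, then fill
    the validation and test sets at random from the remaining nodes."""
    rng = random.Random(seed)
    train = []
    for c in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == c]
        rng.shuffle(idx)
        train.extend(idx[:num_train_per_class])

    train_set = set(train)
    remaining = [i for i in range(len(labels)) if i not in train_set]
    rng.shuffle(remaining)
    val = remaining[:num_val]
    test = remaining[num_val:num_val + num_test]
    return train, val, test


# c = 2 classes, s = 5, v = 500, t = 3500, so s * c + v + t = 4010 nodes.
labels = [0] * 4000 + [1] * 10
train, val, test = per_class_split(
    labels, num_train_per_class=5, num_val=500, num_test=3500
)
print(len(train), len(val), len(test))
```

The drawback mentioned above is easy to see here: the training set is balanced, but it contains only 10 of the 4010 nodes, because the minority class caps how many nodes per class can be used.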
-
When using very imbalanced data, my experience is that GNN methods like GraphSAGE and GCN perform poorly: even though I weight the loss function by the class ratio, the classifier still predicts only the majority class. Is there any feature or method other than loss weighting that can be used here?
For better context, my problem is a binary classification where the class ratio is 400:1. I am using the ROC AUC metric on the validation set to determine the best number of training epochs. I have also tried other metrics such as PR AUC and F1 score.
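For readers unfamiliar with the loss weighting described above, here is a minimal sketch of binary cross-entropy with the positive class scaled by the class ratio. It works on probabilities in plain Python and is only an illustration; the function name `weighted_bce` and the toy predictions are assumptions, not the code used in my experiments. In PyTorch the analogous knob is the `pos_weight` argument of `torch.nn.BCEWithLogitsLoss`.

```python
import math


def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy in which the loss of every positive
    example is multiplied by pos_weight."""
    total = 0.0
    for p, y in zip(probs, labels):
        if y == 1:
            total += -pos_weight * math.log(p)
        else:
            total += -math.log(1.0 - p)
    return total / len(labels)


# 400 negatives and 1 positive, matching the 400:1 ratio.
labels = [0] * 400 + [1]
# A model that assigns a low positive probability to every node,
# i.e. it always predicts the majority class.
always_negative = [0.01] * len(labels)

unweighted = weighted_bce(always_negative, labels, pos_weight=1.0)
weighted = weighted_bce(always_negative, labels, pos_weight=400.0)
print(f"unweighted: {unweighted:.4f}, weighted: {weighted:.4f}")
```

With the weight applied, the single missed positive dominates the loss, so in principle the always-majority solution is heavily penalized; in my case the classifier nevertheless collapsed to the majority class.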