Replies: 2 comments 2 replies
-
It looks like the model suffers badly from the imbalanced class distribution. You can use PyG's `ImbalancedSampler` to address the class-imbalance problem. Also, it is better to use AUC instead of accuracy as the evaluation metric in an imbalanced scenario.
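(For reference, `ImbalancedSampler` lives in `torch_geometric.loader` and can be passed to a `NeighborLoader`. On the metric point, the dependency-free sketch below illustrates why accuracy misleads on skewed labels: an all-negative classifier on a 9:1 split scores 90% accuracy but only 0.5 ROC-AUC. The `roc_auc` helper is a hand-rolled Mann-Whitney implementation for illustration only; in practice use `sklearn.metrics.roc_auc_score` or `torchmetrics.AUROC`.)

```python
def roc_auc(y_true, scores):
    """Rank-based (Mann-Whitney) ROC-AUC; tied scores receive average rank."""
    pairs = sorted(zip(scores, y_true))
    n = len(pairs)
    rank = [0.0] * n
    i = 0
    while i < n:  # assign 1-based average ranks to groups of tied scores
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        for k in range(i, j + 1):
            rank[k] = (i + j) / 2 + 1
        i = j + 1
    n_pos = sum(y for _, y in pairs)
    n_neg = n - n_pos
    pos_rank_sum = sum(r for r, (_, y) in zip(rank, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# A 9:1 imbalanced toy set: a constant (all-negative) classifier gets
# 90% accuracy, yet its scores carry no ranking information at all.
y = [0] * 9 + [1]
constant_scores = [0.0] * 10
print(roc_auc(y, constant_scores))  # 0.5
```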
-
Thank you! I will try different losses as you suggested.
-
Hi,
I am working with a heterogeneous graph on a node classification task. It is a very large dataset with ~13,252,659 nodes: 13,205,523 of them belong to class 0 and 4,698 to class 1 (the rest are unlabeled), so the graph is highly imbalanced.
I am using the code from this example: https://github.com/pyg-team/pytorch_geometric/blob/master/examples/hetero/hetero_conv_dblp.py
I am training the GNN in a semi-supervised setting, using 4L nodes from class 0 and 4K nodes from class 1 to train the model.
I am testing the model on the entire dataset (except the unlabeled nodes).
To tackle the class-imbalance problem, I only changed the loss in the example above to a weighted binary cross-entropy loss.
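The weighting in question can be sketched in a few lines of pure Python; PyTorch's `torch.nn.BCEWithLogitsLoss(pos_weight=...)` implements the same idea natively. Note that the `pos_weight = n_neg / n_pos` heuristic below is a common choice, not something stated in this post:

```python
import math

def weighted_bce(logits, targets, pos_weight):
    """Mean binary cross-entropy with the positive class up-weighted
    by pos_weight (mirrors BCEWithLogitsLoss(pos_weight=...) on 1-D inputs)."""
    total = 0.0
    for z, y in zip(logits, targets):
        p = 1.0 / (1.0 + math.exp(-z))  # sigmoid of the logit
        total += -(pos_weight * y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(logits)

# One common heuristic for the weight: the ratio of negatives to positives.
# With the class counts from this post, that is roughly 13205523 / 4698 ≈ 2811.
pos_weight = 13205523 / 4698
```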
Over the training epochs, the model's loss decreases monotonically and its accuracy improves. But when I check the class-wise stats, mainly the confusion-matrix metrics (precision, recall, F1 score) on the test set at the end of training, they look like this:
**Epoch: 040, Loss: 0.0692, Train: 0.9831, Test: 0.9801**

```
              precision   recall  f1-score   support
class 0          1.0000   0.9831    0.9915  13205523
class 1          0.0203   0.9853    0.0398      4698
accuracy                            0.9831  13210221
macro avg        0.5102   0.9842    0.5156  13210221
weighted avg     0.9996   0.9831    0.9911  13210221
```
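(As a quick sanity check, the macro and weighted averages follow directly from the two per-class rows above, and they make the failure mode visible: the weighted average is dominated by class 0's huge support, so it completely hides the collapsed class-1 precision:)

```python
# Per-class (precision, recall, f1, support) taken from the report above.
stats = {
    "class 0": (1.0000, 0.9831, 0.9915, 13205523),
    "class 1": (0.0203, 0.9853, 0.0398, 4698),
}
support = sum(v[3] for v in stats.values())
macro_precision = sum(v[0] for v in stats.values()) / len(stats)
weighted_precision = sum(v[0] * v[3] for v in stats.values()) / support

print(macro_precision)     # ~0.5102: averaging classes equally exposes class 1
print(weighted_precision)  # ~0.9996: support-weighting buries class 1 entirely
```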
Could you please guide me on what I could try to improve the model? I have been stuck for many days with no luck :(
Thanks so much! :)