-
You should be able to apply normalization on your own via:

```python
# Per-node-type z-score normalization, applied column-wise to each feature:
for node_type in data.node_types:
    x = data[node_type].x
    x = (x - x.mean(dim=0, keepdim=True)) / x.std(dim=0, keepdim=True)
    data[node_type].x = x
```

If the loss is very high, that usually means that there is some bug somewhere. Are you using cross-entropy loss?
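One caveat worth adding here (my note, not part of the reply above): if a feature column is constant, its standard deviation is zero and the division yields NaNs. A guarded variant might look like:

```python
for node_type in data.node_types:
    x = data[node_type].x
    std = x.std(dim=0, keepdim=True).clamp(min=1e-12)  # guard against zero-variance columns
    data[node_type].x = (x - x.mean(dim=0, keepdim=True)) / std
```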
-
Thanks so much for clarifying normalization! For training the GNN, I followed the code given in https://github.com/pyg-team/pytorch_geometric/blob/master/examples/hetero/to_hetero_mag.py. Thanks!
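For readers following along, a minimal sketch of the model pattern used in that example (hidden size and `num_classes` are illustrative placeholders, not the thread author's actual code):

```python
import torch
from torch_geometric.nn import SAGEConv, to_hetero

class GNN(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels):
        super().__init__()
        # Lazy (-1, -1) input sizes let each node type keep its own feature dimension.
        self.conv1 = SAGEConv((-1, -1), hidden_channels)
        self.conv2 = SAGEConv((-1, -1), out_channels)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)

model = GNN(hidden_channels=64, out_channels=num_classes)  # num_classes: hypothetical
model = to_hetero(model, data.metadata(), aggr='sum')      # data: the HeteroData below
```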
-
Hi, sorry, one more doubt. My graph is actually a bipartite graph (two node types), and it is big: 10M nodes of each type and 1B edges. Therefore, I need to do neighbor sampling to be able to fit and train the GNN on my GPU. I followed the approach given in https://github.com/pyg-team/pytorch_geometric/blob/master/examples/hetero/to_hetero_mag.py (I just treated my undirected bipartite graph as a hetero graph). Please let me know if there is something wrong with this approach. Thank you!
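A minimal sketch of how that example's sampling setup might translate to this bipartite graph, assuming the `'c'`/`'m'` types from the question below (fan-out and batch size are illustrative):

```python
import torch_geometric.transforms as T
from torch_geometric.loader import NeighborLoader

# Messages on ('c', 'uses', 'm') only flow c -> m; ToUndirected adds the
# reverse ('m', 'rev_uses', 'c') edge type so both node types get updated.
data = T.ToUndirected()(data)

train_loader = NeighborLoader(
    data,
    num_neighbors=[15, 10],                   # neighbors sampled per GNN layer
    batch_size=1024,
    input_nodes=('c', data['c'].train_mask),  # seed on the labeled node type
)
```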
-
Hi, sorry for pulling this thread up again; I wanted your guidance on one more query. I want to do inductive learning on the hetero graph (train and test on different snapshots). I am using the same approach as in https://github.com/pyg-team/pytorch_geometric/blob/master/examples/hetero/to_hetero_mag.py, which trains a GraphSAGE model that works in the inductive setting as well. Could you please confirm whether this is correct? Thanks so much!
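Since the `to_hetero`-transformed model only consumes feature and edge-index dicts, applying it to an unseen snapshot is straightforward. A hedged sketch (`test_snapshot` is a hypothetical second HeteroData with the same node and edge types):

```python
import torch

model.eval()
with torch.no_grad():
    out = model(test_snapshot.x_dict, test_snapshot.edge_index_dict)
    pred = out['c'].argmax(dim=-1)  # predicted class per 'c' node
```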
-
Hi,
I am working on a heterogeneous node classification task. The HeteroData object looks as follows:

```
HeteroData(
  c={
    x=[55590, 47],
    y=[55590],
    train_mask=[55590],
    val_mask=[55590],
    test_mask=[55590]
  },
  m={ x=[40754, 2] },
  (c, uses, m)={ edge_index=[2, 625074] }
)
```
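For reference, an object with this layout can be built roughly as follows (all tensors here are random placeholders, not my actual data; the class count is made up):

```python
import torch
from torch_geometric.data import HeteroData

data = HeteroData()
data['c'].x = torch.randn(55590, 47)
data['c'].y = torch.randint(0, 10, (55590,))  # 10 classes: hypothetical
data['m'].x = torch.randn(40754, 2)
src = torch.randint(0, 55590, (625074,))      # endpoints among 'c' nodes
dst = torch.randint(0, 40754, (625074,))      # endpoints among 'm' nodes
data['c', 'uses', 'm'].edge_index = torch.stack([src, dst])
```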
I want to normalize each dimension of the feature vectors (the features of node types 'c' and 'm' are different). I came across the NormalizeFeatures transform, which row-normalizes the attributes. Could you please tell me if there is a function that does column-wise scaling? The node features are of varying scale, and I am not sure NormalizeFeatures makes sense in this case.
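To make the distinction concrete, a toy example with made-up numbers:

```python
import torch

x = torch.tensor([[1., 100.],
                  [3., 300.]])

# NormalizeFeatures: each *row* is divided by its sum, so rows sum to 1.
row_normalized = x / x.sum(dim=-1, keepdim=True)

# Column-wise z-score: each *feature* gets zero mean and unit variance.
col_normalized = (x - x.mean(dim=0)) / x.std(dim=0)
```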
Also, my second doubt is regarding the performance of the GNN. While training the model, the loss is very high initially:

```
Epoch: 001, Loss: 4277.0679, Train: 0.9471, Test: 0.9423
Epoch: 002, Loss: 10906.3450, Train: 0.9610, Test: 0.9476
```

And there is not much improvement in accuracy either: over 100 epochs, the test accuracy stays between 94% and 95%.
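For comparison, a hedged sketch of the standard masked cross-entropy step (variable names assumed, not my actual script); with an averaged `F.cross_entropy` over raw logits, the initial loss should sit near `ln(num_classes)`, so values in the thousands may point at un-normalized features or a summed loss:

```python
import torch.nn.functional as F

def train_step(model, data, optimizer):
    model.train()
    optimizer.zero_grad()
    out = model(data.x_dict, data.edge_index_dict)
    mask = data['c'].train_mask
    loss = F.cross_entropy(out['c'][mask], data['c'].y[mask])
    loss.backward()
    optimizer.step()
    return float(loss)
```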
Could you please guide me regarding these queries? Thanks so much for your help!