Node feature embeddings are identical. #3747
Unanswered
lingchen1991 asked this question in Q&A
Replies: 1 comment · 6 replies
-
This is hard to tell without knowing your input feature matrix. PyG splits the weight vector …
-
I used GATConv and SAGPooling to classify an undirected complete graph with 48 nodes (torch-geometric version 1.6.1).
But I found that the output of GATConv(512, 3) shows that all 48 node feature embeddings are the same (a minimal sketch of the setup is included after the tensor below):
tensor([[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237],
[-43.2148, -0.8637, -0.2237]], device='cuda:0',
grad_fn=<...>)
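For reference, here is a simplified sketch of the setup described above (random features just for illustration, not my actual data or full model):

```python
import torch
from torch_geometric.nn import GATConv

# Simplified sketch: a complete undirected graph on 48 nodes with
# 512-dimensional node features, passed through a single GATConv(512, 3).
num_nodes = 48
row = torch.arange(num_nodes).repeat_interleave(num_nodes)
col = torch.arange(num_nodes).repeat(num_nodes)
mask = row != col                                        # GATConv adds self-loops itself
edge_index = torch.stack([row[mask], col[mask]], dim=0)  # [2, 48 * 47]

x = torch.randn(num_nodes, 512)                          # placeholder node features
conv = GATConv(512, 3)
out = conv(x, edge_index)                                # [48, 3]
print(out)
```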
Then I looked into the source code of the GATConv class and found that the following line matters:
out = self.propagate(edge_index, x=(x_l, x_r), alpha=(alpha_l, alpha_r), size=size)
The inputs x_l, x_r, alpha_l, and alpha_r show different values per node. However, out contains identical 3-dimensional features for every node, just like the tensor above.
Furthermore, I looked at the self.propagate function in the MessagePassing class, but its arguments do not match the ones used in the GATConv class.
I also looked at the value of alpha in the message function of the GATConv class: the alpha values are all identical, equal to 1 / 48 ≈ 0.0208.
So, do you have any clues about this situation?
And how does the self.propagate function work? How come the first dimension of alpha_l/alpha_r is 48 while that of alpha_i/alpha_j is 48 * 48?
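To make the last question concrete, this is what I understand propagate to be doing with the _i / _j suffixes (my own re-implementation for illustration, not the PyG source):

```python
import torch

# Node-level tensors of shape [48, ...] are index-selected by edge_index,
# giving edge-level tensors with one row per edge. A complete graph on 48
# nodes plus the self-loops that GATConv adds has 48 * 48 = 2304 edges,
# which would explain the 48 * 48 first dimension of alpha_i / alpha_j.
num_nodes = 48
source = torch.arange(num_nodes).repeat(num_nodes)             # j, one entry per edge
target = torch.arange(num_nodes).repeat_interleave(num_nodes)  # i, one entry per edge
edge_index = torch.stack([source, target], dim=0)              # [2, 2304], self-loops included

alpha_l = torch.randn(num_nodes)   # node-level, shape [48]
alpha_r = torch.randn(num_nodes)   # node-level, shape [48]

alpha_j = alpha_l[edge_index[0]]   # one value per edge's source node, shape [2304]
alpha_i = alpha_r[edge_index[1]]   # one value per edge's target node, shape [2304]
print(alpha_j.shape, alpha_i.shape)
```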
By the way, in the original paper there is a weight vector a that computes the alphas from the two concatenated node embeddings, so the dimension of a should be two times out_channels. However, I cannot find any learnable parameter with this dimension.
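For reference, I checked that splitting a into two halves is algebraically equivalent to the concatenation form in the paper (a quick numeric sketch, not the actual PyG parameters):

```python
import torch

# Quick check: the paper's a^T [W h_i || W h_j] can be computed as
# a_l^T (W h_i) + a_r^T (W h_j), where a_l and a_r are the two halves of a.
# If the library stores the two halves separately, there is no single
# learnable parameter of size 2 * out_channels to find.
out_channels = 3
a = torch.randn(2 * out_channels)
a_l, a_r = a[:out_channels], a[out_channels:]

wh_i = torch.randn(out_channels)   # W h_i (transformed target-node feature)
wh_j = torch.randn(out_channels)   # W h_j (transformed source-node feature)

lhs = a @ torch.cat([wh_i, wh_j])
rhs = a_l @ wh_i + a_r @ wh_j
print(torch.allclose(lhs, rhs))    # True
```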

The attention weights alpha produced by the paper's equation all come out identical in my run, which is what leads to the identical node feature embeddings.
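To illustrate that last point, here is a minimal numeric sketch (my own illustration, not the GATConv code) of why uniform attention over a complete graph collapses all node embeddings:

```python
import torch

# If every alpha_ij equals 1/48 and every node attends to all 48 nodes
# (self-loops included), each node's output is the same average
# (1/48) * sum_k (W x_k), so all rows of the output are identical.
num_nodes, out_channels = 48, 3
wx = torch.randn(num_nodes, out_channels)                     # transformed features W x_k
alpha = torch.full((num_nodes, num_nodes), 1.0 / num_nodes)   # uniform attention weights
out = alpha @ wx                                              # [48, 3]
print(torch.allclose(out[0], out[-1]))                        # True: every row is the same
```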
