Answered by rusty1s · Jul 29, 2021
We concatenate implicitly, leading to a smaller memory footprint. In particular, we hold two versions of the `a` parameter vector, one for `W @ h_i` (named `att_r`) and one for `W @ h_j` (named `att_l`). We then multiply the source and destination node features with these parameters and sum the resulting parts together. This is equivalent to first concatenating the source and destination node features and then multiplying with a single attention parameter vector afterwards. Hope this is understandable :)
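
For illustration, here is a minimal PyTorch sketch of that equivalence. This is not PyG's actual `GATConv` code; the tensor names `Wh_i`/`Wh_j` are stand-ins for the already-projected destination and source features, and the split of `a` into `att_r`/`att_l` mirrors the description above:

```python
import torch

# Hypothetical shapes for illustration; not taken from PyG internals.
num_nodes, out_channels = 4, 8
Wh_i = torch.randn(num_nodes, out_channels)  # W @ h_i: projected destination features
Wh_j = torch.randn(num_nodes, out_channels)  # W @ h_j: projected source features

# A single attention vector `a` for the concatenated features [W@h_i || W@h_j] ...
a = torch.randn(2 * out_channels)
# ... split into the two halves applied to each side separately.
att_r, att_l = a[:out_channels], a[out_channels:]

# Explicit concatenation: a^T [W@h_i || W@h_j]
alpha_concat = torch.cat([Wh_i, Wh_j], dim=-1) @ a

# Implicit version: multiply each side with its half of `a`, then sum.
alpha_split = Wh_i @ att_r + Wh_j @ att_l

assert torch.allclose(alpha_concat, alpha_split, atol=1e-6)
```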