We concatenate implicitly, which gives a smaller memory footprint. In particular, we hold two halves of the attention parameter vector a: one for W@h_i (named att_r) and one for W@h_j (named att_l). We then multiply the source and destination node features with these parameters and sum the resulting parts. This is equivalent to first concatenating the source and destination node features and then multiplying with a single attention parameter vector. Hope this is understandable :)
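
A minimal sketch of that equivalence (the names att_r/att_l mirror the ones above; the tensors, shapes, and the way the vector is split in half are made up for illustration and are not taken from the actual GATConv code):

```python
import torch

torch.manual_seed(0)

out_channels = 8
Wh_i = torch.randn(out_channels)   # transformed source node features W @ h_i
Wh_j = torch.randn(out_channels)   # transformed destination node features W @ h_j

# Single attention vector over the concatenation, split into two halves
a = torch.randn(2 * out_channels)
att_r, att_l = a[:out_channels], a[out_channels:]

# Explicit concatenation: a^T [W@h_i || W@h_j]
score_concat = a @ torch.cat([Wh_i, Wh_j])

# Implicit version: score the two parts separately and sum
score_split = att_r @ Wh_i + att_l @ Wh_j

assert torch.allclose(score_concat, score_split)
```

The split version never materializes the concatenated feature vector per edge, which is where the memory saving comes from.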
