High AUC for link prediction at initialization depending on num_neighbors value of LinkNeighborLoader
#8782
Replies: 2 comments
Really interesting, let me try to come up with an explanation :) Your graph seems to be very dense, which suggests that an untrained GCN yields approximately equal embeddings for every node in your graph (which would explain the 0.5 AUC). However, if neighbor sampling is used, node features become much more discriminative, and with the inductive bias of the untrained GCN, the model is already able to find similar pairs of nodes.
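
A rough way to see the first half of this argument is to compare how much aggregated features vary across nodes when every neighbor is used versus when only a few are sampled. The sketch below is a toy, pure-Python simulation under stated assumptions: plain mean aggregation stands in for the GCN layer, the graph is a random dense graph, and the sizes `n`, `p`, `k` are made up. It only shows that sampling increases the spread of aggregated features; it does not show that the result predicts links.

```python
import random
import statistics

rng = random.Random(0)
n, p, k = 200, 0.5, 5  # hypothetical: 200 nodes, edge probability 0.5, fan-out 5

# One random scalar feature per node and a dense random graph.
x = [rng.gauss(0.0, 1.0) for _ in range(n)]
adj = {i: [] for i in range(n)}
for i in range(n):
    for j in range(i + 1, n):
        if rng.random() < p:
            adj[i].append(j)
            adj[j].append(i)

def mean_aggregate(node, fanout=None):
    """Average neighbor features, optionally over a random sample of them."""
    neighbors = adj[node]
    if fanout is not None and len(neighbors) > fanout:
        neighbors = rng.sample(neighbors, fanout)
    return statistics.fmean([x[j] for j in neighbors])

full = [mean_aggregate(i) for i in range(n)]
sampled = [mean_aggregate(i, fanout=k) for i in range(n)]

# Spread of the aggregated feature across nodes in each setting.
spread_full = statistics.pstdev(full)
spread_sampled = statistics.pstdev(sampled)
print(f"spread with all neighbors: {spread_full:.3f}")
print(f"spread with {k} sampled neighbors: {spread_sampled:.3f}")
```

With all neighbors, each node averages over a large and heavily overlapping set, so the aggregated values cluster together; with a small sample they differ much more from node to node.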
Hi @rusty1s
I'm seeing very strange behaviour on a link prediction problem and would really appreciate any guidance.
The input graph is undirected, with 4367 nodes and 392027 edges:
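
For a sense of scale, these numbers imply a fairly dense graph. A quick back-of-the-envelope check, assuming 392027 counts each undirected edge once (if it instead counts both directions, as an `edge_index` that stores reverse edges would, halve both results):

```python
num_nodes = 4367
num_edges = 392027  # assumption: each undirected edge is counted once

# Every undirected edge contributes to the degree of two nodes.
avg_degree = 2 * num_edges / num_nodes
# Fraction of all possible node pairs that are connected.
density = num_edges / (num_nodes * (num_nodes - 1) / 2)

print(f"average degree: {avg_degree:.1f}")  # 179.5
print(f"density: {density:.3f}")            # 0.041
```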
I then perform splitting using `RandomLinkSplit`:
I'm going to create two validation data loaders with differing values for `num_neighbors` to illustrate the following behaviour, one with `[-1]` and one with `[-1,20,5]`:
I'm using the validation loader for an example here, but the behaviour described below also occurs for the train and test splits.
I can then draw a single sample from each:
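
To make the two settings concrete: `num_neighbors` lists how many neighbours are sampled per node at each hop, with `-1` meaning all of them, so `[-1]` expands the seed nodes by one full hop while `[-1,20,5]` adds two further, sub-sampled hops. The toy sampler below is only a pure-Python illustration of that fan-out idea, not PyG's implementation; `sample_nodes`, the ring graph, and the smaller fan-outs `[-1, 2, 1]` are made up for the example.

```python
import random

def sample_nodes(adj, seeds, num_neighbors, rng):
    """Return the set of nodes reached by hop-wise neighbour sampling.

    adj: dict mapping node -> list of neighbours
    num_neighbors: per-hop fan-out; -1 keeps every neighbour
    """
    visited = set(seeds)
    frontier = list(seeds)
    for fanout in num_neighbors:
        next_frontier = []
        for node in frontier:
            neighbours = adj[node]
            if fanout != -1 and len(neighbours) > fanout:
                neighbours = rng.sample(neighbours, fanout)
            for nb in neighbours:
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier  # the next hop expands only newly reached nodes
    return visited

# Toy graph: a ring of 12 nodes, each linked to its two nearest nodes on both sides.
n = 12
adj = {i: [(i + d) % n for d in (-2, -1, 1, 2)] for i in range(n)}

rng = random.Random(0)
seeds = [0, 6]  # stand-ins for the two endpoints of one candidate edge
one_hop = sample_nodes(adj, seeds, [-1], rng)
three_hop = sample_nodes(adj, seeds, [-1, 2, 1], rng)

print(sorted(one_hop))       # [0, 1, 2, 4, 5, 6, 7, 8, 10, 11]
print(one_hop <= three_hop)  # True: the first hop is identical, later hops only add nodes
```

One consequence worth keeping in mind is that the two loaders hand the encoder subgraphs of different depth, not just different size.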
I then instantiate a model I'll call `sl`. This is essentially a `torch_geometric.nn.GAE` with two `GCNConv` hidden layers for the encoder and ReLU activations (though I don't think the precise architecture matters). We can then code a basic forward function (again, no training whatsoever):
Calling this on `v1` and `v2` gives:
In other words, with `num_neighbors = [-1]` we get the expected AUC (~0.5), while with `num_neighbors = [-1,20,5]` we get a significantly better AUC than we'd expect at random, with losses that reflect this.
I know `num_neighbors` controls how links are sampled, but I can't see how it would give this behaviour. Again, there's absolutely no training here, so it's not an overfitting issue, and `GCNConv` normalizes by the degree matrix, so I can't see how it would be able to predict links in already dense parts of the graph.
Any insights would be hugely appreciated, thanks.
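
For reference, the ~0.5 baseline follows from how AUC is defined: it is the probability that a randomly chosen positive edge is scored above a randomly chosen negative one, with ties counted as one half. A minimal pure-Python sketch of that definition follows; the helper `pairwise_auc` is hypothetical, not a PyG or scikit-learn function, and the labels and scores are made-up toy values.

```python
def pairwise_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0]
print(pairwise_auc(labels, [0.5, 0.5, 0.5, 0.5]))  # identical scores -> 0.5
print(pairwise_auc(labels, [0.9, 0.4, 0.6, 0.1]))  # 3 of 4 pairs ordered correctly -> 0.75
```

Identical embeddings give identical scores for every pair, which is the all-ties case; any AUC well above 0.5 means the scores already order positives above negatives more often than not.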