LinkNeighborLoader avoid/limiting sampling using the target edge #8710

GianlucaDeStefano · 2024-01-03T00:09:55Z

Hi everyone.

I'm currently working on a link-prediction model applied to a heterogeneous graph. The focus is on a specific edge type, which I'll refer to as 'target-edge'.

What makes this edge type distinctive is its structure: the graph contains thousands of unique source nodes but only a few destination nodes, and each destination node has just one or two links apart from its 'target edge' links.

The issue arises when loading these edges via the LinkNeighborLoader. Due to the highly connected nature of the destination nodes, the resulting sampled graph becomes excessively large. I'm aware that reducing the 'num_neighbors' parameter could address this, but it would also compromise the neighborhood structure by predominantly including nodes connected through the 'target edge'. This approach omits critical information from other edge types that I want to include in the sampled graph.

In short, my questions are:
Is there a way to exclude the 'target edge' while sampling the subgraph?
Alternatively, can other edge types be prioritized over the 'target edge' during the sampling process?
Any insights or suggestions on this matter would be greatly appreciated.

(PS: I have also considered removing the 'target edge' from the subgraph after sampling. The problem with this approach is that it would still generate a huge graph that is difficult to process and that contains a lot of (dis)connected nodes that should not be included.)

Thank you in advance for your help!

rusty1s (Maintainer) · 2024-01-05T14:45:37Z

Thanks for the issue. I am wondering why removing the target edge from sampling would yield a smaller subgraph? In the end, the subgraphs should be identical except for this one additional edge. Maybe I am misunderstanding.

If you don't want to include the target edges during sampling, you can either

- remove them after sampling (as you did), or
- remove them before sampling (not sure if this is possible in your case).

We also added support for weighted sampling (a higher weight means a higher chance for an edge to get sampled), but the weights are generally assumed to be static across the whole graph. Would that fit your use-case?

2 replies

GianlucaDeStefano Jan 6, 2024
Author

Thank you for your response.

I realize my initial explanation may have been unclear, so I'd like to clarify: when I refer to the 'target edge,' I'm talking about a category of edges, not just a single edge. These edges are what I aim to predict in my model.

The issue is that these 'target edges' are the primary reason for the destination nodes in our graph being highly connected. If we exclude or limit these edges in the subgraph sampling process, the resulting subgraph will be significantly smaller. This is because it's the 'target edges' that cause the destination nodes to have such high connectivity. (However, totally discarding them will also cause the loss of critical information).

In contrast, removing these 'target edges' only after sampling doesn't reduce the subgraph's node count; it just decreases the edge count. Unfortunately, this subgraph is still too large for my current system architecture to handle — we end up with too many nodes, and the graph won't fit into memory.
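The difference between the two orderings can be seen on a toy example. The sketch below is plain Python, not PyG: the graph, the node names, and the helpers `build_adjacency` and `k_hop_nodes` are all hypothetical, and the "sampler" simply takes every neighbor within one hop. It only illustrates why dropping 'target edges' before sampling shrinks the node set, while dropping them afterwards does not.

```python
from collections import defaultdict

# Toy heterogeneous graph as (source, edge_type, destination) triples.
# 100 users point at a single hub through "target" edges; "other" edges are sparse.
edges = [(f"user{i}", "target", "hub") for i in range(100)]
edges += [("hub", "other", "cat0"), ("user0", "other", "group0")]


def build_adjacency(edge_list):
    """Undirected adjacency list that remembers the edge type of each link."""
    adj = defaultdict(list)
    for src, etype, dst in edge_list:
        adj[src].append((etype, dst))
        adj[dst].append((etype, src))
    return adj


def k_hop_nodes(adj, seeds, hops):
    """Collect every node reachable from the seeds within `hops` steps."""
    visited = set(seeds)
    frontier = set(seeds)
    for _ in range(hops):
        next_frontier = set()
        for node in frontier:
            for _etype, neighbor in adj[node]:
                if neighbor not in visited:
                    next_frontier.add(neighbor)
        visited |= next_frontier
        frontier = next_frontier
    return visited


seeds = ["user0", "hub"]  # endpoints of one target edge we want to predict

# (a) Sample on the full graph; deleting target edges afterwards keeps these nodes.
full_nodes = k_hop_nodes(build_adjacency(edges), seeds, hops=1)

# (b) Delete target edges first, then sample.
pruned_edges = [e for e in edges if e[1] != "target"]
pruned_nodes = k_hop_nodes(build_adjacency(pruned_edges), seeds, hops=1)

print(len(full_nodes), len(pruned_nodes))
```

In case (a) the hub pulls in every user attached to it, so the node set is dominated by 'target edge' neighbors; in case (b) only the neighbors reached through other edge types remain.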

Ideally, I would love to use a 'hybrid approach'. This would involve sampling a maximum of N neighbors from each node, but with a preference for edges that are not 'target edges'. Such a method would allow me to include information from less common edges in the graph, which are currently being overlooked, while also maintaining a manageable subgraph size.

Do you see any easy way to achieve this 'hybrid approach'?

rusty1s Jan 7, 2024
Maintainer

I think weighted sampling is exactly what you could use here. In your case, "target edges" should have a lower weight than other edges in the graph.
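To make the idea concrete, here is a minimal sketch of weighted neighbor sampling in plain Python. It is not PyG's implementation: the helper `sample_neighbors`, the toy neighbor list, and the weight values 0.01 and 1.0 are assumptions chosen for illustration, and the sampling scheme (random keys `u ** (1 / w)`, keep the k largest) is one standard way to draw without replacement in proportion to weights. In PyG the static per-edge weights would presumably be supplied through the loader's `weight_attr` argument; check the documentation of your installed version before relying on that.

```python
import random


def sample_neighbors(candidates, weights, k, rng):
    """Draw up to k candidates without replacement, favoring larger weights.

    Each candidate gets the key u ** (1 / w) with u uniform in (0, 1];
    the k largest keys win. All weights must be strictly positive.
    """
    if k >= len(candidates):
        return list(candidates)
    keyed = []
    for candidate, weight in zip(candidates, weights):
        u = 1.0 - rng.random()  # shift [0, 1) to (0, 1]
        keyed.append((u ** (1.0 / weight), candidate))
    keyed.sort(key=lambda pair: pair[0], reverse=True)
    return [candidate for _key, candidate in keyed[:k]]


# Neighbors of one highly connected destination node:
# 50 reached through target edges, 2 through another edge type.
hub_edges = [("target", f"user{i}") for i in range(50)]
hub_edges += [("other", "cat0"), ("other", "cat1")]

# Static weights: target edges are strongly down-weighted (hypothetical values).
weights = [0.01 if etype == "target" else 1.0 for etype, _ in hub_edges]

rng = random.Random(0)
trials = 1000
other_picks = 0
for _ in range(trials):
    picked = sample_neighbors(hub_edges, weights, k=2, rng=rng)
    other_picks += sum(1 for etype, _ in picked if etype == "other")

share = other_picks / (2 * trials)
print(f"share of sampled neighbors reached via non-target edges: {share:.2f}")
```

With uniform sampling the two non-target neighbors would make up only 2 of 52 candidates; down-weighting the target edges lets them fill most of the budget while target edges are still occasionally sampled, which matches the 'hybrid approach' described above.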
