Dataset for Multi level Graph representation #6324

GianlucaDeStefano · 2023-01-01T20:16:22Z

GianlucaDeStefano
Jan 1, 2023

Hello, I am developing an architecture with an unusual data representation, and I am unsure of the most efficient way to store the data before training.

What I am trying to do is this:
I want to create a link prediction model between nodes of the same type. In particular, each of these nodes would not have an associated feature vector but a child subgraph from which a suitable embedding should be extracted.

My idea is to use what are essentially two distinct networks: one to extract the node embeddings from their associated subgraphs, and a second to actually perform link prediction (the two networks would be trained end to end). However, I have a doubt.
My dataset is huge, so I need a representation that can be loaded efficiently during the training phase.

Until now, I have managed my data by creating custom Dataset classes that internally create and store my graphs as Data objects.
At training time I then load the data using loaders such as NeighborLoader.

However, in this specific case, I am unsure how to handle this 'multi-level' graph representation: Data objects require node features to be stored as fixed-size tensors, so I don't know how to store each node's subgraph (with its own features and everything).

I want to point out that I am pretty new to graph neural networks so the solution may very well be under my nose but at the moment I can't see what's the best way to handle this use case.
Do you have any tips on how I could implement this?

Answered by rusty1s

rusty1s · 2023-01-02T13:03:25Z

rusty1s
Jan 2, 2023
Maintainer

Mh, tricky problem. Can the subgraph structures already be used for link prediction, or do you just want to use them as initial feature representations and then use a second GNN for link prediction?

What I imagine you could do is have two datasets: one that models your initial graph, and a second that can be used to index the subgraph structures for each sampled node in the initial graph. That would look something like:

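import torch
from torch_geometric.data import Batch
from torch_geometric.loader import LinkNeighborLoader

# Store each node's global ID so that batch.n_id maps sampled nodes back to the full graph:
data.n_id = torch.arange(data.num_nodes)
loader = LinkNeighborLoader(data, ...)
for batch in loader:
    # Get the subgraphs for each sampled node:
    data_list = subgraph_dataset[batch.n_id]
    subgraph_batch = Batch.from_data_list(data_list)
    x = model(subgraph_batch)  # An embedding per sampled node on the subgraph.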
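To make the lookup step concrete, here is a library-free sketch in plain Python of what the second dataset has to provide: given the global IDs of the sampled first-level nodes, it returns one subgraph per ID, in the same order. The `Subgraph` and `SubgraphStore` names and the toy data are assumptions made for illustration and are not PyG classes; in PyG the same role would presumably be played by a `Dataset` whose i-th item is the `Data` object for first-level node i.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass
class Subgraph:
    """Child graph of one first-level node (hypothetical container)."""
    x: List[List[float]]          # one feature vector per second-level node
    edges: List[Tuple[int, int]]  # edges between second-level nodes (local indices)


class SubgraphStore:
    """Maps a first-level node ID to its child subgraph (hypothetical helper)."""

    def __init__(self, subgraphs: Dict[int, Subgraph]):
        self._subgraphs = subgraphs

    def __getitem__(self, node_ids: Sequence[int]) -> List[Subgraph]:
        # Preserve the order of node_ids so that result[i] belongs to node_ids[i].
        return [self._subgraphs[int(i)] for i in node_ids]


# Toy data: three first-level nodes whose subgraphs have different sizes.
store = SubgraphStore({
    0: Subgraph(x=[[1.0, 0.0], [0.0, 1.0]], edges=[(0, 1)]),
    1: Subgraph(x=[[2.0, 2.0]], edges=[]),
    2: Subgraph(x=[[0.5, 0.5], [1.5, 1.5], [1.0, 1.0]], edges=[(0, 1), (1, 2)]),
})

# Stand-in for batch.n_id: global IDs of the nodes sampled in one mini-batch.
sampled_n_id = [2, 0]
data_list = store[sampled_n_id]
num_nodes_per_subgraph = [len(g.x) for g in data_list]
```

Because the subgraphs are stored separately and only fetched by ID, their sizes can differ freely; the first-level graph itself never needs a fixed-size feature tensor for them.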

1 reply

GianlucaDeStefano Jan 2, 2023
Author

The link prediction task is concerned only with connecting first-level nodes, so the subgraphs (second-level nodes and edges) would be used only to extract the embedding of their respective first-level parent node. Subgraphs never share nodes or edges, so they are completely disconnected from one another.
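Because the subgraphs are disjoint, a mini-batch of them can be treated as one large disconnected graph. The plain-Python sketch below shows the idea behind that kind of batching, which is conceptually what `Batch.from_data_list` does: concatenate node features, shift each subgraph's edge indices by the number of nodes before it, and record which subgraph every node came from. The `collate` and `mean_pool` helpers are hypothetical names for illustration, and mean pooling is only a stand-in for whatever readout the subgraph network would use (PyG offers `global_mean_pool` for this). The sketch assumes every subgraph has at least one node.

```python
from typing import List, Tuple

# Each subgraph: (node features, edge list using local node indices).
Graph = Tuple[List[List[float]], List[Tuple[int, int]]]


def collate(graphs: List[Graph]):
    """Merge graphs into one disconnected graph plus a node-to-graph vector."""
    x, edges, batch = [], [], []
    offset = 0
    for graph_idx, (feats, local_edges) in enumerate(graphs):
        x.extend(feats)
        # Shift local indices so they point into the concatenated node list.
        edges.extend((src + offset, dst + offset) for src, dst in local_edges)
        batch.extend([graph_idx] * len(feats))
        offset += len(feats)
    return x, edges, batch


def mean_pool(x, batch, num_graphs):
    """Average node features per graph, giving one vector per subgraph."""
    dim = len(x[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for feats, g in zip(x, batch):
        counts[g] += 1
        for d, value in enumerate(feats):
            sums[g][d] += value
    # Assumes counts[g] > 0, i.e. no empty subgraphs.
    return [[s / counts[g] for s in sums[g]] for g in range(num_graphs)]


graphs = [
    ([[1.0, 0.0], [0.0, 1.0]], [(0, 1)]),
    ([[2.0, 2.0], [4.0, 0.0], [0.0, 4.0]], [(0, 1), (1, 2)]),
]
x, edges, batch = collate(graphs)
pooled = mean_pool(x, batch, num_graphs=len(graphs))
```

Since no edge crosses between subgraphs, message passing on the merged graph cannot leak information from one first-level node's subgraph into another's.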

For this reason, I am not against using a single large GNN that both extracts the embeddings and performs link prediction.
The two-network description was just the most understandable schema of the architecture that I could come up with.
Still, I think I will implement it this way, since sampling first-level edges and subgraphs will be easier with this formulation and I don't see any drawback.
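For the second stage, the per-node embeddings obtained from the subgraphs are indexed by the candidate first-level edges of the mini-batch. The sketch below is plain Python and assumes a dot-product decoder, which this thread does not specify; `score_links` is a hypothetical helper. The `edge_label_index` name mirrors the attribute that `LinkNeighborLoader` batches carry, holding local (per-batch) node indices, so row i of `x` is taken to be the embedding of the i-th sampled node. A first-level GNN that refines `x` before scoring is omitted here.

```python
from typing import List, Tuple


def dot(a: List[float], b: List[float]) -> float:
    return sum(p * q for p, q in zip(a, b))


def score_links(x: List[List[float]],
                edge_label_index: Tuple[List[int], List[int]]) -> List[float]:
    """Score each candidate edge by the dot product of its endpoint embeddings."""
    src, dst = edge_label_index
    return [dot(x[s], x[d]) for s, d in zip(src, dst)]


# x[i]: embedding of the i-th sampled first-level node (from its subgraph).
x = [[1.0, 0.0], [0.5, 0.5], [0.0, 2.0]]
# Candidate edges as (source rows, destination rows), in local batch indices.
edge_label_index = ([0, 1, 0], [1, 2, 2])
scores = score_links(x, edge_label_index)
```

In an end-to-end setup, a loss on these scores would propagate gradients back through the embeddings into the subgraph network, which is what allows the two networks to be trained jointly.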

I see what you mean with the two-dataset approach; I was thinking along the same lines, so I will keep going in this direction.

Thanks
