Dataset for Multi level Graph representation #6324
Hello, I am developing an architecture with a particular 'data representation', and I am unsure of the best and most efficient way to store the data before training.

What I am trying to do is this: use what are essentially two distinct networks, one to extract the embeddings of the nodes from their associated subgraphs and a second to actually perform link prediction (the two networks would be trained end to end).

However, I have a doubt. Until now, I have managed my data by creating custom Dataset classes that internally create and store my graphs as Data objects. In this specific case, I am unsure how to handle this 'multi-level' graph representation: Data objects require storing the node features as fixed-size tensors, so I don't know how I can store each node's subgraph (with its own features and everything).

I want to point out that I am pretty new to graph neural networks, so the solution may very well be right under my nose, but at the moment I can't see the best way to handle this use case.
Replies: 1 comment 1 reply
Mh, tricky problem. Can the subgraph structures already be used for link prediction, or do you just want to use them as initial feature representations and then use a second GNN for link prediction?

What I imagine you could do is have two datasets: one that models your initial graph, and a second one that can be used to index the subgraph structures for each sampled node in the initial graph. That would look something like:

    import torch
    from torch_geometric.data import Batch
    from torch_geometric.loader import LinkNeighborLoader

    # Store global node IDs so they can be recovered after sampling:
    data.n_id = torch.arange(data.num_nodes)
    loader = LinkNeighborLoader(data, ...)
    for batch in loader:
        # Get the subgraphs for each sampled node:
        data_list = subgraph_dataset[batch.n_id]
        subgraph_batch = Batch.from_data_list(data_list)
        x = model(subgraph_batch)  # An embedding per sampled node on the subgraph.
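On the fixed-size tensor concern: it may help to see what batching graphs of different sizes amounts to conceptually. The following is a plain-Python sketch, not PyG's actual implementation. `Subgraph`, `collate`, and `mean_pool` are hypothetical names, the data is made up, and mean pooling merely stands in for the first network. The idea is that the subgraphs are merged into one disjoint graph, with edge indices shifted by an offset and a `batch` vector recording which subgraph each node belongs to.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Subgraph:
    """Hypothetical container: one small graph attached to one main-graph node."""
    x: List[List[float]]          # one feature row per subgraph node
    edges: List[Tuple[int, int]]  # edges in local (per-subgraph) indices


def collate(subgraphs: List[Subgraph]):
    """Merge subgraphs of any size into a single disjoint graph."""
    x, edges, batch = [], [], []
    offset = 0
    for graph_idx, g in enumerate(subgraphs):
        x.extend(g.x)
        # Shift local node indices so they stay unique in the merged graph.
        edges.extend((src + offset, dst + offset) for src, dst in g.edges)
        # Remember which subgraph every merged node came from.
        batch.extend([graph_idx] * len(g.x))
        offset += len(g.x)
    return x, edges, batch


def mean_pool(x, batch, num_graphs):
    """Stand-in for the first network: average the node features per subgraph."""
    dim = len(x[0])
    sums = [[0.0] * dim for _ in range(num_graphs)]
    counts = [0] * num_graphs
    for row, graph_idx in zip(x, batch):
        counts[graph_idx] += 1
        for j, value in enumerate(row):
            sums[graph_idx][j] += value
    # max(c, 1) avoids a division by zero for an empty subgraph.
    return [[v / max(c, 1) for v in s] for s, c in zip(sums, counts)]


# Toy subgraph store, indexed by the global node id of the main graph.
subgraph_dataset = [
    Subgraph(x=[[1.0, 0.0], [3.0, 2.0]], edges=[(0, 1)]),
    Subgraph(x=[[0.0, 4.0]], edges=[]),
    Subgraph(x=[[2.0, 2.0], [4.0, 4.0], [6.0, 0.0]], edges=[(0, 1), (1, 2)]),
]

n_id = [2, 0]  # global ids of the nodes sampled in one mini-batch
x, edges, batch = collate([subgraph_dataset[i] for i in n_id])
embeddings = mean_pool(x, batch, num_graphs=len(n_id))

print(edges)       # [(0, 1), (1, 2), (3, 4)]
print(batch)       # [0, 0, 0, 1, 1]
print(embeddings)  # [[4.0, 2.0], [2.0, 1.0]]
```

Row `i` of `embeddings` corresponds to the node with global id `n_id[i]`, so in this sketch the result can be used directly as the node feature matrix for the second (link prediction) network.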