Understanding how edge_index and edge_label_index relate to message passing #6923
-
Hi everyone, I'm working on a link prediction project using GNNs. So far I have achieved some promising results, but I want to understand the relationship between the edge indices and the message passing process so I can be more confident that my results are legitimate. My understanding is that messages are passed along the edges held in the "edge_index" attribute, but not along the edges in the "edge_label_index" attribute. I believe the "edge_label_index" edges are only used for supervision and/or evaluation. Could anyone confirm whether this is true, or correct me if I'm wrong? Thanks!
Replies: 4 comments 4 replies
-
Yep, you are right.
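To make the distinction concrete, here is a minimal sketch in plain Python (not PyG code). The `propagate` and `score` helpers and the toy data are made up for illustration; the only point is that aggregation reads edge_index, while edge_label_index merely selects which node pairs get scored.

```python
# Toy illustration (hypothetical helpers, no PyG): edge_index drives the
# aggregation step, edge_label_index only picks the pairs to be scored.

def propagate(x, edge_index):
    """One round of mean aggregation over incoming edges (source -> target)."""
    src, dst = edge_index
    sums = [list(feat) for feat in x]  # each node starts from its own features
    counts = [1] * len(x)
    for s, d in zip(src, dst):
        sums[d] = [a + b for a, b in zip(sums[d], x[s])]
        counts[d] += 1
    return [[v / c for v in row] for row, c in zip(sums, counts)]


def score(h, edge_label_index):
    """Dot-product score for every candidate pair in edge_label_index."""
    src, dst = edge_label_index
    return [sum(a * b for a, b in zip(h[s], h[d])) for s, d in zip(src, dst)]


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
edge_index = [[0, 1, 1, 2], [1, 0, 2, 1]]  # message-passing edges: 0<->1, 1<->2
edge_label_index = [[0, 2], [3, 3]]        # supervision pairs: (0, 3) and (2, 3)

h = propagate(x, edge_index)        # never looks at edge_label_index
scores = score(h, edge_label_index)  # one score per supervision pair
```

Node 3 appears only in edge_label_index, so it receives no messages and its embedding stays equal to its input features, even though two supervision pairs point at it.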
-
Got it, thank you! I implemented my own split to ensure that edge_label_index and edge_index are completely disjoint in the test and validation splits. I just wanted to confirm my understanding to rule out data leakage issues.
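A disjointness check along these lines could look like the sketch below. It is plain Python on nested lists, which is the shape you would get from calling `.tolist()` on a `[2, num_edges]` tensor. The helper names and the toy edges are hypothetical, and the `undirected` flag is an assumption about how you want to treat reversed edges.

```python
def to_pairs(index, undirected=False):
    """Turn a [2, num_edges] nested list into a set of (source, target) pairs."""
    src, dst = index
    pairs = set(zip(src, dst))
    if undirected:
        # Also count the reversed direction of every edge.
        pairs |= {(d, s) for s, d in pairs}
    return pairs


def leaked_edges(edge_index, edge_label_index, undirected=False):
    """Return the pairs that appear in both the message-passing and label sets."""
    shared = to_pairs(edge_index, undirected) & to_pairs(edge_label_index, undirected)
    return sorted(shared)


# Hypothetical validation split: message-passing edges and supervision pairs.
val_edge_index = [[0, 1, 2], [1, 2, 3]]
val_edge_label_index = [[3, 0], [2, 3]]

directed_leaks = leaked_edges(val_edge_index, val_edge_label_index)
undirected_leaks = leaked_edges(val_edge_index, val_edge_label_index, undirected=True)
```

Here the directed check finds nothing, but the undirected check flags (2, 3) because its reverse (3, 2) is a supervision pair. On a homogeneous undirected graph that reverse edge would leak the label. On a bipartite user/movie edge type the two rows use different ID spaces, so only the directed check is meaningful there.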
-
I'm still a bit confused about edge_label_index and edge_index. I followed the Link Prediction on MovieLens tutorial to create a training and a validation set, plus a loader for each. I'm now trying to look at my validation results and assess the incorrect predictions. I've modified the very last code block from the tutorial to include a list where I keep track of the edges assessed during validation, so that later I can look up the original data using the indices:
Specifically I added:
which I would have thought would give me a list of the node indices that make up each edge assessed, but the indices in my lists do not look like the original ones. Is there some transformation I need to do in order to get the original indices?
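One likely explanation, assuming the validation loader is a LinkNeighborLoader as in the tutorial: each batch is a sampled subgraph whose nodes are relabelled from 0, so the batch's edge_label_index holds batch-local positions. As far as I know, the loader also stores the original node IDs per node type in an `n_id` attribute, so mapping back is a lookup. The sketch below shows that lookup on plain lists; `to_global` and all the numbers are made up for illustration.

```python
def to_global(edge_label_index, src_n_id, dst_n_id):
    """Map batch-local (source, target) positions back to original node IDs."""
    src_local, dst_local = edge_label_index
    return [(src_n_id[s], dst_n_id[d]) for s, d in zip(src_local, dst_local)]


# Hypothetical batch: position i in the batch corresponds to original ID n_id[i].
user_n_id = [40, 7, 13]            # stands in for batch["user"].n_id.tolist()
movie_n_id = [905, 12, 388, 77]    # stands in for batch["movie"].n_id.tolist()
batch_edge_label_index = [[0, 2, 1], [3, 0, 0]]  # local (user, movie) positions

global_edges = to_global(batch_edge_label_index, user_n_id, movie_n_id)
```

With tensors, the same lookup would be an index operation such as `batch["user"].n_id[edge_label_index[0]]`, done inside the validation loop before appending to your list.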
-
Hi. I'm facing a similar issue understanding how these work. I'm using a different dataset, in which the edges for user 0 go to movies 0 through 50. So when I print the edge_index of data, I see tuples (0, 0) ... (0, 50). Once I apply RandomLinkSplit, however, I see new edges that weren't in the original graph. To give an example, I printed the edge_index and edge_label_index of the train/test/val data, showing the movies on user 0's edges:
And this is how I call RandomLinkSplit
It all makes sense up to the val_data edge_label_index. I don't understand where the values 2799, 24912, 26827, 2900, 19263, 7033, 28625, 33115, 11532, 16189, 18671 come from. These edges don't exist in the original data, and I'm also not doing negative sampling. The same goes for the test_data edge_label_index, with 711 and 19846. I also wanted to ask whether RandomLinkSplit can split the links per user, so that the test and val ratios apply to each user's links instead of to all the links in the graph. Thanks a lot
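One thing worth checking, as an assumption since the RandomLinkSplit call is not shown here: RandomLinkSplit has a neg_sampling_ratio argument that defaults to 1.0, and add_negative_train_samples=False only turns off negatives for the training split. The validation and test splits can therefore still contain sampled negative edges, marked with edge_label 0. The sketch below separates the two groups on plain lists; `split_by_label` and the toy values are hypothetical and are not the numbers from the post above.

```python
def split_by_label(edge_label_index, edge_label):
    """Separate supervision pairs into positives (label 1) and negatives (label 0)."""
    src, dst = edge_label_index
    positives = [(s, d) for s, d, y in zip(src, dst, edge_label) if y == 1]
    negatives = [(s, d) for s, d, y in zip(src, dst, edge_label) if y == 0]
    return positives, negatives


# Toy graph: user 0 is connected to movies 0..50, as in the description above.
original_edges = {(0, m) for m in range(51)}

# Hypothetical validation split: two held-out real edges, two sampled negatives.
val_edge_label_index = [[0, 0, 0, 0], [3, 17, 900, 901]]
val_edge_label = [1, 1, 0, 0]

positives, negatives = split_by_label(val_edge_label_index, val_edge_label)

# Positives that are missing from the original graph would point to a real bug.
unknown_positives = [e for e in positives if e not in original_edges]
# Negatives are expected to be absent from the original graph.
sampled_negatives = [e for e in negatives if e not in original_edges]
```

If every unfamiliar movie ID in your split carries edge_label 0, the extra edges are sampled negatives rather than a split error, and setting neg_sampling_ratio=0.0 should remove them.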
One thing to add to the confirmation above: some edges in edge_label_index might also be in edge_index, meaning some edges can be used for both supervision and message passing.