Bug in RandomLinkSplit #3267

dpaysan · 2021-10-01T09:00:30Z

Dear all,

I ran into some dimension mismatches and, while debugging them, noticed that after RandomLinkSplit determines the random split indices, the data is not split the way I expected. The data splits do not use the sampled validation and test edges: the validation data reuses the training edges, and the test data uses the combination of the training and validation edges. I have copied the relevant part of the code below, which can be found here:

    train_edges = perm[:num_train]
    val_edges = perm[num_train:num_train + num_val]
    test_edges = perm[num_train + num_val:]
    train_val_edges = perm[:num_train + num_val]

    # Create data splits:
    train_data = self._split(data, train_edges)
    val_data = self._split(data, train_edges)
    test_data = self._split(data, train_val_edges)
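To see what each slice in the snippet holds, the indexing can be reproduced on a toy permutation. The edge count, the split sizes, and the shuffled Python list standing in for the random edge permutation are assumptions chosen for illustration, not values taken from the library:

```python
import random

# Toy values for illustration only; the real transform derives the split sizes
# from the number of edges and its num_val / num_test arguments.
num_edges = 10
num_val, num_test = 2, 2
num_train = num_edges - num_val - num_test

# A shuffled list of edge positions stands in for the random edge permutation.
perm = list(range(num_edges))
random.Random(0).shuffle(perm)

train_edges = perm[:num_train]
val_edges = perm[num_train:num_train + num_val]
test_edges = perm[num_train + num_val:]
train_val_edges = perm[:num_train + num_val]

# Every edge lands in exactly one of the three slices ...
assert sorted(train_edges + val_edges + test_edges) == sorted(perm)
# ... and train_val_edges is exactly the union of the training and validation slices.
assert set(train_val_edges) == set(train_edges) | set(val_edges)

print(len(train_edges), len(val_edges), len(test_edges), len(train_val_edges))  # prints: 6 2 2 8
```

The validation and test slices are still sampled; the snippet simply passes only the training slice (or the training plus validation slice) to `_split`.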

Could somebody explain why that is?

Thanks a lot in advance!

Answered by rusty1s

Oct 1, 2021

data.edge_index refers to the edges that are used for message passing. As such, during training and validation, you are allowed to propagate information based on the training edges, while during testing, you can propagate information based on the union of training and validation edges.

For evaluation, the data.edge_label_index holds a batch of positive and negative samples that should be used to evaluate your model on.

So:

data.edge_index should be solely used for message passing
data.edge_label_index should be used for evaluation

View full answer

rusty1s · 2021-10-01T09:04:10Z

Maintainer

data.edge_index refers to the edges that are used for message passing. During training and validation, you are therefore allowed to propagate information over the training edges, while during testing you can propagate information over the union of the training and validation edges.

For evaluation, data.edge_label_index holds a batch of positive and negative samples on which your model should be evaluated.

So:

- data.edge_index should be used solely for message passing.
- data.edge_label_index should be used for evaluation.
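This division of labor can be illustrated with a plain-Python sketch that avoids any PyG dependency. The function `sketch_link_split`, the toy ring graph, and the naive negative sampler are hypothetical; they only mirror which edges each split exposes for message passing versus scoring, and this is not how RandomLinkSplit is implemented internally:

```python
import random
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def sketch_link_split(edges: List[Edge], num_nodes: int, num_val: int,
                      num_test: int, seed: int = 0) -> Dict[str, dict]:
    """Hypothetical illustration of the split roles (not PyG's implementation).

    Each split dict has:
      edge_index: edges the model may use for message passing
      edge_label_index: positive edges plus sampled negative pairs to score
      edge_label: 1 for each positive pair, 0 for each negative pair
    """
    rng = random.Random(seed)
    perm = list(edges)
    rng.shuffle(perm)
    num_train = len(perm) - num_val - num_test
    train = perm[:num_train]
    val = perm[num_train:num_train + num_val]
    test = perm[num_train + num_val:]

    existing = set(edges)

    def sample_negatives(k: int) -> List[Edge]:
        # Naive rejection sampling of node pairs that are not edges of the full
        # graph; assumes the graph is sparse enough for this to finish quickly.
        negatives: List[Edge] = []
        while len(negatives) < k:
            pair = (rng.randrange(num_nodes), rng.randrange(num_nodes))
            if pair[0] != pair[1] and pair not in existing and pair not in negatives:
                negatives.append(pair)
        return negatives

    def make_split(message_passing: List[Edge], positives: List[Edge]) -> dict:
        negatives = sample_negatives(len(positives))
        return {
            "edge_index": list(message_passing),
            "edge_label_index": positives + negatives,
            "edge_label": [1] * len(positives) + [0] * len(negatives),
        }

    return {
        "train": make_split(train, train),      # supervise on the training edges themselves
        "val": make_split(train, val),          # propagate over training edges only
        "test": make_split(train + val, test),  # may also propagate over validation edges
    }


if __name__ == "__main__":
    # Toy directed ring graph with 10 nodes and 10 edges.
    ring = [(i, (i + 1) % 10) for i in range(10)]
    splits = sketch_link_split(ring, num_nodes=10, num_val=2, num_test=2)
    for name, split in splits.items():
        overlap = set(split["edge_index"]) & set(split["edge_label_index"])
        print(name, len(split["edge_index"]), len(split["edge_label_index"]), len(overlap))
```

In this sketch the validation and test positives never appear in their own split's message-passing edges, while the training split scores the same edges it propagates over. With the real transform, the analogous pattern is to run the encoder on val_data.edge_index and compare the scores for val_data.edge_label_index against val_data.edge_label.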

dpaysan Oct 1, 2021
Author

That makes sense. Thanks!
