Differences between true and false value for is_undirected parameter in RandomLinkSplit #8603

songsong0425 · 2023-12-12T11:41:04Z


Hi, thank you as always for your work on this package.
I have some questions about how the is_undirected parameter works in RandomLinkSplit.

For example, when I have an edge list like the one in the first screenshot (i.e., positive labels and indices for the training/validation/test dataset), I get different results depending on the value of is_undirected.

# 1. `is_undirected=True`
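import torch_geometric.transforms as T

# `device` and `ex_graph` are assumed to be defined already (see Q3 for ex_graph).
transform = T.Compose([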
    T.ToDevice(device),
    T.RandomLinkSplit(num_val=0.1, num_test=0.2, is_undirected=True, split_labels=True, neg_sampling_ratio=1.0)
])

train_data, val_data, test_data = transform(ex_graph)
train_data
# Data(edge_index=[2, 8], pos_edge_label=[4], pos_edge_label_index=[2, 4], neg_edge_label=[4], neg_edge_label_index=[2, 4])

print(train_data.pos_edge_label)
print(train_data.pos_edge_label_index)
# tensor([1., 1., 1., 1.], device='cuda:0')
# tensor([[ 2470,  2349,  3760, 32528],
#         [35913, 30487, 31291, 33495]], device='cuda:0')

print(train_data.neg_edge_label)
print(train_data.neg_edge_label_index)
# tensor([0., 0., 0., 0.], device='cuda:0')
# tensor([[18234,  1915, 25328, 17588],
#         [31252, 22404, 34279, 36116]], device='cuda:0')

# 2. `is_undirected=False`
transform = T.Compose([
    T.ToDevice(device),
    T.RandomLinkSplit(num_val=0.1, num_test=0.2, is_undirected=False, split_labels=True, neg_sampling_ratio=1.0)
])

train_data, val_data, test_data = transform(ex_graph)
train_data
# Data(edge_index=[2, 7], pos_edge_label=[7], pos_edge_label_index=[2, 7], neg_edge_label=[7], neg_edge_label_index=[2, 7])

print(train_data.pos_edge_label)
print(train_data.pos_edge_label_index)
# tensor([1., 1., 1., 1., 1., 1., 1.], device='cuda:0')
# tensor([[ 3760,  2349, 32528, 34187, 34214,  2470, 34214],
#         [31291, 30487, 33495, 30487, 33495, 35913,  9994]], device='cuda:0')

print(train_data.neg_edge_label)
print(train_data.neg_edge_label_index)
# tensor([0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
# tensor([[ 9502,   304,  3445,  1309, 15119, 32310, 11450],
#         [24549, 23110, 16417,   310, 35309,  2117, 29592]], device='cuda:0')

Q1.
Intuitively, I expected the result of the second case, since I split the dataset 7:1:2. Why does is_undirected=True return fewer edges? I couldn't understand this behavior even after reading its description here. Also, is it okay to use edge_label as the positive/negative labels after setting is_undirected=False?

Q2.
Also, I assumed that negative samples would be drawn from nodes in the original graph, but the negative samples in the results above contain node IDs that do not appear in the screenshot. Where did these nodes come from?

# Example of the result I expected:

print(train_data.pos_edge_label)
print(train_data.pos_edge_label_index)
# tensor([1., 1., 1., 1., 1., 1., 1.], device='cuda:0')
# tensor([[ 3760,  2349, 32528, 34187, 34214,  2470, 34214],
#         [31291, 30487, 33495, 30487, 33495, 35913,  9994]], device='cuda:0')

print(train_data.neg_edge_label)
print(train_data.neg_edge_label_index)
# tensor([0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
# tensor([[34133,  2349,  9994, 34187,  2470, 33495,  3760],
#         [31291,  2470, 36295,  2833, 34187,  2349,  2470]], device='cuda:0')

Q3.
To use edge_label as the training label, should I convert the edge list (i.e., the screenshot) into the Data format as below?

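import torch
from torch_geometric.data import Data

# Columns 1 and 2 (by position) of ex_dataset are read as the two endpoints of each edge.
pos_edge_index_init = torch.zeros((2, len(ex_dataset)), dtype=torch.long)

for i in range(len(ex_dataset)):
    # iloc[i, j] is plain positional access; it avoids the chained iloc[i][j] lookup.
    pos_edge_index_init[0, i] = ex_dataset.iloc[i, 1]
    pos_edge_index_init[1, i] = ex_dataset.iloc[i, 2]

ex_graph = Data(edge_index=pos_edge_index_init)
ex_graph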

Sorry for bothering you if these questions are redundant.
Thank you for reading!

Answered by rusty1s

rusty1s · 2023-12-21T08:01:39Z

Maintainer

Sorry for the late reply:

Q1: With is_undirected=True, two things will happen:

1. edge_index in each output split is guaranteed to be undirected as well, i.e., it contains both directions of every link.
2. edge_label_index contains only one direction of each link (instead of both). This is why you get fewer labels back when setting is_undirected=True.
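
To make the two points above concrete, here is a minimal pure-Python sketch of how the label count changes. It only mimics the counting and is not the RandomLinkSplit implementation; the helper names (symmetrize, directed_labels, undirected_labels) and the toy graph are hypothetical, and the graph is assumed to store both directions of every link.

```python
# Illustration only: mimics the counting, not the actual RandomLinkSplit code.

def symmetrize(links):
    """Store both directions of every link, as an undirected edge_index would."""
    return sorted({(u, v) for u, v in links} | {(v, u) for u, v in links})


def directed_labels(edges):
    """Treat every (src, dst) pair as its own supervision link."""
    return list(edges)


def undirected_labels(edges):
    """Keep one direction per link, so (u, v) and (v, u) are counted once."""
    return sorted({(min(u, v), max(u, v)) for u, v in edges})


# A toy undirected graph with 3 links, stored in both directions (6 entries).
links = [(0, 1), (1, 2), (2, 3)]
edge_index = symmetrize(links)

labels_directed = directed_labels(edge_index)
labels_undirected = undirected_labels(edge_index)

print(len(edge_index))         # 6 message-passing edges
print(len(labels_directed))    # 6 labels if each direction is a separate link
print(len(labels_undirected))  # 3 labels if each link is counted once
```

This is the same 2:1 ratio as in the question, where the is_undirected=True training split reports edge_index=[2, 8] and pos_edge_label_index=[2, 4]. If the input edge list stores each link only once, it may be worth making it symmetric first (for example with T.ToUndirected()) before setting is_undirected=True; that is a suggestion under the assumption above, not something verified against the data in this thread.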

Q2: I don't understand what you mean by "formed between nodes in the original graph". Random links should be generated only between valid nodes in your graph.
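
One possible explanation for the unfamiliar node IDs (an assumption, not confirmed in this thread): when a Data object is built from edge_index alone, the number of nodes is inferred as the largest index plus one, so every integer from 0 up to that maximum counts as a valid node for negative sampling. Relabeling the raw IDs to a contiguous 0..N-1 range avoids this. The sketch below is plain Python, and relabel_nodes is a hypothetical helper, not a PyG function.

```python
def relabel_nodes(edges):
    """Map raw node IDs to contiguous indices 0..N-1 (hypothetical helper)."""
    raw_ids = sorted({node for edge in edges for node in edge})
    id_to_index = {raw: idx for idx, raw in enumerate(raw_ids)}
    relabeled = [(id_to_index[u], id_to_index[v]) for u, v in edges]
    return relabeled, raw_ids


# Three positive training pairs copied from the question.
edges = [(3760, 31291), (2349, 30487), (32528, 33495)]

# With raw IDs, every integer below the largest ID is a candidate node.
inferred_num_nodes = max(max(u, v) for u, v in edges) + 1

relabeled, raw_ids = relabel_nodes(edges)
num_nodes = len(raw_ids)

print(inferred_num_nodes)  # 33496 candidate nodes when raw IDs are used directly
print(num_nodes)           # 6 nodes after relabeling
print(relabeled)           # [(1, 3), (0, 2), (4, 5)]
```

raw_ids[i] gives back the original ID of node i, so sampled or predicted links can be mapped back to the raw IDs afterwards.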

Q3: Yes, it needs to be a COO tensor of the form [2, num_edges].
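
As a rough illustration of the [2, num_edges] layout, the sketch below builds it from a small edge table in plain Python. The rows and their (label, source, target) column order are made up for this example; the resulting nested list can then be passed to torch.tensor(..., dtype=torch.long) to obtain the edge_index tensor.

```python
# Each row mimics one line of an edge table: (label, source, target).
# The values and the column order are assumptions made for illustration.
rows = [
    (1, 0, 3),
    (1, 1, 3),
    (1, 2, 4),
]

# Split the table into one row of sources and one row of targets.
sources = [src for _, src, _ in rows]
targets = [dst for _, _, dst in rows]
edge_index = [sources, targets]  # COO layout: 2 rows, num_edges columns

shape = (len(edge_index), len(edge_index[0]))

print(edge_index)  # [[0, 1, 2], [3, 3, 4]]
print(shape)       # (2, 3)
```

Column i of edge_index describes edge i, with the source node in the first row and the target node in the second.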

1 reply

songsong0425 Dec 22, 2023
Author

No problem, and thank you for your comment! Regarding Q2, I'll recheck my code, since it may be reusing some data from an earlier exercise.
