Differences between true and false value for is_undirected parameter in RandomLinkSplit #8603
Hi, thank you as always for your effort in developing the package. For example, when I have an edge list as in the first screenshot (i.e., positive labels and indices for the training/validation/test datasets), I get different results for the two is_undirected values.

# 1. `is_undirected=True`
transform = T.Compose([
    T.ToDevice(device),
    T.RandomLinkSplit(num_val=0.1, num_test=0.2, is_undirected=True, split_labels=True, neg_sampling_ratio=1.0)
])
train_data, val_data, test_data = transform(ex_graph)
train_data
# Data(edge_index=[2, 8], pos_edge_label=[4], pos_edge_label_index=[2, 4], neg_edge_label=[4], neg_edge_label_index=[2, 4])
print(train_data.pos_edge_label)
print(train_data.pos_edge_label_index)
# tensor([1., 1., 1., 1.], device='cuda:0')
# tensor([[ 2470, 2349, 3760, 32528],
# [35913, 30487, 31291, 33495]], device='cuda:0')
print(train_data.neg_edge_label)
print(train_data.neg_edge_label_index)
# tensor([0., 0., 0., 0.], device='cuda:0')
# tensor([[18234, 1915, 25328, 17588],
# [31252, 22404, 34279, 36116]], device='cuda:0')

# 2. `is_undirected=False`
transform = T.Compose([
    T.ToDevice(device),
    T.RandomLinkSplit(num_val=0.1, num_test=0.2, is_undirected=False, split_labels=True, neg_sampling_ratio=1.0)
])
train_data, val_data, test_data = transform(ex_graph)
train_data
# Data(edge_index=[2, 7], pos_edge_label=[7], pos_edge_label_index=[2, 7], neg_edge_label=[7], neg_edge_label_index=[2, 7])
print(train_data.pos_edge_label)
print(train_data.pos_edge_label_index)
# tensor([1., 1., 1., 1., 1., 1., 1.], device='cuda:0')
# tensor([[ 3760, 2349, 32528, 34187, 34214, 2470, 34214],
# [31291, 30487, 33495, 30487, 33495, 35913, 9994]], device='cuda:0')
print(train_data.neg_edge_label)
print(train_data.neg_edge_label_index)
# tensor([0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
# tensor([[ 9502, 304, 3445, 1309, 15119, 32310, 11450],
# [24549, 23110, 16417, 310, 35309, 2117, 29592]], device='cuda:0')

Q1.

Q2.

# Example of the result I assumed:
print(train_data.pos_edge_label)
print(train_data.pos_edge_label_index)
# tensor([1., 1., 1., 1., 1., 1., 1.], device='cuda:0')
# tensor([[ 3760, 2349, 32528, 34187, 34214, 2470, 34214],
# [31291, 30487, 33495, 30487, 33495, 35913, 9994]], device='cuda:0')
print(train_data.neg_edge_label)
print(train_data.neg_edge_label_index)
# tensor([0., 0., 0., 0., 0., 0., 0.], device='cuda:0')
# tensor([[ 34133, 2349, 9994, 34187, 2470, 33495, 3760],
# [31291, 2470, 36295, 2833, 34187, 2349, 2470]], device='cuda:0')

Q3.

pos_edge_index_init = torch.zeros((2, len(ex_dataset)), dtype=torch.long)
for i in range(len(ex_dataset)):
    pos_edge_index_init[0, i] = ex_dataset.iloc[i][1]
    pos_edge_index_init[1, i] = ex_dataset.iloc[i][2]
ex_graph = Data(edge_index=pos_edge_index_init)
ex_graph

If these questions are redundant, sorry for bothering you.
Replies: 1 comment 1 reply
Sorry for the late reply:

Q1: With `is_undirected=True`, two things will happen:

- `edge_index` in the output splits are guaranteed to be undirected as well.
- `edge_label_index` only contains one direction of the link (instead of both directions).

This explains why you get fewer labels back when setting `is_undirected` to `True`.

Q2: I don't understand what you mean by "formed between nodes in the original graph". Random links should be generated only between valid nodes in your graph.

Q3: Yes, it needs to be a COO tensor of the form `[2, num_edges]`.
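To make the Q1 and Q3 points concrete, here is a minimal sketch in plain Python. Lists stand in for tensors, and the helper names `keep_one_direction` and `count_edges` are hypothetical, not part of PyG. It assumes that "one direction of the link" means keeping the copy with source <= target; that rule is an assumption for illustration and is not confirmed in this thread.

```python
# Illustrative only: plain Python lists stand in for tensors, and the helper
# names below are hypothetical (they are not part of PyG).

# An undirected toy graph with 3 links, stored in COO form [2, num_edges].
# Each link appears twice, once per direction, so num_edges == 6.
edge_index = [
    [0, 1, 1, 2, 2, 3],  # source nodes
    [1, 0, 2, 1, 3, 2],  # target nodes
]


def keep_one_direction(edge_index):
    """Keep a single copy of every undirected link (assumed rule: source <= target)."""
    kept = [(s, t) for s, t in zip(*edge_index) if s <= t]
    return [[s for s, _ in kept], [t for _, t in kept]]


def count_edges(edge_index):
    """Number of columns in a [2, num_edges] COO structure."""
    return len(edge_index[0])


label_index = keep_one_direction(edge_index)

print(count_edges(edge_index))   # stored edges, both directions
print(count_edges(label_index))  # labelled links, one direction only
print(label_index)
```

The same two-to-one ratio shows up in the `is_undirected=True` output in the question, where `edge_index` has shape `[2, 8]` while `pos_edge_label` has 4 entries.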