Hi! I was thinking of using `NeighborLoader` to sample from our knowledge graph (KG). Without inductive learning, and with only a single seed node type, I currently do the following:

```python
# NOTE: not all A entities are in our KG, so there is an availability indicator
unique_kg_avail_bool = torch.zeros(kg_data.x_dict['A'].shape[0]).bool()
unique_kg_avail_bool[batch_As[batch_As < unique_kg_avail_bool.shape[0]]] = True
assert (batch_As < unique_kg_avail_bool.shape[0]).sum() == unique_kg_avail_bool.sum()

kg_loader = NeighborLoader(
    kg_data,
    # Sample `num_neighbors` nodes for each node and edge type, for `num_layers` iterations
    num_neighbors=[kg_sampling_num_neighbors] * num_layers,
    batch_size=(batch_As < unique_kg_avail_bool.shape[0]).sum().item(),
    input_nodes=('A', unique_kg_avail_bool),
)
kg_batch_data = next(iter(kg_loader))
kg_A_index_map = kg_batch_data['A']['n_id']
```

Now, I want to (1) extend the sampling to use more than one input node type, and (2) make this work inductively, i.e. ensure that test `A` nodes are never sampled as neighbors during training. For (1), I am not sure what to do, since `NeighborLoader` seems to accept only a single node type in `input_nodes`. For (2), I can think of first removing all edges attached to test `A` nodes (while keeping the test `A` nodes themselves in the graph, so that the node indices do not have to be rebuilt multiple times), and then performing the regular neighbor sampling.
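To show what I mean for (2), here is a minimal sketch with plain PyTorch; the toy `edge_index` and `test_mask` below are made up for illustration, not taken from my data:

```python
import torch

# Hypothetical toy edge list among four 'A' nodes:
# row 0 holds source indices, row 1 holds target indices.
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 0]])

# Boolean mask marking which 'A' nodes belong to the test split.
test_mask = torch.zeros(4, dtype=torch.bool)
test_mask[3] = True

# Keep only edges where neither endpoint is a test node; the test
# nodes themselves stay in the graph, so no re-indexing is needed.
keep = ~(test_mask[edge_index[0]] | test_mask[edge_index[1]])
train_edge_index = edge_index[:, keep]
```

After this, regular neighbor sampling on `train_edge_index` can never reach a test node, while node indices remain unchanged.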
-
(1) You are right that we can currently only sample from a single node type. The current workaround is to create multiple `NeighborLoader` instances, one per input node type, to achieve this :(

(2) One way to ensure that is to leverage our temporal sampling strategy, in which a node is only sampled as a neighbor if its timestamp does not exceed the seed node's; test nodes are then never reached from training seeds (their timestamps should be higher). Alternatively, you can use `subgraph` with `relabel_nodes=False` if you want to avoid the re-indexing.
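To make the temporal idea concrete, here is a minimal, hypothetical sketch of the filtering rule itself (the timestamps and edges are invented; in PyG the loader applies this constraint internally rather than via an explicit mask):

```python
import torch

# Invented timestamps: the "test" node (index 3) has a strictly larger
# timestamp than all training nodes.
node_time = torch.tensor([2, 1, 1, 5])
edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 0]])  # (source, target) pairs

# Temporal rule: keep an edge only if the neighbor's timestamp does not
# exceed the seed side's, so later (test) nodes can never be sampled
# from earlier (training) seeds.
keep = node_time[edge_index[1]] <= node_time[edge_index[0]]
filtered_edge_index = edge_index[:, keep]
```

Note that the edge into the test node (2 → 3) is dropped, while edges among training-time nodes survive.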