How to handle multiple StaticHeteroGraphs? #7206
Unanswered
MicheleSodano asked this question in Q&A
Replies: 1 comment
-
I don't have any experience with PyTorch Geometric Temporal, so I am not really confident in giving a good answer here. Your solution sounds solid, and
-
I have constructed several StaticHeteroGraphs using NetworkX, which I then converted into HeteroData objects using StaticHeteroGraphTemporalSignal from torch_geometric_temporal. Each HeteroData represents a snapshot of the StaticHeteroGraph at one time-step. In each StaticHeteroGraph, nodes correspond to specific components of a building (such as Walls, Windows, Floors, and Rooms), each node type with a different number of features. While the number of nodes can vary between buildings, the node types remain constant. To train all the StaticHeteroGraphs together, I have created "dummy nodes" with zero-padded features (sized to each node type's feature dimension) to account for missing nodes. Essentially, all StaticHeteroGraphs are subgraphs of the same "big Graph" containing all possible nodes.
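For context, the zero-padding scheme described above could be sketched roughly as follows. The feature dimensions and maximum node counts here are made-up placeholders, not values from the post:

```python
import torch

# Hypothetical per-type feature sizes and "big Graph" node counts (assumptions)
FEATURE_DIMS = {"Wall": 8, "Window": 5, "Floor": 6, "Room": 10}
MAX_NODES = {"Wall": 12, "Window": 20, "Floor": 4, "Room": 6}

def pad_node_features(x_dict):
    """Zero-pad each node type up to the global maximum so every
    building shares the same node layout (dummy nodes are all-zero rows)."""
    padded = {}
    for ntype, max_n in MAX_NODES.items():
        feat_dim = FEATURE_DIMS[ntype]
        x = x_dict.get(ntype, torch.empty(0, feat_dim))
        pad_rows = max_n - x.size(0)
        padded[ntype] = torch.cat([x, torch.zeros(pad_rows, feat_dim)], dim=0)
    return padded

# Example: a building with fewer Walls than the maximum, and no Windows at all
x_dict = {"Wall": torch.randn(7, 8), "Room": torch.randn(6, 10)}
padded = pad_node_features(x_dict)
assert padded["Wall"].shape == (12, 8)  # 7 real rows + 5 dummy rows
```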
The edge_index_dict may vary from one StaticHeteroGraph to another, but it is the same for all snapshots within a specific StaticHeteroGraph. Additionally, some edges have attributes, so each StaticHeteroGraph has an edge_attr_dict that defines the attributes for specific edge types in the edge_index_dict, which may also be empty.
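To make the setup concrete, the two dictionaries might look like this (edge types, shapes, and values are illustrative assumptions, not taken from the actual data):

```python
import torch

# edge_index_dict: one (2, num_edges) index tensor per edge type
edge_index_dict = {
    ("Wall", "adjacent", "Room"): torch.tensor([[0, 1], [0, 0]]),
    ("Window", "in", "Wall"):     torch.tensor([[0], [1]]),
}
# edge_attr_dict: only some edge types carry attributes; the rest are absent
edge_attr_dict = {
    ("Wall", "adjacent", "Room"): torch.tensor([[0.3], [1.2]]),
}
```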
My goal is to predict the target value of only one node type in the x_dict (in this case, the target of the "Room" component). I want to use GNN methods, like SAGEConv, to generate the embedding of nodes in each snapshot, concatenate the node embeddings from all snapshots, and use a 1D ResNet to predict the target value of a specific node type. I aim to use HeteroConv to apply two SAGEConv layers to generate node embeddings in each snapshot, considering both edge_index_dict and edge_attr_dict.
The issue is that, as previously mentioned, edge_attr_dict specifies attributes only for certain edge types. Additionally, I want to exclude the "dummy nodes" when updating features, so that after the two SAGEConv layers the result is a dictionary giving a tensor of shape (n_nodes, updated_features) for node_type['Room'], where n_nodes counts only the nodes whose input features in the x_dict are not all zeros, and updated_features is the out_channels for the Room node type.
I have managed to achieve this for a single StaticHeteroGraph by creating a mask_dict to remove "dummy" nodes from the x_dict, and creating a new_edge_index_dict based on a new index_mapping. However, looping through each snapshot is slow due to the number of snapshots (8760 time-steps) in each StaticHeteroGraph, and the total number of StaticHeteroGraphs is 4000.
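The mask_dict plus re-indexing step described above could be sketched like this (assuming, as in the post, that a node is "real" iff its features are not all zeros; a real node with all-zero features would need an explicit mask instead):

```python
import torch

def mask_and_reindex(x_dict, edge_index_dict):
    """Drop zero-padded dummy nodes and remap edge indices to the
    compacted node numbering."""
    mask_dict = {nt: x.abs().sum(dim=1) > 0 for nt, x in x_dict.items()}
    new_x = {nt: x[mask_dict[nt]] for nt, x in x_dict.items()}
    # old index -> new index (-1 marks removed dummy nodes)
    remap = {}
    for nt, mask in mask_dict.items():
        idx = torch.full((mask.size(0),), -1, dtype=torch.long)
        idx[mask] = torch.arange(int(mask.sum()))
        remap[nt] = idx
    new_edges = {}
    for (src, rel, dst), ei in edge_index_dict.items():
        # keep only edges whose endpoints are both real nodes
        keep = mask_dict[src][ei[0]] & mask_dict[dst][ei[1]]
        ei = ei[:, keep]
        new_edges[(src, rel, dst)] = torch.stack(
            [remap[src][ei[0]], remap[dst][ei[1]]])
    return new_x, new_edges
```

Since the edge_index_dict is identical for all 8,760 snapshots of a building, this remapping only needs to run once per building rather than once per snapshot, which removes most of the per-snapshot loop cost.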
I have two questions:
(1) How can I create a dataloader that generates batches to train over multiple snapshots and multiple StaticHeteroGraphs?
(2) I am considering whether I can train the model over several StaticHeteroGraphs with varying numbers of nodes but the same edge types. Is this possible, or do I need to maintain the same structure? Can I apply masking and re-indexing while creating the StaticHeteroGraphs, without adding many "dummy nodes"?