Out of GPU memory for large graphs - do we need to pass all nodes even when not-observed? #2833

fbragman · 2021-07-08T10:00:47Z

Hi pytorch-geo community!

I have a GNN for graph classification - it is basically an extension of GAT with an encoder/decoder followed by some pooling and an MLP for classification.

The number of possible nodes in my dataset is around 80,000. The input features are either learned using nn.Embedding or obtained from FastText and have dimensionality 100.

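When I create my graph datasets, I do the following:

    import torch
    from torch_geometric.data import Batch, Data
    # Every graph receives the full (num_nodes x 100) feature matrix
    node_features = self.embedding.transform(torch.arange(self.num_nodes).long())
    data = [Data(x=node_features, edge_index=y.long()) for y in edge_index]
    batch = Batch.from_data_list(data)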

In this case, the observed edges for each graph are stored in edge_index, and node_features is a tensor of size 80,000 x 100. The biggest batch size I can train on is around 5. Multi-GPU training isn't very effective either: even with 4 GPUs, my batch size is only 20.

Is there a way to get around this? I initially tried passing only the features of the observed nodes, e.g. node_features.shape = [17, 100], but it didn't work. Could node_features be a sparse tensor, for instance? It seems somewhat redundant to have to pass all possible nodes when only a small fraction are observed!

Many thanks for any suggestions :)
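
For a rough sense of scale, the input features alone can be sized with some quick arithmetic. This is only a sketch: it assumes float32 features (4 bytes per value), the helper name feature_bytes is made up for illustration, and it ignores everything else the model allocates (per-layer activations, attention coefficients, gradients), all of which also grow with the number of nodes passed in.

```python
def feature_bytes(num_nodes, feat_dim, bytes_per_value=4):
    """Bytes needed for a dense (num_nodes x feat_dim) feature matrix."""
    return num_nodes * feat_dim * bytes_per_value


# Numbers from the post: 80,000 possible nodes, 100-dim features.
full_graph = feature_bytes(80_000, 100)   # every graph carries all nodes
small_graph = feature_bytes(17, 100)      # only the 17 observed nodes

print(f"all nodes:      {full_graph / 1e6:.1f} MB per graph")
print(f"observed nodes: {small_graph / 1e3:.1f} KB per graph")
print(f"batch of 5 with all nodes: {5 * full_graph / 1e6:.1f} MB")
```

Under these assumptions the full feature matrix costs 32 MB per graph, against under 7 KB when only the 17 observed nodes are kept.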

Answered by rusty1s

rusty1s · 2021-07-09T06:57:34Z

Maintainer

AFAIK, edge_index does not contain connections for all 80,000 nodes, right? If so, you can probably convert data into a subgraph that only holds the nodes contained in edge_index.unique(), e.g., via torch_geometric.utils.subgraph.

6 replies

rusty1s Jul 9, 2021
Maintainer

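    from torch_geometric.utils import subgraph
    # IDs of the nodes that actually appear in this graph's edges
    n_id = edge_index.unique()
    # Keep only the feature rows of those nodes
    x = huge_feature_node_tensor[n_id]
    edge_index, _ = subgraph(n_id, edge_index, relabel_nodes=True)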
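
To see what the relabeling does, here is a toy version that uses plain PyTorch only. It is a sketch, not what subgraph does internally: it swaps in Tensor.unique(return_inverse=True), which yields the same node IDs and relabeled edges in the case above, where n_id covers every node that appears in edge_index. The tensor sizes and node IDs are made up for illustration.

```python
import torch

# Hypothetical feature table: 50 possible nodes, 4 features each.
huge_feature_node_tensor = torch.randn(50, 4)

# One small graph that only touches nodes 5, 9 and 42.
edge_index = torch.tensor([[5, 9, 9],
                           [9, 5, 42]])

# n_id: sorted IDs of the observed nodes.
# relabeled: each entry of edge_index replaced by its position in n_id
# (same shape as edge_index).
n_id, relabeled = edge_index.unique(return_inverse=True)
x = huge_feature_node_tensor[n_id]

print(n_id.tolist())       # [5, 9, 42]
print(relabeled.tolist())  # [[0, 1, 1], [1, 0, 2]]
print(tuple(x.shape))      # (3, 4)
```

Row i of x now holds the features of original node n_id[i], which is exactly the node that index i refers to in the relabeled edges.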

fbragman Jul 9, 2021
Author

Oh great! It makes a lot of sense now that subgraph essentially relabels edge_index to the range [0, num_observed_nodes). I probably would have ended up hacking together something similar if it weren't for subgraph :)

Thank you!

fbragman Jul 9, 2021
Author

One question actually: if I am looping over multiple graphs like this

    ds = []
    for edge in edge_index:
        n_id = edge.unique()                             # nodes observed in this graph
        x = nodes[n_id]                                  # features of those nodes only
        e, _ = subgraph(n_id, edge, relabel_nodes=True)  # edges relabeled to 0..len(n_id)-1
        ds.append(Data(x=x, edge_index=e))
    data = Batch.from_data_list(ds)

will it still be fine? Each edge_index[idx] has been mapped to the range [0, num_observed), but since we pass the matching indexed features x = nodes[n_id] to Data(), our GNN operations should still be correct?

rusty1s Jul 9, 2021
Maintainer

Exactly :)

fbragman Jul 9, 2021
Author

Great - thanks again and major props on the package - it's really amazing 🥇
