memory leak involving MessagePassing layers #5333

APJansen · 2022-09-01T08:50:20Z

I am having trouble with a memory leak. I have a model that uses layers subclassed from MessagePassing. The model itself is subclassed from torch.nn.Module as usual, and the GNN layers are called in its forward method. The model takes a graph representing a system state and outputs tensors representing the system state at the next timestep.
It is used inside the Simulator class below, whose rollout method iterates these predictions over multiple steps.

The issue is that during rollout the memory used increases by very roughly half a GB every step. Three things that do not cause a memory leak are:

Training the model on 1-step predictions using Simulator's forward method
rollout with the GNN layers in the model disabled
rollout with the last 2 lines in the for loop disabled (but with either one of them active the leak is still there)

I don't see the pattern, any clue?

I don't think I'm doing anything weird in the GNN layer: I just implemented message and update methods, which concatenate their inputs and pass them through a small MLP, and I call propagate inside forward. Inside the model, the GNN layers are called as follows (where self.gnn_layers is an nn.ModuleList of the layers):

for gnn_layer in self.gnn_layers:
    h = gnn_layer(h, v, pos, r, graph_features, domain, edge_index, batch)

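import torch

# GraphData, Prediction, and PredictionSequence are project-specific types not shown here.
class Simulator(torch.nn.Module):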
    def __init__(self, model, graph_generator) -> None:
        super().__init__()
        self.model = model
        self.graph_generator = graph_generator

    def forward(self, graph_data: GraphData, step: int) -> Prediction:
        graph = self.graph_generator.build_graph(graph_data, step)
        return self.model(graph)

    def rollout(self,
            initial_data: GraphData,
            domain_sequence: torch.Tensor,
            time_sequence: torch.Tensor,
            ) -> Prediction:
        graph = self.graph_generator.build_graph(initial_data, 0)

        T = time_sequence.shape[0]
        predictions = []

        for t in range(T - 1):
            print(t)
            prediction = self.model(graph)
            predictions.append(prediction)
            graph = self.graph_generator.evolve(graph, prediction, time_sequence[t + 1], domain_sequence[t + 1])

        return PredictionSequence(predictions)

Answered by rusty1s

rusty1s · 2022-09-02T12:36:29Z

Maintainer

I think this is to be expected, right? Note that the memory is not freed here until you either detach the predictions from the computation graph (e.g., via predictions.append(prediction.detach())) or call loss.backward(). Depending on T, this may result in OOMs.
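
The effect can be reproduced in a few lines. The following is only a minimal sketch, assuming a plain torch.nn.Linear as a stand-in for the GNN model and a tensor in place of the graph; the rollout function and its detach flag are hypothetical and not the code from the question.

```python
import torch

# Stand-in for the GNN model: any module with trainable parameters behaves the same way.
model = torch.nn.Linear(4, 4)


def rollout(state: torch.Tensor, steps: int, detach: bool) -> list:
    """Feed each prediction back in as the next input, similar to Simulator.rollout."""
    predictions = []
    for _ in range(steps):
        prediction = model(state)
        if detach:
            # Cut the link to the autograd graph so this step's activations can be freed.
            prediction = prediction.detach()
        predictions.append(prediction)
        # Hypothetical stand-in for graph_generator.evolve(...): reuse the prediction as the next state.
        state = prediction
    return predictions


kept = rollout(torch.randn(2, 4), steps=3, detach=False)
freed = rollout(torch.randn(2, 4), steps=3, detach=True)

# Without detach, the last prediction still references the graph of all earlier steps.
print(kept[-1].grad_fn is not None)  # True
# With detach, no graph is attached to the stored predictions.
print(freed[-1].grad_fn is None)  # True

# For pure inference, torch.no_grad() avoids recording the graph in the first place.
with torch.no_grad():
    inference = rollout(torch.randn(2, 4), steps=3, detach=False)
print(inference[-1].requires_grad)  # False
```

Detaching only suits rollouts that are not backpropagated through; if the rollout is part of training, the graph has to be kept until loss.backward() is called.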

1 reply

APJansen Sep 5, 2022
Author

Great, that solves it!
