I encountered a situation where memory usage increases linearly during training, although all tensor variables are overwritten or even explicitly deleted regularly. This occurs when passing an input batch through some layers (e.g. an MLP) and then replacing part of the input batch with the output of the forward pass. Can someone help me figure out why this is happening, and how I can achieve the same result without the memory growing throughout training? Thanks in advance for any hints!

#!/usr/bin/env python3
import torch
from torch import nn
import torch_geometric
import tracemalloc


class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_layers = torch_geometric.nn.MLP([300, 32, 7])

    def forward(self, batch):
        # index of the second node of every graph in the batch
        batch_ptr = 1 + batch.ptr[:-1]
        batch_ptr = batch_ptr.tolist()
        new_features = self.linear_layers(batch.x)
        # overwrite the first 7 features of the selected nodes in place
        batch.x[batch_ptr, :7] = new_features[batch_ptr, :]
        return batch


def main():
    data_list = list()
    batch_size = 100
    num_features = 300
    for i in range(batch_size):
        data_list.append(
            torch_geometric.data.data.Data(x=torch.randn(12, num_features))
        )
    batch = torch_geometric.data.batch.Batch.from_data_list(data_list)
    mdl = Model()
    tracemalloc.start()
    for i in range(5000):
        # the batch is detached before every forward pass, yet memory still grows
        batch_out = mdl.forward(batch.detach())
        if i % 1000 == 0:
            print(f"Memory ({i}): {tracemalloc.get_traced_memory()}")
        del batch_out
    tracemalloc.stop()


if __name__ == "__main__":
    main()
Replies: 1 comment 5 replies
That's because you are modifying batch in-place, so the computation graph is never freed. Note that although you detach() the batch object, the old features are still kept in memory for backpropagation. This resolves the issue:
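A minimal sketch of a forward pass along those lines, i.e. writing into a fresh tensor instead of mutating batch.x in place, so the graph built in one iteration can be freed as soon as batch_out is deleted (this is a reconstruction of the idea, not necessarily the exact snippet from the reply):

def forward(self, batch):
    batch_ptr = 1 + batch.ptr[:-1]
    batch_ptr = batch_ptr.tolist()
    new_features = self.linear_layers(batch.x)
    # Build a fresh feature tensor instead of writing into batch.x in place,
    # so nothing from a previous iteration stays attached to the graph.
    x = batch.x.clone()
    x[batch_ptr, :7] = new_features[batch_ptr, :]
    batch.x = x
    return batch

If gradients are not needed in this loop at all, running the forward pass under torch.no_grad() also keeps memory flat, since no graph is recorded in the first place.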