
Memory leakage observed when used with torch.compile #237

@leehyeonbeen

Description


Dear author,
I recognize that the Autoformer code has been cloned into many different repositories, and I came here to report an issue.
I noticed that the Autoformer model can produce cumulative memory usage when the model instance is wrapped and compiled with torch.compile.

Under the same training function, the issue is observed as follows:

  • It occurs ONLY for the Autoformer model; training other types of models does not show similar problems.
  • The problem is not reproduced when the model is not wrapped with torch.compile.
  • Memory usage accumulates until the end of the training loop (across all epochs), even though torch.cuda.empty_cache(), gc.collect(), and direct del <VARIABLES> were used aggressively inside the loop, eventually ending in an out-of-memory error.
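The observations above can be sketched as a minimal monitoring loop (a hedged sketch; the model, loss, and loop structure are placeholders, not the repository's actual training function):

```python
import gc
import torch

def train_step(model, batch, target, optimizer, loss_fn):
    # One ordinary training step; nothing here should retain memory across iterations.
    optimizer.zero_grad(set_to_none=True)
    out = model(batch)
    loss = loss_fn(out, target)
    loss.backward()
    loss_value = loss.item()

    # Aggressive cleanup, as described in the report; this did NOT stop the growth
    # when the model was wrapped with torch.compile.
    del out, loss
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()

    optimizer.step()
    return loss_value

def log_cuda_memory(step):
    # Report allocated CUDA memory per step to watch the accumulation;
    # a no-op on CPU-only machines.
    if torch.cuda.is_available():
        mib = torch.cuda.memory_allocated() / 2**20
        print(f"step {step}: {mib:.1f} MiB allocated")
```

With the Autoformer model compiled via torch.compile, the value logged by `log_cuda_memory` keeps rising across steps; with other compiled models, or with Autoformer run eagerly, it stays flat.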

It is notable that the FEDformer model, which shares most of its blocks with Autoformer except that AutoCorrelation is swapped for FourierBlock, does not produce a similar issue.
Contrary to my expectation, partially disabling compilation on the AutoCorrelation forward did not solve the problem. Rather, disabling compilation of EncoderLayer only slows down the rate of accumulation.

I resolved the problem by disabling compilation entirely with @torch.compiler.disable() on the model's forward function.

os==ubuntu-22.04-lts
python==3.10
torch==2.6.0+cu118
