evaluation_loop memory leak #8453

@hackgoofer

๐Ÿ› Bug

When I first encountered the bug, it manifested at step 13 of the evaluation loop. This was surprising because training was running smoothly, yet I received a CUDA out-of-memory error during evaluation. I then printed the memory allocation between evaluation batches and saw that the allocated memory was increasing slightly.
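For readers who want to take the same kind of per-batch reading, here is a minimal sketch. It is not the code from val_mem_leak.py: the helper names (`cuda_allocated_bytes`, `log_allocation_per_batch`) and the print format are hypothetical. The only library calls assumed are `torch.cuda.is_available()` and `torch.cuda.memory_allocated()`.

```python
from typing import Callable, Iterable, List


def cuda_allocated_bytes() -> int:
    """Bytes currently occupied by tensors on the current CUDA device (0 without CUDA)."""
    import torch  # imported lazily so the logger below also works with any other counter

    return torch.cuda.memory_allocated() if torch.cuda.is_available() else 0


def log_allocation_per_batch(
    batches: Iterable,
    step: Callable[[object], object],
    read_allocated: Callable[[], int] = cuda_allocated_bytes,
) -> List[int]:
    """Run `step` on every batch and record the allocated bytes after each one.

    `step` stands in for a validation step (hypothetical). Its return value is
    discarded here, so this helper itself does not keep any outputs alive.
    """
    readings = []
    for batch_idx, batch in enumerate(batches):
        step(batch)
        allocated = read_allocated()
        readings.append(allocated)
        print(f"batch {batch_idx}: memory allocated = {allocated} bytes")
    return readings
```

A reading that keeps climbing from batch to batch, rather than levelling off after the first one or two, is the pattern described in this report.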

Please reproduce using the BoringModel

BoringModel

To Reproduce

Run the above file with python val_mem_leak.py and observe the printed output. I share a segment here to illustrate the slow increase in memory allocation; compare the Allocated memory rows between batch idx 0 and 13.
While this run does not produce an OOM, it eventually will if left unfixed. In my own experiment, which used a much bigger model, it OOMed during the evaluation run.

memory summary: |===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |   14724 KB |   14761 KB |   22139 KB |    7415 KB |
|       from large pool |   13550 KB |   13592 KB |   20938 KB |    7388 KB |
|       from small pool |    1174 KB |    1192 KB |    1201 KB |      27 KB |
|---------------------------------------------------------------------------|
| Active memory         |   14724 KB |   14761 KB |   22139 KB |    7415 KB |
|       from large pool |   13550 KB |   13592 KB |   20938 KB |    7388 KB |
|       from small pool |    1174 KB |    1192 KB |    1201 KB |      27 KB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |   22528 KB |   22528 KB |   22528 KB |       0 B  |
|       from large pool |   20480 KB |   20480 KB |   20480 KB |       0 B  |
|       from small pool |    2048 KB |    2048 KB |    2048 KB |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |    7804 KB |   20075 KB |   31969 KB |   24165 KB |
|       from large pool |    6930 KB |   18412 KB |   25800 KB |   18870 KB |
|       from small pool |     874 KB |    2047 KB |    6168 KB |    5294 KB |
|---------------------------------------------------------------------------|
| Allocations           |      54    |      56    |      72    |      18    |
|       from large pool |       4    |       4    |       5    |       1    |
|       from small pool |      50    |      52    |      67    |      17    |
|---------------------------------------------------------------------------|
| Active allocs         |      54    |      56    |      72    |      18    |
|       from large pool |       4    |       4    |       5    |       1    |
|       from small pool |      50    |      52    |      67    |      17    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       2    |       2    |       2    |       0    |
|       from large pool |       1    |       1    |       1    |       0    |
|       from small pool |       1    |       1    |       1    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       3    |       3    |       9    |       6    |
|       from large pool |       1    |       1    |       1    |       0    |
|       from small pool |       2    |       2    |       8    |       6    |
|===========================================================================|

memory reserved: 23068672
memory allocated: 15077376
validation steps...0
memory summary: |===========================================================================|
|                  PyTorch CUDA memory summary, device ID 0                 |
|---------------------------------------------------------------------------|
|            CUDA OOMs: 0            |        cudaMalloc retries: 0         |
|===========================================================================|
|        Metric         | Cur Usage  | Peak Usage | Tot Alloc  | Tot Freed  |
|---------------------------------------------------------------------------|
| Allocated memory      |   14732 KB |   14761 KB |   22297 KB |    7564 KB |
|       from large pool |   13550 KB |   13592 KB |   20938 KB |    7388 KB |
|       from small pool |    1182 KB |    1194 KB |    1358 KB |     176 KB |
|---------------------------------------------------------------------------|
| Active memory         |   14732 KB |   14761 KB |   22297 KB |    7564 KB |
|       from large pool |   13550 KB |   13592 KB |   20938 KB |    7388 KB |
|       from small pool |    1182 KB |    1194 KB |    1358 KB |     176 KB |
|---------------------------------------------------------------------------|
| GPU reserved memory   |   22528 KB |   22528 KB |   22528 KB |       0 B  |
|       from large pool |   20480 KB |   20480 KB |   20480 KB |       0 B  |
|       from small pool |    2048 KB |    2048 KB |    2048 KB |       0 B  |
|---------------------------------------------------------------------------|
| Non-releasable memory |    7795 KB |   20075 KB |   32118 KB |   24322 KB |
|       from large pool |    6930 KB |   18412 KB |   25800 KB |   18870 KB |
|       from small pool |     865 KB |    2047 KB |    6317 KB |    5452 KB |
|---------------------------------------------------------------------------|
| Allocations           |      71    |      73    |     196    |     125    |
|       from large pool |       4    |       4    |       5    |       1    |
|       from small pool |      67    |      69    |     191    |     124    |
|---------------------------------------------------------------------------|
| Active allocs         |      71    |      73    |     196    |     125    |
|       from large pool |       4    |       4    |       5    |       1    |
|       from small pool |      67    |      69    |     191    |     124    |
|---------------------------------------------------------------------------|
| GPU reserved segments |       2    |       2    |       2    |       0    |
|       from large pool |       1    |       1    |       1    |       0    |
|       from small pool |       1    |       1    |       1    |       0    |
|---------------------------------------------------------------------------|
| Non-releasable allocs |       3    |       5    |      69    |      66    |
|       from large pool |       1    |       1    |       1    |       0    |
|       from small pool |       2    |       4    |      68    |      66    |
|===========================================================================|

memory reserved: 23068672
memory allocated: 15086080
validation steps...13
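The two snapshots above can be compared directly. The short calculation below uses only the numbers printed in the log (the `memory allocated` lines and the Cur Usage column of the Allocations row). Reading the result as "one extra 512-byte block per new allocation" is an interpretation, not something the log states.

```python
# Values copied from the two snapshots above (batch idx 0 and batch idx 13).
allocated_at_0 = 15077376   # "memory allocated" after batch 0, in bytes
allocated_at_13 = 15086080  # "memory allocated" after batch 13, in bytes
allocs_at_0 = 54            # "Allocations", Cur Usage, batch 0
allocs_at_13 = 71           # "Allocations", Cur Usage, batch 13

growth_bytes = allocated_at_13 - allocated_at_0
new_allocs = allocs_at_13 - allocs_at_0
bytes_per_new_alloc = growth_bytes / new_allocs
bytes_per_batch = growth_bytes / 13  # 13 batches elapsed between the snapshots

print(growth_bytes)           # 8704
print(new_allocs)             # 17
print(bytes_per_new_alloc)    # 512.0
print(round(bytes_per_batch)) # 670
```

The growth is small (8704 bytes) and comes entirely from the small pool: the large-pool rows are identical in both snapshots. The 17 extra live allocations average exactly 512 bytes each, which would be consistent with a handful of tiny tensors being retained across validation batches, though the log alone does not show which ones.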

Expected behavior

Memory allocation should remain the same across evaluation batches.
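One way to turn this expectation into a check is sketched below. The function name, the `warmup` parameter, and the `tolerance_bytes` parameter are hypothetical choices for illustration. It assumes you already have a list of per-batch allocated-byte readings.

```python
from typing import Sequence


def allocation_is_flat(
    readings: Sequence[int], warmup: int = 1, tolerance_bytes: int = 0
) -> bool:
    """Return True if allocated bytes stay within `tolerance_bytes` after warm-up.

    The first `warmup` readings are ignored, since initial batches may
    legitimately allocate buffers that are then reused.
    """
    steady = readings[warmup:]
    if len(steady) < 2:
        return True  # not enough data to observe growth
    return max(steady) - min(steady) <= tolerance_bytes
```

Applied to the two readings in this report with `warmup=0`, the check fails, since 15086080 is greater than 15077376.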

Environment

PyTorch version: 1.8.1+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3

Python version: 3.8 (64-bit runtime)
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: Quadro GP100
GPU 1: Quadro GP100

Nvidia driver version: 450.80.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.4
[pip3] pytorch-lightning==1.4.0.dev0
[pip3] torch==1.8.1
[pip3] torchaudio==0.8.1
[pip3] torchmetrics==0.4.1
[pip3] torchtext==0.5.0
[pip3] torchvision==0.9.1
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               10.1.243             h6bb024c_0
[conda] mkl                       2020.2                      256
[conda] mkl-service               2.3.0            py38he904b0f_0
[conda] mkl_fft                   1.2.0            py38h23d657b_0
[conda] mkl_random                1.1.1            py38h0573a6f_0
[conda] numpy                     1.19.4                   pypi_0    pypi
[conda] numpy-base                1.19.2           py38hfa32c7d_0
[conda] pytorch-lightning         1.4.0.dev0               pypi_0    pypi
[conda] torch                     1.8.1                    pypi_0    pypi
[conda] torchaudio                0.8.1                    pypi_0    pypi
[conda] torchmetrics              0.3.2                    pypi_0    pypi
[conda] torchtext                 0.5.0                    pypi_0    pypi
[conda] torchvision               0.9.1                    pypi_0    pypi

Labels: bug, help wanted, priority: 0, waiting on author
