
A problem: 62 GB of RAM is not enough #6

@lililuya

Hi, thanks for your great project. I encountered a problem when I tried to deploy it on my server (Ubuntu 24.04, 64 GB RAM).

Title: [Bug] Memory Leak (RAM) during training due to itertools.cycle usage

Description:
I encountered a consistent system memory (RAM) leak during training: RAM usage increases linearly until the OS terminates the process (Killed). After investigation, the cause was traced to the use of itertools.cycle(dataloader) in the training loop.
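To confirm the linear growth before the process is killed, one option is to log memory usage every few hundred iterations. The sketch below is a hypothetical helper, not part of the project: the names peak_rss_mib and log_memory and the 500-iteration interval are assumptions, and it uses only the standard-library resource module (Unix only). It reports peak RSS, which can only rise, so the thing to look for is a steady climb during the first pass. psutil, which appears in the dependency list, can report current RSS instead via psutil.Process().memory_info().rss.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Peak resident set size of the current process, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    if sys.platform == "darwin":
        return peak / (1024 * 1024)
    return peak / 1024


def log_memory(iteration: int, every: int = 500) -> None:
    """Print the peak RSS once every `every` iterations (hypothetical helper)."""
    if iteration % every == 0:
        print(f"[iter {iteration}] peak RSS: {peak_rss_mib():.0f} MiB")
```

Calling log_memory(iteration) once per training step would print one line every 500 iterations, which is enough to see whether memory keeps climbing.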

The Cause:
itertools.cycle keeps a reference to every element it yields during the first pass so that it can replay them in later passes. In 3DGS, the dataloader returns large tensors (high-resolution images, masks, etc.), so by the end of the first pass every batch of the dataset is held in RAM. Memory therefore grows with each iteration of the first pass, and if the dataset does not fit in RAM, the process is killed before that pass completes. (Analysis by Google Gemini 3.)
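The caching behaviour can be reproduced without PyTorch. In this minimal sketch, a hypothetical Batch class stands in for one large dataloader batch, and weak references count how many batches are still alive after one full pass. The results assume CPython, where an object is freed as soon as its last strong reference disappears.

```python
import gc
import itertools
import weakref


class Batch:
    """Hypothetical stand-in for one large batch (images, masks, ...)."""

    def __init__(self, index: int) -> None:
        self.index = index
        self.payload = bytearray(1024)  # placeholder for tensor data


def make_batches(n: int):
    """Create a fresh Batch per step, like one pass over a DataLoader."""
    for i in range(n):
        yield Batch(i)


def consume(iterator, steps: int) -> list:
    """Pull `steps` batches, keeping only weak references to them."""
    refs = []
    for _ in range(steps):
        batch = next(iterator)
        refs.append(weakref.ref(batch))
        del batch  # the "training step" is done with this batch
    return refs


def count_alive(refs: list) -> int:
    """Count batches that are still held by a strong reference somewhere."""
    gc.collect()
    return sum(ref() is not None for ref in refs)


N = 50

cycled = itertools.cycle(make_batches(N))
print(count_alive(consume(cycled, N)))  # 50: cycle still holds every batch

plain = make_batches(N)
print(count_alive(consume(plain, N)))  # 0: each batch was freed after use
```

The cache lives as long as the cycle object does. In a training loop that object persists for the whole run, so the first-pass batches are never released. Because cycle replays its saved elements, later passes also reuse exactly the same batches in the same order instead of reshuffling.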

Environment:

  • OS: Ubuntu 24.04
  • CUDA: 11.8
  • PyTorch: 2.0.0+cu118
  • Python: 3.8

Dependencies:

annotated-types           0.7.0
appdirs                   1.4.4
asttokens                 3.0.1
attrs                     25.3.0
backcall                  0.2.0
bvh_tracing               0.0.0
certifi                   2022.12.7
charset-normalizer        2.1.1
click                     8.1.8
cmake                     3.25.0
comm                      0.2.3
contourpy                 1.1.1
cycler                    0.12.1
decorator                 5.2.1
docker-pycreds            0.4.0
docopt                    0.6.2
eval_type_backport        0.3.1
executing                 2.2.1
fastjsonschema            2.21.2
filelock                  3.16.1
fonttools                 4.57.0
freetype-py               2.5.1
gitdb                     4.0.12
GitPython                 3.1.46
gs_sss_rasterization      0.0.0
hdrpy                     0.3.3
idna                      3.4
imageio                   2.35.1
importlib_metadata        8.5.0
importlib_resources       6.4.5
ipython                   8.12.3
ipywidgets                8.1.8
jedi                      0.19.2
Jinja2                    3.1.6
jsonschema                4.23.0
jsonschema-specifications 2023.12.1
jupyter_core              5.8.1
jupyterlab_widgets        3.0.16
kiwisolver                1.4.7
kornia                    0.7.3
kornia_rs                 0.1.10
lazy_loader               0.4
lightning-utilities       0.11.9
lit                       15.0.7
lpips                     0.1.4
MarkupSafe                2.1.5
matplotlib                3.7.5
matplotlib-inline         0.1.7
mpmath                    1.3.0
narwhals                  1.42.1
nbformat                  5.10.4
networkx                  3.1
numpy                     1.24.4
nvdiffrast                0.3.1
opencv-python             4.13.0.92
packaging                 24.1
pandas                    2.0.3
parso                     0.8.6
pdf2image                 1.17.0
pexpect                   4.9.0
pickleshare               0.7.5
pillow                    10.4.0
pip                       24.2
pkgutil_resolve_name      1.3.10
platformdirs              4.3.6
plotly                    6.5.2
plyfile                   1.0.3
prompt-toolkit            1.0.18
protobuf                  5.29.6
psutil                    7.2.2
ptyprocess                0.7.0
pure_eval                 0.2.3
pydantic                  2.10.6
pydantic_core             2.27.2
pyglet                    2.1.13
Pygments                  2.19.2
PyOpenGL                  3.1.0
pyparsing                 3.1.4
pyrender                  0.1.45
pyrsistent                0.20.0
pysolar                   0.13
python-dateutil           2.9.0.post0
python-slugify            8.0.4
pytz                      2025.2
PyWavelets                1.4.1
PyYAML                    6.0.3
referencing               0.35.1
requests                  2.28.1
rpds-py                   0.20.1
scikit-image              0.21.0
scipy                     1.10.1
sentry-sdk                2.53.0
setproctitle              1.3.7
setuptools                75.1.0
simple_knn                0.0.0        /home/XXX/Project/SSS-GS/simple-knn
six                       1.17.0
skylibs                   0.7.6
smmap                     5.0.2
stack-data                0.6.3
sympy                     1.13.3
tenacity                  9.0.0
text-unidecode            1.3
tifffile                  2023.7.10
torch                     2.0.0+cu118
torch-scatter             2.0.5
torchaudio                2.0.1+cu118
torchmetrics              1.5.2
torchvision               0.15.1+cu118
tqdm                      4.67.3
traitlets                 5.14.3
trimesh                   4.11.2
triton                    2.0.0
typing_extensions         4.12.2
tzdata                    2025.3
urllib3                   1.26.13
wandb                     0.24.2
wcwidth                   0.6.0
wheel                     0.44.0
widgetsnbextension        4.0.15
with                      0.0.7
zipp                      3.20.2

Workaround:
Replace data_iterator = cycle(dataloader) with manual iterator-reset logic. Batches are then no longer cached after they are consumed, so Python can free the memory of batches from previous passes.

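# Instead of:
# data_iterator = cycle(dataloader)
# create the iterator once, before the training loop:
data_iterator = iter(dataloader)

# num_iterations is a placeholder for the training script's own iteration count
for iteration in range(num_iterations):
    try:
        camera_batch = next(data_iterator)
    except StopIteration:
        # One pass over the dataset is finished: start a new one.
        # Batches from the previous pass are no longer referenced and can be freed.
        data_iterator = iter(dataloader)
        camera_batch = next(data_iterator)
    # ... rest of the training step uses camera_batch ...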
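As a reusable alternative to inlining the try/except in the loop, the same reset logic can be wrapped in a small generator. This is a sketch, not part of the project; the name infinite_batches is hypothetical. It assumes dataloader is re-iterable, which a torch.utils.data.DataLoader is: every iter() call starts a new pass, reshuffling when shuffle=True.

```python
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


def infinite_batches(loader: Iterable[T]) -> Iterator[T]:
    """Yield items from `loader` forever, starting a fresh pass each epoch.

    Unlike itertools.cycle, nothing is cached: each pass calls iter(loader)
    again, and at most the most recent item is kept alive by this generator.
    """
    while True:
        empty = True
        for item in loader:  # implicit iter(loader) on every pass
            empty = False
            yield item
        if empty:
            # Guards against an empty loader or a one-shot iterator.
            raise ValueError("loader produced no items on this pass")


batches = infinite_batches([1, 2, 3])
print([next(batches) for _ in range(7)])  # [1, 2, 3, 1, 2, 3, 1]
```

In the training loop, this would replace the cycle call with data_iterator = infinite_batches(dataloader), followed by camera_batch = next(data_iterator) at each step. Passing a one-shot iterator instead of a re-iterable raises ValueError on the second pass rather than silently stopping.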
