Hi, thanks for your great project. I ran into a problem when deploying it on my server (Ubuntu 24.04, 64 GB RAM).
Title: [Bug] Memory Leak (RAM) during training due to itertools.cycle usage
Description:
I encountered a consistent system memory (RAM) leak during training: RAM usage increases linearly until the OS terminates the process ("Killed"). After some investigation, the issue was traced to the use of itertools.cycle(dataloader) in the training loop.
The Cause:
itertools.cycle saves a reference to every element it yields during its first pass over the iterable, so that it can replay them on later passes. In 3DGS, where the dataloader returns large tensors (high-res images, masks, etc.), this keeps every batch of the dataset alive in RAM: memory grows throughout the first pass over the dataloader, and the process is killed as soon as the cached batches exceed the available memory. (By Google Gemini3)
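The caching behaviour can be reproduced without PyTorch. The sketch below is a minimal illustration, assuming plain Python objects can stand in for the dataloader's tensors: FakeBatch, one_pass, take and count_alive are hypothetical helpers, not part of SSS-GS. It pulls one epoch of batches, drops each one immediately (as a training step would), and then counts how many are still alive.

```python
import gc
import itertools
import weakref


class FakeBatch:
    """Hypothetical stand-in for a large batch (e.g. a high-res image tensor)."""

    def __init__(self, idx):
        self.idx = idx
        self.payload = bytearray(1024)  # placeholder for image/mask data


def one_pass(n):
    """Mimics one pass over a dataloader: yields n freshly built batches."""
    for i in range(n):
        yield FakeBatch(i)


def take(iterator, steps):
    """Pull `steps` batches, keeping only weak references to them."""
    refs = []
    for _ in range(steps):
        batch = next(iterator)
        refs.append(weakref.ref(batch))
        del batch  # the "training step" is done with this batch
    return refs


def count_alive(refs):
    """Count how many referenced batches have not been freed."""
    gc.collect()
    return sum(r() is not None for r in refs)


n = 10
cycled_it = itertools.cycle(one_pass(n))  # keeps a copy of every yielded item
plain_it = one_pass(n)                    # keeps nothing once an item is yielded
cycled_refs = take(cycled_it, n)
plain_refs = take(plain_it, n)
print("alive with cycle:", count_alive(cycled_refs))    # alive with cycle: 10
print("alive without cycle:", count_alive(plain_refs))  # alive without cycle: 0
```

With real image tensors in place of the 1 KiB payload, those retained batches are exactly what fills RAM during the first pass.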
Environment:
- OS: Ubuntu 24.04
- CUDA: 11.8
- PyTorch: 2.0.0+cu118
- Python: 3.8
Dependencies:
annotated-types 0.7.0
appdirs 1.4.4
asttokens 3.0.1
attrs 25.3.0
backcall 0.2.0
bvh_tracing 0.0.0
certifi 2022.12.7
charset-normalizer 2.1.1
click 8.1.8
cmake 3.25.0
comm 0.2.3
contourpy 1.1.1
cycler 0.12.1
decorator 5.2.1
docker-pycreds 0.4.0
docopt 0.6.2
eval_type_backport 0.3.1
executing 2.2.1
fastjsonschema 2.21.2
filelock 3.16.1
fonttools 4.57.0
freetype-py 2.5.1
gitdb 4.0.12
GitPython 3.1.46
gs_sss_rasterization 0.0.0
hdrpy 0.3.3
idna 3.4
imageio 2.35.1
importlib_metadata 8.5.0
importlib_resources 6.4.5
ipython 8.12.3
ipywidgets 8.1.8
jedi 0.19.2
Jinja2 3.1.6
jsonschema 4.23.0
jsonschema-specifications 2023.12.1
jupyter_core 5.8.1
jupyterlab_widgets 3.0.16
kiwisolver 1.4.7
kornia 0.7.3
kornia_rs 0.1.10
lazy_loader 0.4
lightning-utilities 0.11.9
lit 15.0.7
lpips 0.1.4
MarkupSafe 2.1.5
matplotlib 3.7.5
matplotlib-inline 0.1.7
mpmath 1.3.0
narwhals 1.42.1
nbformat 5.10.4
networkx 3.1
numpy 1.24.4
nvdiffrast 0.3.1
opencv-python 4.13.0.92
packaging 24.1
pandas 2.0.3
parso 0.8.6
pdf2image 1.17.0
pexpect 4.9.0
pickleshare 0.7.5
pillow 10.4.0
pip 24.2
pkgutil_resolve_name 1.3.10
platformdirs 4.3.6
plotly 6.5.2
plyfile 1.0.3
prompt-toolkit 1.0.18
protobuf 5.29.6
psutil 7.2.2
ptyprocess 0.7.0
pure_eval 0.2.3
pydantic 2.10.6
pydantic_core 2.27.2
pyglet 2.1.13
Pygments 2.19.2
PyOpenGL 3.1.0
pyparsing 3.1.4
pyrender 0.1.45
pyrsistent 0.20.0
pysolar 0.13
python-dateutil 2.9.0.post0
python-slugify 8.0.4
pytz 2025.2
PyWavelets 1.4.1
PyYAML 6.0.3
referencing 0.35.1
requests 2.28.1
rpds-py 0.20.1
scikit-image 0.21.0
scipy 1.10.1
sentry-sdk 2.53.0
setproctitle 1.3.7
setuptools 75.1.0
simple_knn 0.0.0 /home/XXX/Project/SSS-GS/simple-knn
six 1.17.0
skylibs 0.7.6
smmap 5.0.2
stack-data 0.6.3
sympy 1.13.3
tenacity 9.0.0
text-unidecode 1.3
tifffile 2023.7.10
torch 2.0.0+cu118
torch-scatter 2.0.5
torchaudio 2.0.1+cu118
torchmetrics 1.5.2
torchvision 0.15.1+cu118
tqdm 4.67.3
traitlets 5.14.3
trimesh 4.11.2
triton 2.0.0
typing_extensions 4.12.2
tzdata 2025.3
urllib3 1.26.13
wandb 0.24.2
wcwidth 0.6.0
wheel 0.44.0
widgetsnbextension 4.0.15
with 0.0.7
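zipp 3.20.2
Workaround:
Replace data_iterator = cycle(dataloader) with manual iterator-reset logic. Each time the dataloader is exhausted, a fresh iterator is created and the previous one is discarded, so batches from earlier epochs are no longer referenced and Python's garbage collector can free them.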
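# Instead of:
# data_iterator = cycle(dataloader)
# use a manual reset.

# Before the training loop:
data_iterator = iter(dataloader)

# Inside the training loop, at every iteration:
try:
    camera_batch = next(data_iterator)
except StopIteration:
    # The epoch is over: start a fresh pass over the dataloader.
    # The old iterator (and any batches it referenced) can now be freed.
    data_iterator = iter(dataloader)
    camera_batch = next(data_iterator)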
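An alternative to the try/except block is a small generator that calls iter(dataloader) again at the start of every epoch. This is only a sketch under stated assumptions: infinite_batches, FakeLoader and FakeBatch are hypothetical names, and FakeLoader merely mimics a DataLoader that builds new batches on each pass. The check at the end confirms that after several epochs at most one batch (the one the generator is currently holding) is still alive.

```python
import gc
import weakref


class FakeBatch:
    """Hypothetical stand-in for a large batch (e.g. a high-res image tensor)."""

    def __init__(self, idx):
        self.idx = idx
        self.payload = bytearray(1024)


class FakeLoader:
    """Hypothetical stand-in for a DataLoader: each pass builds fresh batches."""

    def __init__(self, n):
        self.n = n

    def __iter__(self):
        return (FakeBatch(i) for i in range(self.n))


def infinite_batches(dataloader):
    """Yield batches forever, starting a new pass over `dataloader` each epoch.

    Unlike itertools.cycle, nothing from earlier epochs is stored, so each
    batch can be freed as soon as the training step drops it.
    """
    while True:
        got_any = False
        for batch in dataloader:
            got_any = True
            yield batch
        if not got_any:
            # Guard against spinning forever on an empty dataloader.
            raise ValueError("dataloader yielded no batches")


loader = FakeLoader(n=10)
batches = infinite_batches(loader)
refs, seen_idx = [], []
for step in range(35):  # 3.5 epochs
    batch = next(batches)
    seen_idx.append(batch.idx)
    refs.append(weakref.ref(batch))
    del batch
gc.collect()
alive = sum(r() is not None for r in refs)
print("batches still alive:", alive)  # batches still alive: 1
```

In a training script this would become data_iterator = infinite_batches(dataloader) before the loop, with camera_batch = next(data_iterator) at each iteration. Because iter(dataloader) is called again for each epoch, a DataLoader created with shuffle=True also reshuffles every epoch, whereas cycle(dataloader) replays the cached batches in the same order.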