Closed
Labels
bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.5.x
Description
Bug description
When a Lightning script crashes or is interrupted, it leaves behind temporary directories that trigger ResourceWarnings during Python shutdown.
What version are you seeing the problem on?
v2.5
How to reproduce the bug
Reproduction:
1. Create a basic Lightning training script.
2. Force it to crash (e.g., with a KeyboardInterrupt or OOM).
3. Observe ResourceWarnings about temporary directories during shutdown.
Minimal Example:
import lightning.pytorch as pl
exit()
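For anyone trying to confirm the behaviour, here is a rough sketch of a helper that runs a snippet in a fresh interpreter with ResourceWarning enabled and returns whatever it writes to stderr. The helper name `run_with_resource_warnings` and the `STAND_IN` snippet are made up for illustration. The stand-in only leaks a plain `tempfile.TemporaryDirectory` and does not import Lightning, so it shows the kind of shutdown warning reported here, not where Lightning creates the directory.

```python
import subprocess
import sys


def run_with_resource_warnings(snippet: str) -> str:
    """Run `snippet` in a fresh interpreter and return its stderr.

    ResourceWarning is ignored by default, so it is switched on with -W.
    """
    result = subprocess.run(
        [sys.executable, "-W", "always::ResourceWarning", "-c", snippet],
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stderr


# Hypothetical stand-in (not Lightning code): a temporary directory that is
# still alive, and never cleaned up explicitly, when the interpreter exits.
STAND_IN = "import tempfile\nleaked = tempfile.TemporaryDirectory()\n"

if __name__ == "__main__":
    print(run_with_resource_warnings(STAND_IN))
```

Passing the two-line minimal example above as the snippet, in an environment where Lightning is installed, should make it easy to check whether the import alone is enough to produce the warning.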
Error messages and logs
[2025-01-15 12:18:54,186][py.warnings][WARNING] - /usr/lib/python3.10/tempfile.py:1008: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmpbrhhqlnk'>
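The "Implicitly cleaning up" text in the log above is the message the standard library's `tempfile.TemporaryDirectory` emits when an instance is finalized without `cleanup()` having been called or a `with` block having been exited. The sketch below shows that mechanism in isolation. It assumes nothing about where Lightning or its dependencies create the directory, and the function names are illustrative only.

```python
import gc
import tempfile
import warnings

MESSAGE_PREFIX = "Implicitly cleaning up"


def _implicit_cleanup_messages(caught) -> list:
    """Keep only the TemporaryDirectory finalizer warnings."""
    return [
        str(w.message)
        for w in caught
        if issubclass(w.category, ResourceWarning)
        and str(w.message).startswith(MESSAGE_PREFIX)
    ]


def leak_and_collect() -> list:
    """Drop a TemporaryDirectory without cleaning it up explicitly."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        leaked = tempfile.TemporaryDirectory()
        del leaked    # the finalizer removes the directory and warns
        gc.collect()  # in case the interpreter defers finalization
    return _implicit_cleanup_messages(caught)


def cleanup_explicitly() -> list:
    """Same as above, but call cleanup() before dropping the object."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        tmp = tempfile.TemporaryDirectory()
        tmp.cleanup()  # detaches the finalizer, so nothing is left to warn about
        del tmp
        gc.collect()
    return _implicit_cleanup_messages(caught)


if __name__ == "__main__":
    print(leak_and_collect())
    print(cleanup_explicitly())
```

If this is the mechanism at play, the warning would point at a `TemporaryDirectory` object that some imported module keeps alive until interpreter exit rather than at files left on disk, since the finalizer still removes the directory.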
Environment
- tabulate: 0.9.0
- tbb: 2021.13.1
- tblib: 3.0.0
- tensorboard: 2.16.2
- tensorboard-data-server: 0.7.2
- tensorrt: 10.5.0
- terminado: 0.18.1
- texttable: 1.7.0
- thinc: 8.2.5
- threadpoolctl: 3.5.0
- thriftpy2: 0.4.20
- tifffile: 2025.1.10
- timm: 1.0.11
- tinycss2: 1.3.0
- tokenizers: 0.21.0
- tomli: 2.0.2
- toolz: 0.12.1
- torch: 2.7.0.dev20250112+cu126
- torch-tensorrt: 2.5.0a0
- torchmetrics: 1.3.2
- torchprofile: 0.0.4
- torchscale: 0.3.0
- torchvision: 0.22.0.dev20250112+cu126
- tornado: 6.2
- tqdm: 4.66.5
- traitlets: 5.14.3
- transformers: 4.49.0.dev0
- treelite: 4.3.0
- tritonclient: 2.32.0
- typer: 0.12.5
- types-dataclasses: 0.6.6
- types-python-dateutil: 2.9.0.20241003
- typing-extensions: 4.12.2
- tzdata: 2024.1
- ucx-py: 0.39.0
- ucxx: 0.39.0
- uri-template: 1.3.0
- urllib3: 2.0.7
- uv: 0.5.18
- wandb: 0.19.2
- wasabi: 1.1.3
- wcwidth: 0.2.13
- weasel: 0.4.1
- webcolors: 24.8.0
- webencodings: 0.5.1
- websocket-client: 1.3.3
- werkzeug: 3.0.4
- wheel: 0.44.0
- wrapt: 1.16.0
- wurlitzer: 3.1.1
- xdoctest: 1.0.2
- xgboost: 2.1.1
- yarl: 1.9.4
- zarr: 2.18.2
- zict: 3.0.0
- zipp: 3.20.0
- zope.event: 5.0
- zope.interface: 7.2
- System:
  - OS: Linux
  - architecture:
    - 64bit
    - ELF
  - processor: x86_64
  - python: 3.10.12
  - release: 5.15.0-1063-nvidia
  - version: #64-Ubuntu SMP Fri Aug 9 17:13:45 UTC 2024
More info
base docker nvcr.io/nvidia/pytorch:24.10-py3