Labels: bug, logger: wandb, ver: 2.4.x
Description
Bug description
When a run is interrupted with Ctrl+C, WandbLogger does not upload a checkpoint artifact to W&B.
What version are you seeing the problem on?
v2.4
How to reproduce the bug
No response
Error messages and logs
Epoch 20: 28%|██▊ | 6502/23178 [29:11<1:14:53, 3.71it/s, v_num=gwj7, train_loss=nan.0]^C
Detected KeyboardInterrupt, attempting graceful shutdown ...
wandb: 🚀 View run train-release-0.1 at: https://wandb.ai/eschwartz/dire/runs/uvexgwj7
Epoch 20: 28%|██▊ | 6502/23178 [29:16<1:15:04, 3.70it/s, v_num=gwj7, train_loss=nan.0]
Environment
Current environment
- CUDA:
- GPU:
- NVIDIA GeForce RTX 4070 Laptop GPU
- available: True
- version: 12.1
- Lightning:
- lightning-utilities: 0.11.7
- pytorch-lightning: 2.4.0
- torch: 2.3.0
- torchmetrics: 1.6.0
- Packages:
- absl-py: 2.1.0
- aiohappyeyeballs: 2.4.3
- aiohttp: 3.10.10
- aiosignal: 1.3.1
- appdirs: 1.4.4
- asttokens: 2.4.1
- async-timeout: 4.0.3
- attrs: 23.2.0
- braceexpand: 0.1.7
- certifi: 2024.2.2
- charset-normalizer: 3.3.2
- click: 8.1.7
- decorator: 5.1.1
- docker-pycreds: 0.4.0
- docopt: 0.6.2
- editdistance: 0.5.3
- et-xmlfile: 1.1.0
- exceptiongroup: 1.2.2
- executing: 2.1.0
- filelock: 3.13.4
- frozenlist: 1.5.0
- fsspec: 2024.3.1
- future: 1.0.0
- gitdb: 4.0.11
- gitpython: 3.1.43
- grpcio: 1.62.2
- hjson: 3.1.0
- idna: 3.7
- ipdb: 0.13.13
- ipython: 8.27.0
- jedi: 0.19.1
- jep: 4.2.0
- jinja2: 3.1.3
- jsonlines: 4.0.0
- jsonnet: 0.16.0
- lightning-utilities: 0.11.7
- markdown: 3.6
- markdown-it-py: 2.2.0
- markupsafe: 2.1.5
- matplotlib-inline: 0.1.7
- mdurl: 0.1.2
- mpmath: 1.3.0
- msgpack: 1.0.8
- multidict: 6.1.0
- networkx: 3.3
- numpy: 1.26.4
- nvidia-cublas-cu12: 12.1.3.1
- nvidia-cuda-cupti-cu12: 12.1.105
- nvidia-cuda-nvrtc-cu12: 12.1.105
- nvidia-cuda-runtime-cu12: 12.1.105
- nvidia-cudnn-cu12: 8.9.2.26
- nvidia-cufft-cu12: 11.0.2.54
- nvidia-curand-cu12: 10.3.2.106
- nvidia-cusolver-cu12: 11.4.5.107
- nvidia-cusparse-cu12: 12.1.0.106
- nvidia-nccl-cu12: 2.20.5
- nvidia-nvjitlink-cu12: 12.4.127
- nvidia-nvtx-cu12: 12.1.105
- objectio: 0.2.29
- openpyxl: 3.1.2
- packaging: 24.1
- pandas: 2.2.2
- parso: 0.8.4
- pexpect: 4.9.0
- pillow: 10.3.0
- pip: 22.0.2
- platformdirs: 4.3.6
- prompt-toolkit: 3.0.47
- propcache: 0.2.0
- protobuf: 4.25.3
- psutil: 5.9.8
- ptyprocess: 0.7.0
- pure-eval: 0.2.3
- pyelftools: 0.31
- pygments: 2.6.1
- python-dateutil: 2.9.0.post0
- pytorch-lightning: 2.4.0
- pytz: 2024.1
- pyyaml: 6.0.1
- requests: 2.31.0
- rich: 13.2.0
- sentencepiece: 0.1.99
- sentry-sdk: 2.0.1
- setproctitle: 1.3.3
- setuptools: 59.6.0
- shellingham: 1.5.4
- simplejson: 3.19.2
- six: 1.16.0
- smmap: 5.0.1
- stack-data: 0.6.3
- sympy: 1.12
- tensorboard: 2.16.2
- tensorboard-data-server: 0.7.2
- tomli: 2.0.1
- torch: 2.3.0
- torchmetrics: 1.6.0
- tqdm: 4.66.2
- traitlets: 5.14.3
- triton: 2.3.0
- typer: 0.12.3
- typing-extensions: 4.11.0
- tzdata: 2024.1
- ujson: 3.2.0
- urllib3: 2.2.1
- wandb: 0.18.6
- wcwidth: 0.2.13
- webdataset: 0.2.100
- werkzeug: 3.0.2
- yarl: 1.16.0
- System:
- OS: Linux
- architecture:
- 64bit
- ELF
- processor: x86_64
- python: 3.10.12
- release: 6.8.0-48-generic
- version: #48~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Oct 7 11:24:13 UTC 2
More info
No response