Closed
Labels: bug, help wanted, priority: 1
Description
🐛 Bug
If you pass `Trainer(accumulate_grad_batches={5: 2})` and resume training from a checkpoint via `resume_from_checkpoint`, checkpoint loading crashes with:
```
Traceback (most recent call last):
  ...
    trainer.fit(system)
  File "/home/jo/.venvs/au/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 470, in fit
    results = self.accelerator_backend.train()
  File "/home/jo/.venvs/au/lib/python3.8/site-packages/pytorch_lightning/accelerators/gpu_accelerator.py", line 63, in train
    self.trainer.train_loop.setup_training(model)
  File "/home/jo/.venvs/au/lib/python3.8/site-packages/pytorch_lightning/trainer/training_loop.py", line 175, in setup_training
    self.trainer.checkpoint_connector.restore_weights(model)
  File "/home/jo/.venvs/au/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 64, in restore_weights
    self.restore(self.trainer.resume_from_checkpoint, on_gpu=self.trainer.on_gpu)
  File "/home/jo/.venvs/au/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 102, in restore
    self.restore_training_state(checkpoint)
  File "/home/jo/.venvs/au/lib/python3.8/site-packages/pytorch_lightning/trainer/connectors/checkpoint_connector.py", line 164, in restore_training_state
    expected_steps = self.trainer.num_training_batches / n_accum
TypeError: unsupported operand type(s) for /: 'int' and 'dict'
```
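The failing division can be reproduced in plain Python, without Lightning. This is a minimal sketch that assumes, as the traceback suggests, the dict passed to `Trainer` reaches `n_accum` unchanged; `num_training_batches = 100` is an arbitrary illustrative value, not taken from the report.

```python
# Stand-ins for the values seen in restore_training_state (illustrative only).
num_training_batches = 100  # arbitrary example value
n_accum = {5: 2}            # accumulate_grad_batches exactly as passed to Trainer

error_message = None
try:
    # Same expression as line 164 of checkpoint_connector.py in the traceback.
    expected_steps = num_training_batches / n_accum
except TypeError as err:
    error_message = str(err)

print(error_message)
```

Running it prints the same message as the last traceback line, which shows the crash is purely a type mismatch: an `int` divided by a `dict`.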
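The relevant code in `restore_training_state` (`checkpoint_connector.py`, line 164 in the traceback) assumes `accumulate_grad_batches` is an integer and divides `num_training_batches` by it directly.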
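One way to avoid the crash is to resolve the schedule to a single integer factor before dividing. The sketch below is hypothetical: `resolve_accumulation` and `expected_steps_per_epoch` are illustrative names, not Lightning functions. It assumes dict keys are the epoch at which a factor starts to apply, that epochs before the smallest key use a factor of 1, and that the epoch being resumed is the one to look up; Lightning's actual epoch indexing for this schedule should be checked before relying on it.

```python
from typing import Dict, Optional, Union

# An accumulate_grad_batches value: None, a fixed int, or an {epoch: factor} schedule.
AccumulateSpec = Optional[Union[int, Dict[int, int]]]


def resolve_accumulation(accumulate_grad_batches: AccumulateSpec, epoch: int) -> int:
    """Return the accumulation factor in effect at `epoch` (hypothetical helper).

    Assumption: each dict key is the first epoch at which its factor applies,
    and epochs before the smallest key accumulate over 1 batch.
    """
    if accumulate_grad_batches is None:
        return 1
    if isinstance(accumulate_grad_batches, int):
        return accumulate_grad_batches
    factor = 1
    # Walk the schedule in epoch order; keep the last factor whose start epoch has passed.
    for start_epoch in sorted(accumulate_grad_batches):
        if epoch >= start_epoch:
            factor = accumulate_grad_batches[start_epoch]
        else:
            break
    return factor


def expected_steps_per_epoch(
    num_training_batches: int, accumulate_grad_batches: AccumulateSpec, epoch: int
) -> float:
    # Same division as in the traceback, but always by an int factor.
    return num_training_batches / resolve_accumulation(accumulate_grad_batches, epoch)
```

For example, with `{5: 2}` and the assumptions above, `resolve_accumulation({5: 2}, 3)` gives `1` and `expected_steps_per_epoch(100, {5: 2}, 6)` gives `50.0`, while a plain integer such as `4` is returned unchanged, so the existing integer path keeps its behavior.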
Environment
* Packages:
    - numpy: 1.19.4
    - pyTorch_debug: True
    - pyTorch_version: 1.7.0+cu110
    - pytorch-lightning: 1.1.0
    - tqdm: 4.51.0
* System:
    - OS: Linux
    - architecture:
        - 64bit
        - ELF
    - processor:
    - python: 3.8.5
    - version: #1 SMP PREEMPT Tue Dec 22 08:14:42 UTC 2020