Hi,
`total_train_steps` is currently set to 200_000. This seems far too high; I get a val_loss of ~3.8 after around 1000 steps, and a perplexity of around 40.
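For reference, perplexity is just the exponential of the per-token cross-entropy loss, so the two numbers above are roughly consistent:

```python
import math

# Perplexity = exp(cross-entropy loss); a val_loss of ~3.8 corresponds to
# a perplexity of about exp(3.8) ~= 44.7, in the same ballpark as the ~40 above.
val_loss = 3.8
perplexity = math.exp(val_loss)
print(round(perplexity, 1))  # → 44.7
```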
Edit: when using torch 2.0, setting `total_train_steps` to 1000 leads to an exception:
```
File "main.py", line 522, in main
    scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer=opt, max_lr=hyp['opt']['lr'], total_steps=hyp['opt']['total_train_steps'], pct_start=hyp['opt']['warmup_percent'], anneal_strategy='linear', cycle_momentum=False, div_factor=1e2, final_div_factor=.02)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 1676, in __init__
    super().__init__(optimizer, last_epoch, verbose)
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 79, in __init__
    self._initial_step()
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 85, in _initial_step
    self.step()
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 150, in step
    values = self.get_lr()
File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 1714, in get_lr
    pct = (step_num - start_step) / (end_step - start_step)
ZeroDivisionError: float division by zero
```
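For what it's worth, the division by zero appears to come from OneCycleLR's warmup-phase boundary: in torch 2.0 the first schedule phase ends at `float(pct_start * total_steps) - 1`, so if `warmup_percent * total_steps` works out to exactly 1, the very first `get_lr()` call divides by zero. A minimal sketch of that arithmetic (the `pct_start` value here is a hypothetical stand-in for `warmup_percent`; I haven't checked the repo's actual value):

```python
# Sketch of the first-phase LR computation that OneCycleLR.get_lr performs
# in torch 2.0: pct = (step_num - start_step) / (end_step - start_step).
def warmup_phase_pct(total_steps, pct_start, step_num=0):
    start_step = 0.0
    end_step = float(pct_start * total_steps) - 1  # end of the warmup phase
    return (step_num - start_step) / (end_step - start_step)

pct_start = 0.001  # hypothetical warmup_percent; 1000 * 0.001 = 1 warmup step
try:
    warmup_phase_pct(total_steps=1000, pct_start=pct_start)  # end_step = 0
except ZeroDivisionError:
    print("total_steps=1000 -> ZeroDivisionError")

# With 2000 steps the warmup phase has nonzero length (end_step = 1), so
# the same computation succeeds.
print(warmup_phase_pct(total_steps=2000, pct_start=pct_start))
```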
(I use a slightly modified version of this package, but I didn't touch main.py or any of the building blocks other than `total_train_steps`.)
Using `total_train_steps = 2_000` seems to work fine for me, so I would cautiously suggest doing that :)