total_train_steps too high #8

@snimu

Description

Hi,

total_train_steps is currently set to 200_000. This seems way too high: I already get a val_loss of ~3.8 and a perplexity of around 40 after roughly 1000 steps.

Edit: When using torch 2.0, setting total_train_steps to 1000 leads to an exception:

File "main.py", line 522, in main
    scheduler = torch.optim.lr_scheduler.OneCycleLR(optimizer=opt, max_lr=hyp['opt']['lr'], total_steps=hyp['opt']['total_train_steps'], pct_start=hyp['opt']['warmup_percent'], anneal_strategy='linear', cycle_momentum=False, div_factor=1e2, final_div_factor=.02)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 1676, in __init__
    super().__init__(optimizer, last_epoch, verbose)
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 79, in __init__
    self._initial_step()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 85, in _initial_step
    self.step()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 150, in step
    values = self.get_lr()
  File "/home/ubuntu/.local/lib/python3.8/site-packages/torch/optim/lr_scheduler.py", line 1714, in get_lr
    pct = (step_num - start_step) / (end_step - start_step)
ZeroDivisionError: float division by zero
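For what it's worth, the crash seems to come from OneCycleLR's warmup phase collapsing to zero length: in torch 2.0 the first phase ends at float(pct_start * total_steps) - 1, so whenever pct_start * total_steps == 1 the phase starts and ends at step 0 and get_lr() divides by zero during construction. A minimal sketch that reproduces it, assuming a warmup_percent of 0.001 (the repo's actual value may differ):

import torch

# Standalone repro (assumed warmup_percent, not necessarily the repo's value).
# With pct_start=0.001 and total_steps=1000, the warmup phase ends at
# float(0.001 * 1000) - 1 == 0, so get_lr() computes
# (step_num - start_step) / (end_step - start_step) == 0 / 0 in __init__.
model = torch.nn.Linear(4, 4)
opt = torch.optim.SGD(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer=opt,
    max_lr=1e-3,
    total_steps=1000,
    pct_start=0.001,  # hypothetical stand-in for hyp['opt']['warmup_percent']
    anneal_strategy='linear',
    cycle_momentum=False,
    div_factor=1e2,
    final_div_factor=.02,
)  # raises ZeroDivisionError: float division by zero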

(I use a slightly changed version of this package, but didn't touch main.py or any of the building blocks other than total_train_steps).

Setting total_train_steps = 2_000 works fine for me, so I would cautiously suggest that as the default :)
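Until the default changes, a cheap guard before the scheduler is constructed would at least make the failure mode explicit. A sketch, not the repo's code; the hyp keys are taken from the call in main.py quoted in the traceback above:

# Guard sketch: fail with a readable message instead of ZeroDivisionError.
total_steps = hyp['opt']['total_train_steps']
pct_start = hyp['opt']['warmup_percent']
if pct_start * total_steps <= 1:
    raise ValueError(
        f"OneCycleLR warmup phase would be empty "
        f"(warmup_percent * total_train_steps = {pct_start * total_steps}); "
        f"increase total_train_steps or warmup_percent"
    )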
