Description
Currently, max_epochs defaults to 1000. The documentation says:
If both max_epochs and max_steps aren't specified, max_epochs will default to 1000. To enable infinite training, set max_epochs = -1.
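To make the documented rule concrete, here is a minimal sketch of the defaulting behavior as the docs describe it. `resolve_max_epochs` and `DEFAULT_MAX_EPOCHS` are hypothetical names for illustration, not Lightning's actual source. The sketch assumes "not specified" is represented as `None` for both arguments (the real `Trainer` defaults may differ between versions). In Lightning itself, the documented workaround is simply passing `max_epochs=-1` to `Trainer`.

```python
from typing import Optional

# Hypothetical constant mirroring the documented fallback value.
DEFAULT_MAX_EPOCHS = 1000


def resolve_max_epochs(max_epochs: Optional[int], max_steps: Optional[int]) -> Optional[int]:
    """Return the effective epoch limit; None means no epoch limit.

    A sketch of the documented rule only, assuming None means "not specified".
    """
    # Documented rule: if neither limit is given, fall back to 1000 epochs.
    if max_epochs is None and max_steps is None:
        return DEFAULT_MAX_EPOCHS
    # Documented opt-out: max_epochs = -1 enables infinite training.
    if max_epochs == -1:
        return None
    # Otherwise use the explicit epoch limit (None here means a step limit governs).
    return max_epochs


print(resolve_max_epochs(None, None))    # 1000: the silent default this issue is about
print(resolve_max_epochs(-1, None))      # None: no epoch limit
print(resolve_max_epochs(50, None))      # 50
print(resolve_max_epochs(None, 10_000))  # None: only the step limit applies
```

Under this rule, an unconfigured run stops after 1000 epochs. If epochs are counted from 0 (as Lightning's `current_epoch` is), the last epoch index logged is 999, which matches the cutoff described below.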
As a user, though, I would expect training to continue indefinitely if I don't specify an ending point. In my own experiments, when training cut off at 999 epochs, I was confused, and searching for the issue didn't readily turn up this line in the documentation. When I checked my logs of all the hyperparameters, max_epochs was set to None (I assume the 1000-epoch default is applied internally). So as a user, I think this is bad UX: I can't see a reason to impose an arbitrary cutoff rather than defaulting to infinite training.
It's especially frustrating when you've invested significant time into a training run, only to have it prematurely cut off due to this unexpected max_epochs limit.
cc @Borda