[RFC] Default to infinite epochs, not 1000 #10343

@zplizzi

Currently max_epochs defaults to 1000:

If both max_epochs and max_steps aren't specified, max_epochs will default to 1000. To enable infinite training, set max_epochs = -1.

As a user, though, I would expect training to continue indefinitely if I don't specify an ending point. In my own experiments, when training cut off at 999 epochs, I was confused, and googling the issue didn't readily turn up this line in the documentation. When I checked my logged hyperparameters, max_epochs was set to None (I guess the override is applied internally). So as a user I feel this is bad UX: I can't see a reason to impose an arbitrary cutoff rather than defaulting to infinite training.

It's especially frustrating when you've invested significant time into a training run, only to have it prematurely cut off due to this unexpected max_epochs limit.
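
To make the mismatch concrete, here is a minimal sketch of the rule as the quoted documentation describes it. This is not the library's actual code: `resolve_max_epochs` and `should_stop` are hypothetical helpers I made up for illustration, and the handling of the `max_steps`-only case is my assumption.

```python
from typing import Optional

# Fallback value named in the quoted documentation.
DEFAULT_MAX_EPOCHS = 1000


def resolve_max_epochs(
    max_epochs: Optional[int] = None,
    max_steps: Optional[int] = None,
) -> Optional[int]:
    """Hypothetical helper: return the epoch limit that is actually enforced."""
    # Documented rule: if neither limit is given, fall back to 1000 epochs.
    if max_epochs is None and max_steps is None:
        return DEFAULT_MAX_EPOCHS
    # Otherwise keep what the user passed (-1 means infinite training).
    # Assumption: with only max_steps set, no epoch limit applies.
    return max_epochs


def should_stop(epochs_done: int, max_epochs: Optional[int]) -> bool:
    """Hypothetical helper: has the epoch limit been reached?"""
    if max_epochs is None or max_epochs == -1:
        return False
    return epochs_done >= max_epochs


# What the user passed (and what gets logged) versus what is enforced.
user_value = None
effective = resolve_max_epochs(user_value)
print(f"logged max_epochs={user_value}, effective max_epochs={effective}")
```

Under this reading, the logged hyperparameter stays None while the enforced limit is 1000, which matches what I saw. Per the quoted docs, the workaround today is to pass max_epochs = -1 explicitly.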

cc @Borda
