Why should 'val_check_interval' be more steps than an epoch when set as an integer? #20468

@JohnHerry

Description & Motivation

I had assumed that when val_check_interval is an integer, validation runs after every val_check_interval training steps, but that does not seem to work. I did not set check_val_every_n_epoch, since my validation is not based on epochs.
Then I read the documentation:

pass an int to check after a fixed number of training batches. An int value can only be higher than the number of training batches when check_val_every_n_epoch=None, which validates after every N training batches across epochs or iteration-based training.
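
To illustrate the documented behavior quoted above, here is a minimal sketch of the Trainer arguments for validation every N training batches across epochs. The helper name `step_based_validation_kwargs` and the value 2000 are hypothetical, chosen only for illustration; `val_check_interval` and `check_val_every_n_epoch` are the option names from the quoted documentation.

```python
def step_based_validation_kwargs(every_n_batches: int) -> dict:
    """Build Trainer keyword arguments for step-based validation.

    Hypothetical helper, not part of Lightning. It only pairs the two
    options that the quoted documentation describes.
    """
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(every_n_batches, bool) or not isinstance(every_n_batches, int):
        raise TypeError("every_n_batches must be an int")
    if every_n_batches < 1:
        raise ValueError("every_n_batches must be at least 1")
    return {
        # Integer interval: counted in training batches.
        "val_check_interval": every_n_batches,
        # None: per the quoted docs, the interval then counts batches
        # across epochs instead of within a single epoch.
        "check_val_every_n_epoch": None,
    }


kwargs = step_based_validation_kwargs(2000)

# Assumed usage, requires Lightning to be installed:
# from lightning.pytorch import Trainer
# trainer = Trainer(**kwargs)
```

If this matches the intended use, leaving check_val_every_n_epoch at its default is probably what tied the integer interval to a single epoch in the setup described here.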

I guess my setting did not take effect because my value is less than the number of steps in an epoch. One epoch may contain too many steps to produce only one checkpoint, especially when training large models, and we would rather not compute the ratio of our validation interval to the epoch length just to get a float value for the val_check_interval option. Why is there no option that lets us set the validation interval directly in training steps?
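
For reference, the float workaround mentioned above amounts to the following arithmetic. This is a sketch only: the function name `interval_as_epoch_fraction` and the numbers are hypothetical, and it assumes the number of training batches per epoch is known in advance.

```python
def interval_as_epoch_fraction(interval_batches: int, batches_per_epoch: int) -> float:
    """Express a validation interval in batches as a fraction of one epoch.

    Hypothetical helper for illustration; not part of Lightning.
    """
    if batches_per_epoch < 1:
        raise ValueError("batches_per_epoch must be at least 1")
    if not 1 <= interval_batches <= batches_per_epoch:
        raise ValueError("interval_batches must be between 1 and batches_per_epoch")
    return interval_batches / batches_per_epoch


# Example with made-up numbers: validate every 2500 batches
# in an epoch of 10000 batches.
fraction = interval_as_epoch_fraction(2500, 10000)
```

The result would be passed as a float val_check_interval. It has to be recomputed whenever the dataset size or batch size changes, which is the inconvenience this issue describes.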

Pitch

A direct option to set the validation interval in steps, counted in global_steps.

Alternatives

No response

Additional context

No response

cc @Borda

    Labels

    repro needed: The issue is missing a reproducible example
