@@ -989,11 +989,23 @@ val_check_interval
989989 :muted:
990990
991991How often within one training epoch to check the validation set.
992- Can specify as float or int.
992+ Can specify as float, int, or a time-based duration.
993993
994994- pass a ``float`` in the range [0.0, 1.0] to check after a fraction of the training epoch.
995995- pass an ``int`` to check after a fixed number of training batches. An ``int`` value can only be higher than the number of training
996996 batches when ``check_val_every_n_epoch=None``, which validates after every ``N`` training batches across epochs or in iteration-based training.
997+ - pass a ``string`` duration in the format "DD:HH:MM:SS", a ``datetime.timedelta`` object, or a ``dictionary`` of keyword arguments that can be passed
998+ to ``datetime.timedelta`` for time-based validation. When using a time-based duration, validation will trigger once the elapsed wall-clock time
999+ since the last validation exceeds the interval. The check occurs after the current batch completes; the validation loop then runs and
1000+ the timer resets.
1001+
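The three accepted forms of a time-based interval can all be reduced to a single ``datetime.timedelta``. The following is a minimal sketch of that normalization, not Lightning's actual implementation; the helper name ``to_timedelta`` is hypothetical, and strict four-field integer parsing of "DD:HH:MM:SS" is an assumption.

```python
from datetime import timedelta


def to_timedelta(interval):
    """Normalize a time-based interval to a timedelta (hypothetical helper)."""
    if isinstance(interval, timedelta):
        return interval
    if isinstance(interval, dict):
        # Keys are forwarded as keyword arguments, e.g. {"minutes": 15}.
        return timedelta(**interval)
    if isinstance(interval, str):
        parts = interval.split(":")
        if len(parts) != 4:
            raise ValueError(f"expected 'DD:HH:MM:SS', got {interval!r}")
        # Assumption: every field is a plain integer.
        days, hours, minutes, seconds = (int(p) for p in parts)
        return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    raise TypeError(f"unsupported interval type: {type(interval).__name__}")
```

Under these assumptions, ``to_timedelta("00:00:15:00")`` and ``to_timedelta({"minutes": 15})`` describe the same 15-minute interval.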
1002+ **Time-based validation behavior with check_val_every_n_epoch:** When a time-based ``val_check_interval`` is used together with
1003+ ``check_val_every_n_epoch > 1``, validation is aligned to epoch multiples:
1004+
1005+ - If the time-based interval elapses **before** the next multiple-N epoch, validation runs at the start of that epoch (after the first batch),
1006+ and the timer resets.
1007+ - If the interval elapses **during** a multiple-N epoch, validation runs after the current batch.
1008+ - For cases where ``check_val_every_n_epoch=None`` or ``1``, the time-based behavior of ``val_check_interval`` applies without additional alignment.
9971009
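The alignment rules above can be read as a single decision made after each training batch. The sketch below is illustrative only and is not taken from Lightning's source: the function name ``should_validate`` is hypothetical, epochs are assumed to be zero-indexed with a "multiple-N epoch" meaning ``(epoch + 1) % N == 0``, and reaching the interval exactly is treated as elapsed.

```python
def should_validate(elapsed_s, interval_s, epoch, every_n_epoch):
    """Decide, after a batch, whether time-based validation should run.

    Hypothetical helper; `epoch` is assumed to be zero-indexed.
    """
    if elapsed_s < interval_s:
        # The interval has not elapsed yet.
        return False
    if every_n_epoch is None or every_n_epoch == 1:
        # No epoch alignment: validate as soon as the interval has elapsed.
        return True
    # Alignment: defer until the current epoch is a multiple-N epoch.
    return (epoch + 1) % every_n_epoch == 0
```

With ``every_n_epoch=2``, an interval that elapses during epoch 0 keeps returning ``False`` until the first batch of epoch 1 completes, which matches the "start of that epoch (after the first batch)" case. The caller is expected to reset its timer whenever the function returns ``True``.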
9981010.. testcode::
9991011
@@ -1011,10 +1023,24 @@ Can specify as float or int.
10111023 # (ie: production cases with streaming data)
10121024 trainer = Trainer(val_check_interval=1000, check_val_every_n_epoch=None)
10131025
1026+ # check validation every 15 minutes of wall-clock time using a string-based approach
1027+ trainer = Trainer(val_check_interval="00:00:15:00")
1028+
1029+ # check validation every 15 minutes of wall-clock time using a dictionary-based approach
1030+ trainer = Trainer(val_check_interval={"minutes": 15})
1031+
1032+ # check validation every 1 hour of wall-clock time using a dictionary-based approach
1033+ trainer = Trainer(val_check_interval={"hours": 1})
1034+
1035+ # check validation every 1 hour of wall-clock time using a datetime.timedelta object (requires `from datetime import timedelta`)
1036+ trainer = Trainer(val_check_interval=timedelta(hours=1))
1037+
1038+
10141039
10151040.. code-block:: python
10161041
10171042 # Here is the computation to estimate the total number of batches seen within an epoch.
1043+ # This logic applies when `val_check_interval` is specified as an integer or a float.
10181044
10191045 # Find the total number of train batches
10201046 total_train_batches = total_train_samples // (train_batch_size * world_size)