@@ -989,11 +989,23 @@ val_check_interval
:muted:

How often within one training epoch to check the validation set.
- Can specify as float or int.
+ Can specify as float, int, or a time-based duration.

- pass a ``float`` in the range [0.0, 1.0] to check after a fraction of the training epoch.
- pass an ``int`` to check after a fixed number of training batches. An ``int`` value can only be higher than the number of training
batches when ``check_val_every_n_epoch=None``, which validates after every ``N`` training batches across epochs or iteration-based training.
+ - pass a ``string`` duration in the format "DD:HH:MM:SS", a ``datetime.timedelta`` object, or a ``dictionary`` of keyword arguments that can be passed
+ to ``datetime.timedelta`` for time-based validation. When using a time-based duration, validation will trigger once the elapsed wall-clock time
+ since the last validation exceeds the interval. The check happens after the current batch completes; the validation loop then runs and
+ the timer resets.
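The three time-based forms described above can be read as different spellings of a single ``datetime.timedelta``. The following is a minimal sketch of that normalization, assuming a hypothetical helper named ``to_timedelta``; it is not part of the Lightning API, and the library's own parsing and validation may differ.

```python
from datetime import timedelta


def to_timedelta(interval):
    """Hypothetical helper (not a Lightning function): normalize a
    time-based interval to a ``datetime.timedelta``."""
    if isinstance(interval, timedelta):
        # Already a timedelta: use it unchanged.
        return interval
    if isinstance(interval, dict):
        # Keys are forwarded as keyword arguments, e.g. {"minutes": 15}.
        return timedelta(**interval)
    if isinstance(interval, str):
        # Assumed string layout: "DD:HH:MM:SS" with exactly four fields.
        parts = interval.strip().split(":")
        if len(parts) != 4:
            raise ValueError(f"expected 'DD:HH:MM:SS', got {interval!r}")
        days, hours, minutes, seconds = (int(p) for p in parts)
        return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)
    raise TypeError(f"unsupported interval type: {type(interval).__name__}")
```

Under these assumptions, ``to_timedelta("00:00:15:00")``, ``to_timedelta({"minutes": 15})``, and ``to_timedelta(timedelta(minutes=15))`` all describe the same 15-minute interval.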
+
+ **Time-based validation behavior with check_val_every_n_epoch:** When a time-based ``val_check_interval`` is combined with
+ ``check_val_every_n_epoch > 1``, validation is aligned to epoch multiples:
+
+ - If the time-based interval elapses **before** the next multiple-N epoch, validation runs at the start of that epoch (after the first batch),
+ and the timer resets.
+ - If the interval elapses **during** a multiple-N epoch, validation runs after the current batch.
+ - For cases where ``check_val_every_n_epoch=None`` or ``1``, the time-based behavior of ``val_check_interval`` applies without additional alignment.
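The alignment rules above can be sketched as a single per-batch decision. This is an illustration only, not Lightning's implementation: the function name ``should_validate`` is hypothetical, the epoch index is assumed to be zero-based, a "multiple-N epoch" is assumed to mean ``(epoch + 1) % N == 0``, and the interval boundary is treated as inclusive.

```python
from datetime import timedelta


def should_validate(elapsed, interval, epoch, every_n_epochs=None):
    """Hypothetical per-batch check, evaluated after a training batch completes.

    elapsed:        wall-clock time since the last validation (timedelta)
    interval:       the time-based val_check_interval (timedelta)
    epoch:          zero-based index of the current epoch (assumption)
    every_n_epochs: value of check_val_every_n_epoch (None, 1, or > 1)
    """
    if elapsed < interval:
        # The interval has not elapsed yet: never validate.
        return False
    if every_n_epochs is None or every_n_epochs == 1:
        # No epoch alignment: plain time-based behavior.
        return True
    # Epoch alignment: only validate inside a multiple-N epoch. If the interval
    # elapsed earlier, the first batch of the next multiple-N epoch triggers it.
    return (epoch + 1) % every_n_epochs == 0
```

With ``every_n_epochs=2`` and a 15-minute interval, an elapsed time of 20 minutes does not trigger validation in epoch 0, but does trigger it after the first batch of epoch 1. The caller is expected to reset the timer whenever this returns ``True``.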

.. testcode::

@@ -1011,10 +1023,24 @@ Can specify as float or int.
# (ie: production cases with streaming data)
trainer = Trainer(val_check_interval=1000, check_val_every_n_epoch=None)

+ # check validation every 15 minutes of wall-clock time using a string-based approach
+ trainer = Trainer(val_check_interval="00:00:15:00")
+
+ # check validation every 15 minutes of wall-clock time using a dictionary-based approach
+ trainer = Trainer(val_check_interval={"minutes": 15})
+
+ # check validation every 1 hour of wall-clock time using a dictionary-based approach
+ trainer = Trainer(val_check_interval={"hours": 1})
+
+ # check validation every 1 hour of wall-clock time using a datetime.timedelta object
+ trainer = Trainer(val_check_interval=timedelta(hours=1))
+
+

.. code-block:: python

# Here is the computation to estimate the total number of batches seen within an epoch.
+ # This logic applies when `val_check_interval` is specified as an integer or a float.

# Find the total number of train batches
total_train_batches = total_train_samples // (train_batch_size * world_size)