
Commit 64eb3c7

Update docs for time based validation through val_check_interval
1 parent a610ad2 commit 64eb3c7

File tree

3 files changed: +41 -4 lines


docs/source-pytorch/advanced/speed.rst

Lines changed: 12 additions & 1 deletion
@@ -297,7 +297,8 @@ Validation Within Training Epoch
 
 For large datasets, it's often desirable to check validation multiple times within a training epoch.
 Pass in a float to check that often within one training epoch. Pass in an int ``K`` to check every ``K`` training batch.
-Must use an ``int`` if using an :class:`~torch.utils.data.IterableDataset`.
+Must use an ``int`` if using an :class:`~torch.utils.data.IterableDataset`. Alternatively, pass a string ("DD:HH:MM:SS"),
+a dict of ``datetime.timedelta`` kwargs, or a ``datetime.timedelta`` to check validation after a given amount of wall-clock time.
 
 .. testcode::
 
@@ -310,6 +311,16 @@ Must use an ``int`` if using an :class:`~torch.utils.data.IterableDataset`.
     # check every 100 train batches (ie: for IterableDatasets or fixed frequency)
     trainer = Trainer(val_check_interval=100)
 
+    # check validation every 15 minutes of wall-clock time
+    trainer = Trainer(val_check_interval="00:00:15:00")
+
+    # alternatively, pass a dict of timedelta kwargs
+    trainer = Trainer(val_check_interval={"minutes": 1})
+
+    # or use a timedelta object directly
+    from datetime import timedelta
+    trainer = Trainer(val_check_interval=timedelta(hours=1))
+
 Learn more in our :ref:`trainer_flags` guide.
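
The three time-based forms added above express the same kind of duration. A minimal sketch of that equivalence, assuming (as the docs describe) that the string is interpreted as "DD:HH:MM:SS" and that the dict is forwarded to ``datetime.timedelta`` as keyword arguments; the string parsing shown here is illustrative only, not Lightning's own parser:

from datetime import timedelta

# dict form: keyword arguments forwarded to datetime.timedelta (per the docs above)
as_kwargs = timedelta(**{"minutes": 15})

# string form: "DD:HH:MM:SS" -> days, hours, minutes, seconds (illustrative parse)
days, hours, minutes, seconds = (int(part) for part in "00:00:15:00".split(":"))
as_string = timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)

# timedelta form: used as-is
as_timedelta = timedelta(minutes=15)

assert as_kwargs == as_string == as_timedelta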

docs/source-pytorch/common/trainer.rst

Lines changed: 27 additions & 1 deletion
@@ -989,11 +989,23 @@ val_check_interval
     :muted:
 
 How often within one training epoch to check the validation set.
-Can specify as float or int.
+Can specify as float, int, or a time-based duration.
 
 - pass a ``float`` in the range [0.0, 1.0] to check after a fraction of the training epoch.
 - pass an ``int`` to check after a fixed number of training batches. An ``int`` value can only be higher than the number of training
   batches when ``check_val_every_n_epoch=None``, which validates after every ``N`` training batches across epochs or iteration-based training.
+- pass a ``string`` duration in the format "DD:HH:MM:SS", a ``datetime.timedelta`` object, or a ``dictionary`` of keyword arguments that can be passed
+  to ``datetime.timedelta`` for time-based validation. When using a time-based duration, validation will trigger once the elapsed wall-clock time
+  since the last validation exceeds the interval. The validation check occurs after the current batch completes, the validation loop runs, and
+  the timer resets.
+
+**Time-based validation behavior with check_val_every_n_epoch:** When used together with ``val_check_interval`` (time-based) and
+``check_val_every_n_epoch > 1``, validation is aligned to epoch multiples:
+
+- If the time-based interval elapses **before** the next multiple-N epoch, validation runs at the start of that epoch (after the first batch),
+  and the timer resets.
+- If the interval elapses **during** a multiple-N epoch, validation runs after the current batch.
+- For cases where ``check_val_every_n_epoch=None`` or ``1``, the time-based behavior of ``val_check_interval`` applies without additional alignment.
 
 .. testcode::
 
@@ -1011,10 +1023,24 @@ Can specify as float or int.
     # (ie: production cases with streaming data)
     trainer = Trainer(val_check_interval=1000, check_val_every_n_epoch=None)
 
+    # check validation every 15 minutes of wall-clock time using a string-based approach
+    trainer = Trainer(val_check_interval="00:00:15:00")
+
+    # check validation every 15 minutes of wall-clock time using a dictionary-based approach
+    trainer = Trainer(val_check_interval={"minutes": 15})
+
+    # check validation every 1 hour of wall-clock time using a dictionary-based approach
+    trainer = Trainer(val_check_interval={"hours": 1})
+
+    # check validation every 1 hour of wall-clock time using a datetime.timedelta object
+    trainer = Trainer(val_check_interval=timedelta(hours=1))
+
+
 
 .. code-block:: python
 
     # Here is the computation to estimate the total number of batches seen within an epoch.
+    # This logic applies when `val_check_interval` is specified as an integer or a float.
 
     # Find the total number of train batches
     total_train_batches = total_train_samples // (train_batch_size * world_size)
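
The trigger rule in the new bullet (validation fires once the elapsed wall-clock time since the last validation exceeds the interval, after which the timer resets) can be sketched as follows; this is an illustration of the documented behavior, not Lightning's internal implementation:

import time
from datetime import timedelta

interval = timedelta(minutes=15)
last_validation = time.monotonic()


def should_validate() -> bool:
    # Called after each training batch: validate once the interval has elapsed
    # since the last validation, then reset the timer.
    global last_validation
    if time.monotonic() - last_validation >= interval.total_seconds():
        last_validation = time.monotonic()
        return True
    return False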

src/lightning/pytorch/trainer/trainer.py

Lines changed: 2 additions & 2 deletions
@@ -216,8 +216,8 @@ def __init__(
             ``check_val_every_n_epoch`` > 1, validation is aligned to epoch multiples: if the interval elapses
             before the next multiple-N epoch, validation runs at the start of that epoch (after the first batch)
             and the timer resets; if it elapses during a multiple-N epoch, validation runs after the current batch.
-            For ``None`` or ``1``, the time-based behavior of ``val_check_interval`` applies without additional
-            alignment.
+            For ``None`` or ``1`` cases, the time-based behavior of ``val_check_interval`` applies without
+            additional alignment.
             Default: ``1``.
 
         num_sanity_val_steps: Sanity check runs n validation batches before starting the training routine.
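
A short usage sketch of the interaction this docstring describes, assuming the public ``lightning.pytorch`` API documented in this commit: with ``check_val_every_n_epoch=2`` and a 30-minute interval, validation only runs on even-numbered epochs, so an interval that elapses during epoch 3 is served at the start of epoch 4.

from datetime import timedelta

from lightning.pytorch import Trainer

# Time-based validation aligned to every 2nd epoch: if 30 minutes elapse mid-epoch-3,
# validation waits for the first batch of epoch 4, then the timer resets.
trainer = Trainer(
    val_check_interval=timedelta(minutes=30),
    check_val_every_n_epoch=2,
)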
