Status: Open
Labels: enhancement (New feature or request)
Description
TL;DR: The priority should be to collect data on the performance hit caused by metric logging, perhaps by comparing runs with and without DISABLE_FORGE_METRICS=true, to confirm whether there is a significant slowdown that would justify a lower logging frequency. Otherwise, the added complexity is probably not worth it.
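A minimal sketch of how that overhead could be quantified, assuming the training step and the flush can each be wrapped as a plain Python callable. `train_step`, `flush`, `mean_step_time` and `logging_overhead` are hypothetical names for illustration, not forge APIs. An alternative, closer to the suggestion above, is to time two otherwise identical runs, one of them launched with DISABLE_FORGE_METRICS=true.

```python
import time
from typing import Callable, Dict

StepFn = Callable[[int], None]  # hypothetical: takes the step index, returns nothing


def mean_step_time(step_fn: StepFn, n_steps: int, warmup: int = 5) -> float:
    """Mean wall-clock seconds per call of step_fn, after `warmup` untimed calls."""
    for step in range(warmup):
        step_fn(step)
    start = time.perf_counter()
    for step in range(warmup, warmup + n_steps):
        step_fn(step)
    return (time.perf_counter() - start) / n_steps


def logging_overhead(train_step: StepFn, flush: StepFn, n_steps: int = 100) -> Dict[str, float]:
    """Compare training steps alone against training steps followed by a flush on every step."""

    def step_with_flush(step: int) -> None:
        train_step(step)
        flush(step)

    baseline = mean_step_time(train_step, n_steps)
    with_flush = mean_step_time(step_with_flush, n_steps)
    return {
        "baseline_s": baseline,
        "with_flush_s": with_flush,
        "overhead_frac": (with_flush - baseline) / baseline,
    }


if __name__ == "__main__":
    # Hypothetical stand-ins: a ~5 ms "training step" and a ~1 ms "flush".
    def fake_train_step(step: int) -> None:
        time.sleep(0.005)

    def fake_flush(step: int) -> None:
        time.sleep(0.001)

    print(logging_overhead(fake_train_step, fake_flush, n_steps=50))
```

If the real step launches GPU work asynchronously, the wall-clock numbers are only meaningful when the step ends with a device synchronization (for example `torch.cuda.synchronize()` in PyTorch); otherwise the flush may appear to absorb time that actually belongs to the step.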
Currently we call mlogger.flush on every training step. Users may have reasons to flush every N steps instead.
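A minimal sketch of what "flush every N steps" could look like, assuming the training loop decides which backends to flush at each step. `FlushSchedule`, `default_every`, `per_backend` and `backends_to_flush` are hypothetical names, not part of the existing logger. A single global frequency is the special case where `per_backend` is empty; the sketch also forces a flush on the final step so trailing metrics are not dropped.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FlushSchedule:
    """Hypothetical flush intervals, in training steps; backends not listed use default_every."""

    default_every: int = 1
    per_backend: dict[str, int] = field(default_factory=dict)

    def interval(self, backend: str) -> int:
        return self.per_backend.get(backend, self.default_every)

    def backends_to_flush(self, step: int, backends: list[str], last_step: bool = False) -> list[str]:
        """Backends whose interval divides (step + 1); every backend flushes on the last step."""
        return [b for b in backends if last_step or (step + 1) % self.interval(b) == 0]


# Example: console every 10 steps, wandb every step, over a 25-step run.
backends = ["console", "wandb"]
schedule = FlushSchedule(per_backend={"console": 10, "wandb": 1})
n_steps = 25
flush_counts = {b: 0 for b in backends}
for step in range(n_steps):
    for b in schedule.backends_to_flush(step, backends, last_step=(step == n_steps - 1)):
        flush_counts[b] += 1  # a real logger would flush this backend's buffered metrics here

print(flush_counts)  # {'console': 3, 'wandb': 25}
```

Console flushes after steps 9, 19 and the forced final step 24. A backend that streams on every record_metric call would bypass this schedule entirely.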
a) It is unclear whether we should do this at all: is logging on every step actually slow, or does it cost only a few ms?
b) If we add a configurable logging frequency, how should it work?
- Per backend (e.g., console every 10 steps but wandb every step), or a single frequency for all backends?
c) "per_rank_no_reduce" (i.e., streaming) would not work with this mode, since it logs as soon as record_metric is called.
d) If we call flush every 10 steps, metrics are reduced over a 10-step window instead of per step, so two runs with different logging frequencies won't be directly comparable.
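A small illustration of point d), assuming the reduction applied at flush time is a mean over everything recorded since the previous flush (the actual reduction may differ per metric). `logged_values` is a hypothetical helper, and the loss values are made up purely to show the effect.

```python
from statistics import mean


def logged_values(per_step_loss: list[float], flush_every: int) -> list[float]:
    """Mean-reduce each window of `flush_every` steps, as a flush every N steps would.

    The last window may be shorter if the run length is not a multiple of flush_every.
    """
    return [
        mean(per_step_loss[i : i + flush_every])
        for i in range(0, len(per_step_loss), flush_every)
    ]


# Made-up per-step losses (dyadic values, so the arithmetic is exact).
loss = [1.0, 0.5, 0.75, 0.25, 0.5, 0.125, 0.375, 0.0]

print(logged_values(loss, 1))  # identical to loss
print(logged_values(loss, 4))  # [0.625, 0.25]
print(min(logged_values(loss, 1)), min(logged_values(loss, 4)))  # 0.0 0.25
```

The same underlying run reports a best loss of 0.0 when flushed every step but 0.25 when flushed every 4 steps, because windowed reduction smooths out per-step extremes. Curves from runs with different flush intervals therefore differ even when training is identical.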