[feature request] Metric Logging frequency parameter

TLDR: The first priority is to measure the perf hit caused by metric logging, e.g. by running with DISABLE_FORGE_METRICS=true and comparing step times, to confirm whether there is a slowdown significant enough to motivate a lower logging frequency. If there isn't, the added complexity is probably not worth it.
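
A rough way to get that number is to time the same loop with and without the logging call. This is only a sketch: `step_fn` and `flush_fn` are hypothetical stand-ins for the training step and the flush call, not the project's actual API, and a real measurement would compare full runs with and without DISABLE_FORGE_METRICS=true.

```python
import time
from typing import Callable, Optional


def mean_step_seconds(
    step_fn: Callable[[int], None],
    flush_fn: Optional[Callable[[int], None]] = None,
    n_steps: int = 100,
) -> float:
    """Average wall-clock seconds per step, optionally flushing after each step."""
    start = time.perf_counter()
    for step in range(n_steps):
        step_fn(step)  # hypothetical stand-in for one training step
        if flush_fn is not None:
            flush_fn(step)  # hypothetical stand-in for the per-step flush
    return (time.perf_counter() - start) / n_steps


def logging_overhead_seconds(
    step_fn: Callable[[int], None],
    flush_fn: Callable[[int], None],
    n_steps: int = 100,
) -> float:
    """Difference in mean step time between a run with flushing and one without."""
    baseline = mean_step_seconds(step_fn, None, n_steps)
    with_logging = mean_step_seconds(step_fn, flush_fn, n_steps)
    return with_logging - baseline
```

If the overhead is a small fraction of the baseline step time, that would support leaving the per-step flush as is.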

---

Currently we call mlogger.flush on every training step. Users may have reasons to flush every N steps instead.

a) It is unclear whether we should do this at all: is logging on every step measurably slow, or does it cost just a few ms?
b) If we add a logging frequency, how should it be configured?
- Per backend (e.g. console every 10 steps but wandb every step), or a single frequency shared by all backends?
c) "per_rank_no_reduce" (aka streaming) wouldn't work with this mode, since it logs as soon as record_metric is called.
d) If we call flush every 10 steps, the metrics are only reduced after 10 steps, so two runs with different logging frequencies won't be directly comparable.




[feature request] Metric Logging frequency parameter #395
