[feature request] Metric Logging frequency parameter #395

@felipemello1

TLDR: The priority should be to gather data on the performance hit caused by metric logging, for example by setting DISABLE_FORGE_METRICS=true and comparing step times, to confirm whether there is any significant slowdown that would motivate a lower logging frequency. Otherwise, the added complexity is probably not worth it.
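A rough way to collect that data is to time the same loop with and without the flush call. The sketch below is only illustrative: `mean_step_seconds`, `fake_train_step`, and `fake_flush` are hypothetical stand-ins, not forge APIs. In a real run the comparison would be between a job with DISABLE_FORGE_METRICS=true and one without it.

```python
import time
from typing import Callable, Optional


def mean_step_seconds(
    train_step: Callable[[], None],
    flush: Optional[Callable[[int], None]] = None,
    num_steps: int = 50,
) -> float:
    """Return the mean wall-clock time per step, optionally flushing each step."""
    start = time.perf_counter()
    for step in range(num_steps):
        train_step()
        if flush is not None:
            flush(step)
    return (time.perf_counter() - start) / num_steps


def fake_train_step() -> None:
    # Hypothetical stand-in for real training work.
    sum(i * i for i in range(10_000))


def fake_flush(step: int) -> None:
    # Hypothetical stand-in for a metric flush; the 1 ms sleep is an assumed cost.
    time.sleep(0.001)


baseline = mean_step_seconds(fake_train_step)
with_flush = mean_step_seconds(fake_train_step, fake_flush)
overhead_ms = (with_flush - baseline) * 1e3
print(f"baseline: {baseline * 1e3:.3f} ms/step")
print(f"with flush: {with_flush * 1e3:.3f} ms/step")
print(f"estimated flush overhead: {overhead_ms:.3f} ms/step")
```

If the measured overhead is small relative to the step time, that would support leaving the per-step flush as is.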


Currently we call mlogger.flush on every training step. Users may have reasons to flush every N steps instead.

a) It is unclear whether we should do this at all: is logging on every step measurably slow, or does it cost just a few ms?
b) If we enable a logging frequency, how should it be configured?

  • Per backend (e.g. console every 10 steps, but wandb every step), or one frequency shared by all backends?

c) "per_rank_no_reduce" (aka streaming) wouldn't work with this mode, since it logs as soon as record_metric is called.
d) If we call flush every 10 steps, the metrics are reduced only after 10 steps, so each logged value covers a 10-step window. Two runs with different logging frequencies won't be directly comparable.
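To make the comparability concern concrete, here is a minimal sketch of a single shared flush frequency. `WindowedLogger`, `maybe_flush`, and `flush_every` are hypothetical names for illustration only, and mean reduction is an assumption; this is not how forge's metric logger is implemented.

```python
from collections import defaultdict
from statistics import mean


class WindowedLogger:
    """Hypothetical stand-in: buffers metrics and reduces them every N steps."""

    def __init__(self, flush_every: int) -> None:
        if flush_every < 1:
            raise ValueError("flush_every must be >= 1")
        self.flush_every = flush_every
        self._buffer = defaultdict(list)
        self.logged = []  # list of (step, {metric_name: reduced_value})

    def record_metric(self, name: str, value: float) -> None:
        self._buffer[name].append(value)

    def maybe_flush(self, step: int) -> bool:
        """Reduce and log buffered metrics if `step` (1-indexed) ends a window."""
        if step % self.flush_every != 0:
            return False
        # Assumed reduction: mean over everything recorded since the last flush.
        reduced = {name: mean(values) for name, values in self._buffer.items()}
        self.logged.append((step, reduced))
        self._buffer.clear()
        return True


def run(flush_every: int, losses: list) -> list:
    logger = WindowedLogger(flush_every)
    for step, loss in enumerate(losses, start=1):
        logger.record_metric("loss", loss)
        logger.maybe_flush(step)
    return logger.logged


losses = [4.0, 3.0, 2.0, 1.0]
every_step = run(1, losses)
every_two = run(2, losses)
print(every_step)  # one entry per step
print(every_two)   # one entry per 2-step window
```

With identical per-step losses, the run that flushes every step reports 3.0 at step 2, while the run that flushes every 2 steps reports 3.5 there, because its value is the mean of steps 1 and 2. Note also that in this sketch any metrics recorded after the last full window are never flushed, which a real implementation would need to handle at the end of training.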
