Skip to content

Conversation

calvinpelletier
Copy link
Contributor

@calvinpelletier calvinpelletier commented Aug 19, 2025

Build metric logger:

metric_logger = get_metric_logger(**config.metrics)

Log a metric:

metric_logger.log("name", value, step)

If you want to check if this is a log step before calculating the metric:

if metric_logger.is_log_step(name, step):
    value = calculate_metric()
    metric_logger.log(name, value, step)

Open questions:

  • multiple loggers? e.g. both wandb and stdout
  • non-constant log frequencies? (e.g. for metrics that take a long time to calculate, I like to use a logarithmic frequency so that i get quick data points early in training and fewer points as training becomes more stable)

Follow up PRs:

  • put metric logger inside an actor

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Aug 19, 2025
@calvinpelletier calvinpelletier changed the title Metric Logging [WIP] Metric Logging Aug 19, 2025
@calvinpelletier calvinpelletier changed the title [WIP] Metric Logging Metric Logging Aug 19, 2025
Copy link
Contributor

@ebsmothers ebsmothers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more small comments, after that I think this is good!

Copy link
Contributor

@ebsmothers ebsmothers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@calvinpelletier bumping this PR. Looks like there are some merge conflicts. Let's resolve those and address the remaining comments, then we should be good to land

@calvinpelletier
Copy link
Contributor Author

@ebsmothers addressed the comments. I'll leave the config class for the next PR after adding metric logging for sft_v2 and toy_rl

Copy link
Contributor

@ebsmothers ebsmothers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two more nits, otherwise looks good!

Example:
>>> logger = get_logger("INFO")
>>> logger.info("Hello world!")
INFO:torchtune.utils._logging:Hello world!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
INFO:torchtune.utils._logging:Hello world!
INFO:forge.util.logging:Hello world!

freq (Mapping[str, int]):
calls to `log` and `log_dict` will be ignored if `step % freq[metric_name] != 0`
log_dir (Optional[str]): WandB log directory.
project (str): WandB project name. Default is `torchtune`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
project (str): WandB project name. Default is `torchtune`.
project (str): WandB project name.

@ebsmothers ebsmothers mentioned this pull request Aug 26, 2025
"""
rank = dist.get_rank() if dist.is_available() and dist.is_initialized() else 0
if rank != 0:
return
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't log rank be configurable like how titan did? But I guess we can always improve later.

@pbontrager pbontrager merged commit f8d0446 into main Aug 26, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CLA Signed This label is managed by the Meta Open Source bot.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants