
Misleading docs or impl for log_metrics #20677

@clumsy

Description

Bug description

The docs currently say:

            step: Step for which metrics should be logged. Default value is `self.global_step` during training or
                the total validation / test log step count during validation and testing.

But during training, if someone calls `LightningModule.log("step", 42.0)`, that value is placed into the metrics and used before we fall back to `trainer.global_step` a few lines below.

E.g. NeMo logs this metric unconditionally: https://github.com/NVIDIA/NeMo/blob/fae8897686d7460381b016de005b44ff27bd1092/nemo/lightning/pytorch/strategies/megatron_strategy.py#L675

Either the docs need to be fixed, or we need to fall back to the metrics for `step` only when not training. If we do fall back, we also need to ensure `step` is an `int` (convertible from `float` without loss).
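The proposed fallback order could look like the sketch below. All names here are illustrative, not Lightning's actual internals: during training the trainer's global step always wins, and a user-logged `"step"` is only honored outside training, after a lossless float-to-int check.

```python
def resolve_step(metrics: dict, trainer_global_step: int, training: bool) -> int:
    """Hypothetical fix sketch: resolve the step for log_metrics.

    Illustrative only; not Lightning's real logger-connector code.
    """
    if training:
        # During training, ignore any user-logged "step" metric and
        # use the trainer's global step, matching the current docs.
        return trainer_global_step
    step = metrics.pop("step", None)
    if step is None:
        return trainer_global_step
    # Only accept floats that convert to int without loss.
    if isinstance(step, float):
        if not step.is_integer():
            raise ValueError(f"'step' must be an integral value, got {step}")
        step = int(step)
    return int(step)
```

With this shape, `resolve_step({"step": 1.0}, 7, training=True)` returns `7`, while the same call with `training=False` returns the integer `1`.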

Happy to contribute once we figure out which route to take.

What version are you seeing the problem on?

master

How to reproduce the bug

Call `LightningModule.log("step", 1.0)`, then `LightningModule.log("metric", 42.0)`, and observe `<Metric: key='metric', step=1.0, timestamp=1743030659681, value=42.0>`.

Note that this makes the step a `float` instead of an `int`.
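The behavior being reported can be mimicked with a plain dict standing in for the logged metrics (a sketch with illustrative names, not Lightning's actual code): a user-logged `"step"` shadows `trainer.global_step` and keeps its `float` type.

```python
def current_step_lookup(metrics: dict, trainer_global_step: int):
    """Sketch of the current fallback order described in this issue.

    A user-logged "step" metric wins over trainer.global_step even
    during training, with no int conversion. Illustrative only.
    """
    step = metrics.pop("step", None)
    if step is None:
        step = trainer_global_step
    return step

metrics = {"step": 1.0, "metric": 42.0}
step = current_step_lookup(metrics, trainer_global_step=100)
print(step)  # 1.0 -- a float, shadowing the real global step
```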

Error messages and logs

No response

Environment

Current environment
#- PyTorch Lightning Version (e.g., 2.5.0):
#- PyTorch Version (e.g., 2.5):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning (`conda`, `pip`, source):

More info

No response

Metadata

Assignees

No one assigned

    Labels

    bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.5.x
