Closed
Labels: bug (Something isn't working), needs triage (Waiting to be triaged by maintainers), ver: 2.5.x
Description
Bug description
The docs currently say:

> step: Step for which metrics should be logged. Default value is `self.global_step` during training or the total validation / test log step count during validation and testing.
But during training, if someone calls `LightningModule.log("step", 42.0)`, that value is placed into the metrics dict and used as the step before we fall back to `trainer.global_step` a few lines below.
For example, NeMo logs this metric unconditionally: https://github.com/NVIDIA/NeMo/blob/fae8897686d7460381b016de005b44ff27bd1092/nemo/lightning/pytorch/strategies/megatron_strategy.py#L675
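For context, a simplified sketch of the precedence described above (an assumption about the shape of the logic, not the actual Lightning source):

```python
# Simplified sketch, assuming the fallback works roughly as described in
# this report; names and structure are illustrative, not Lightning's code.
def resolve_step(metrics: dict, global_step: int):
    step = metrics.pop("step", None)  # a user-logged "step" wins first...
    if step is None:
        step = global_step  # ...and only then do we fall back
    return step  # may be a float if the user logged one, e.g. 42.0


assert resolve_step({"step": 42.0, "metric": 1.0}, global_step=7) == 42.0
assert resolve_step({"metric": 1.0}, global_step=7) == 7
```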
Either the docs need to be fixed, or we need to fall back to the logged metrics for `step` only when not training. If we do fall back, we also need to ensure `step` is an `int` (or a `float` convertible to `int` without loss).
Happy to contribute once we figure out which route to take.
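If we go the fallback route, the lossless-conversion check could look something like this (a hypothetical helper, not existing Lightning API):

```python
# Hypothetical helper sketching the "int convertible from float without
# loss" check proposed above; the name coerce_step is an assumption.
def coerce_step(step):
    if isinstance(step, float):
        if not step.is_integer():
            raise ValueError(f"logged 'step' must be integral, got {step}")
        step = int(step)
    return step


assert coerce_step(42.0) == 42 and isinstance(coerce_step(42.0), int)
assert coerce_step(7) == 7  # ints pass through unchanged
```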
What version are you seeing the problem on?
master
How to reproduce the bug
Call `LightningModule.log("step", 1.0)`, then `LightningModule.log("metric", 42.0)`, and observe `<Metric: key='metric', step=1.0, timestamp=1743030659681, value=42.0>`.
Note that this also makes the step a `float` instead of an `int`.
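A minimal end-to-end sketch of the reproduction (the toy model, data, and `CSVLogger` are assumptions purely for a self-contained run):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
import lightning.pytorch as pl
from lightning.pytorch.loggers import CSVLogger


class ToyModule(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        self.log("step", 1.0)     # a metric literally named "step"...
        self.log("metric", 42.0)  # ...is then used as this metric's step (1.0, a float)
        return loss

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.1)


if __name__ == "__main__":
    data = TensorDataset(torch.randn(8, 4), torch.randn(8, 1))
    trainer = pl.Trainer(max_steps=2, logger=CSVLogger("logs"))
    trainer.fit(ToyModule(), DataLoader(data, batch_size=4))
```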
Error messages and logs
Environment
#- PyTorch Lightning Version (e.g., 2.5.0):
#- PyTorch Version (e.g., 2.5):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning (`conda`, `pip`, source):
More info
No response