accumulate_grad_batches argument changes trainer.global_step #9179
Replies: 4 comments 3 replies
- Maybe I've run into the same problem; would you mind sharing your callbacks?
- Not sure if this is related, but I'm running some experiments where 10x gradient accumulation leads to a 10x slowdown (as shown in the TensorBoard logs), which could be due to a change in the way steps are counted rather than an actual slowdown.
- I've run into the same issue. Does anyone have a nice way of handling this?
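One possible workaround (a sketch, not taken from the thread) is to have the callback count batches itself instead of relying on `trainer.global_step`; note that the exact hook signature differs between Lightning versions, and some releases also pass a `dataloader_idx` argument:

```python
import pytorch_lightning as pl


class BatchCounterCallback(pl.Callback):
    """Hypothetical callback that tracks training progress by counting
    batches itself, so the count is unaffected by accumulate_grad_batches."""

    def __init__(self):
        self.batches_seen = 0

    def on_train_batch_end(self, trainer, pl_module, outputs, batch, batch_idx):
        # Runs once per batch, unlike trainer.global_step, which only
        # advances once per optimizer step.
        self.batches_seen += 1
```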
- @duskvirkus I think this is expected behavior, in the sense that `global_step` means how many optimizer steps have happened so far.
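Put differently (a minimal sketch of that relationship, assuming a single optimizer and a batch count that divides evenly into accumulation windows):

```python
# With accumulate_grad_batches = k, the optimizer steps once every k batches,
# so trainer.global_step advances k times more slowly than the batch count.
accumulate_grad_batches = 4    # hypothetical setting
batches_processed = 400        # hypothetical number of training batches seen

expected_global_step = batches_processed // accumulate_grad_batches
print(expected_global_step)  # 100 optimizer steps, i.e. what global_step reports
```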
- Not sure if this is a bug, but I came across the side effect that using `accumulate_grad_batches` changes the way `trainer.global_step` is counted. I noticed it while trying to debug a custom callback that uses `trainer.global_step` to keep track of training progress; if there's a "better" way to go about this, let me know.
Not that hard to fix the problem now that I know what's happening, but I figured I'd bring it up in case I'm going about things wrong or it's a bug.
Toy example: run the same training once without `accumulate_grad_batches` and once with it, and compare the resulting `trainer.global_step` (a sketch is below).
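A sketch of what such a toy example might look like (not the original post's code; the `TinyModel` and the expected step counts are illustrative assumptions, and exact behavior can vary between Lightning versions):

```python
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, TensorDataset
import pytorch_lightning as pl


class TinyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(4, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return F.mse_loss(self.layer(x), y)

    def configure_optimizers(self):
        return torch.optim.SGD(self.parameters(), lr=0.01)


def run(accumulate_grad_batches):
    # 64 samples with batch_size=8 -> 8 batches per epoch.
    dataset = TensorDataset(torch.randn(64, 4), torch.randn(64, 1))
    loader = DataLoader(dataset, batch_size=8)
    trainer = pl.Trainer(max_epochs=1, accumulate_grad_batches=accumulate_grad_batches)
    trainer.fit(TinyModel(), loader)
    return trainer.global_step


# Since global_step counts optimizer steps, one would expect roughly:
print("without accumulation:", run(1))  # 8 batches -> 8 optimizer steps
print("with accumulate_grad_batches=4:", run(4))  # 8 batches -> 2 optimizer steps
```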