Description & Motivation
Currently, the GradientAccumulationScheduler only supports scheduling on epoch intervals. However, during pretraining tasks, the model might only run for a single epoch. Therefore, it would be beneficial to be able to schedule gradient accumulation by the number of optimizer steps taken, i.e. trainer.global_step.
Proposal:
- add an `interval` parameter to `GradientAccumulationScheduler`, which can be `"epoch"` or `"step"`, defaulting to `"epoch"` for backwards compatibility
- add a condition to the current `on_train_epoch_start` to only trigger if `interval == "epoch"`
- add an `on_train_batch_start`/`on_after_optimizer_step` hook, triggering if `interval == "step"`
However, given the current warning that scheduling is incompatible with DeepSpeed, I am not sure whether step-based scheduling would be supported by all (or only some) strategies.
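A minimal sketch of what the step-based scheduling could look like, written as a standalone callback rather than a change to the built-in GradientAccumulationScheduler. It assumes Lightning 2.x (the import path may be `pytorch_lightning` on older versions); treating the scheduling keys as global steps is the behaviour proposed here, not an existing API, and the class name is a placeholder:

```python
from lightning.pytorch.callbacks import Callback


class StepwiseGradientAccumulationScheduler(Callback):
    def __init__(self, scheduling: dict):
        # e.g. {0: 4, 10_000: 8}: accumulate 4 batches until step 10k, then 8.
        self.scheduling = dict(sorted(scheduling.items()))

    def on_train_batch_start(self, trainer, pl_module, batch, batch_idx):
        # Apply the factor of the largest scheduled step reached so far.
        factor = trainer.accumulate_grad_batches
        for step, value in self.scheduling.items():
            if trainer.global_step >= step:
                factor = value
        # The built-in epoch-based scheduler also works by mutating this
        # Trainer attribute, so the same mechanism is reused here.
        trainer.accumulate_grad_batches = factor
```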
Pitch
I want to be able to schedule gradient accumulation by trainer.global_step instead of trainer.current_epoch.
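For context, using the step-based sketch from the proposal above (the callback name is the placeholder from that sketch, not part of the current Lightning API) could look something like:

```python
from lightning.pytorch import Trainer

# Switch from 4 to 8 accumulated batches at global step 10k, then to 16 at 50k,
# within a single-epoch pretraining run.
trainer = Trainer(
    max_epochs=1,
    callbacks=[StepwiseGradientAccumulationScheduler({0: 4, 10_000: 8, 50_000: 16})],
)
```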
Alternatives
Additional context
Could depend on having an on_optimizer_step hook for callbacks. See #11688 (comment)
cc @lantiga