
Add stepwise scheduling for GradientAccumulationScheduler #21534

@lgienapp


Description & Motivation

Currently, the GradientAccumulationScheduler only supports scheduling on epoch boundaries. However, during pretraining the model may run for only a single epoch, so it would be beneficial to be able to schedule gradient accumulation according to trainer.global_step instead.
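
For reference, the current epoch-based API looks roughly like this (assuming Lightning 2.x import paths; the epoch indices and accumulation factors are illustrative):

```python
from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import GradientAccumulationScheduler

# Keys are epoch indices: accumulate 4 batches from epoch 0, then 8 from epoch 10.
accumulator = GradientAccumulationScheduler(scheduling={0: 4, 10: 8})
trainer = Trainer(callbacks=[accumulator])
```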

Proposal:

  • add an interval parameter to GradientAccumulationScheduler, which can be "epoch" or "step", defaulting to "epoch" for backwards compatibility
  • add a condition to the current on_train_epoch_start hook so that it only triggers if interval == "epoch"
  • add an on_train_batch_start/on_after_optimizer_step hook that triggers if interval == "step"

However, given the current warning that accumulation scheduling is incompatible with DeepSpeed, I am not sure whether step-based scheduling would be unsupported by some or all strategies. A minimal sketch of the intended step-based behaviour follows below.
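
To make the intent concrete, here is a rough sketch of step-based scheduling as a standalone callback. The class name StepwiseGradientAccumulationScheduler, the scheduling dict keyed by global step, and the per-batch assignment of trainer.accumulate_grad_batches are my assumptions, mirroring what the existing callback does once per epoch in on_train_epoch_start; whether a mid-epoch change interacts cleanly with the fit loop's accumulation boundaries would still need to be verified.

```python
from lightning.pytorch import Trainer
from lightning.pytorch.callbacks import Callback


class StepwiseGradientAccumulationScheduler(Callback):
    """Hypothetical sketch: change accumulate_grad_batches at given global steps.

    ``scheduling`` maps a global step to the accumulation factor that should
    apply from that step onwards, e.g. ``{0: 1, 1000: 4}``.
    """

    def __init__(self, scheduling: dict[int, int]) -> None:
        self.scheduling = dict(sorted(scheduling.items()))

    def _factor_for_step(self, step: int) -> int:
        # Return the factor of the largest threshold not exceeding ``step``.
        factor = 1
        for start_step, value in self.scheduling.items():
            if step >= start_step:
                factor = value
        return factor

    def on_train_batch_start(self, trainer, pl_module, batch, batch_idx):
        # Mirrors what the built-in callback does per epoch, but keyed on
        # trainer.global_step instead of trainer.current_epoch.
        trainer.accumulate_grad_batches = self._factor_for_step(trainer.global_step)


# Usage (step thresholds are illustrative):
# trainer = Trainer(callbacks=[StepwiseGradientAccumulationScheduler({0: 1, 1000: 4})])
```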

Pitch

I want to be able to schedule gradient accumulation by trainer.global_step instead of trainer.current_epoch.

Alternatives

Additional context

This could depend on an on_optimizer_step hook being available for callbacks. See #11688 (comment)

cc @lantiga
