
Support WSD learning rate schedule #82

Open
sfc-gh-lmerrick wants to merge 3 commits into main from lmerrick-support-wsd-lr-schedule



@sfc-gh-lmerrick (Contributor)

Warmup-stable-decay (WSD) is a common learning rate schedule that is parameterized slightly differently from the default linear decay schedule: canonically by step counts rather than by a fraction of the training steps (at least in transformers). This PR adds WSD support to the scheduler factory and moves the warmup ratio out of the base scheduler config and into the HFSchedulerConfig class.
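As a rough illustration of the parameterization difference, here is a minimal pure-Python sketch of the two learning-rate multipliers. The function names and arguments are hypothetical and are not taken from this PR or from transformers. The linear schedule derives its warmup length from a ratio (rounded up with `math.ceil`, which is our understanding of how transformers converts `warmup_ratio`), while the WSD schedule takes an absolute step count for each phase. The decay shape for WSD is an assumption: it defaults to linear here, with cosine as an option, and real implementations differ on this.

```python
import math


def linear_lr_multiplier(step: int, num_training_steps: int, warmup_ratio: float) -> float:
    """Linear warmup, then linear decay to 0; warmup length is a fraction of training."""
    # Derive the warmup step count from the ratio, rounding up.
    num_warmup_steps = math.ceil(warmup_ratio * num_training_steps)
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    # Decay linearly from 1 at the end of warmup to 0 at num_training_steps.
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))


def wsd_lr_multiplier(
    step: int,
    num_warmup_steps: int,
    num_stable_steps: int,
    num_decay_steps: int,
    min_lr_ratio: float = 0.0,
    decay_type: str = "linear",
) -> float:
    """Warmup-stable-decay; each phase is given as an absolute step count (hypothetical API)."""
    if step < num_warmup_steps:
        # Warmup: ramp linearly from 0 toward the peak learning rate.
        return step / max(1, num_warmup_steps)
    if step < num_warmup_steps + num_stable_steps:
        # Stable: hold the peak learning rate.
        return 1.0
    # Decay: fraction of the decay phase completed, clamped to [0, 1].
    decay_step = step - num_warmup_steps - num_stable_steps
    progress = min(1.0, decay_step / max(1, num_decay_steps))
    if decay_type == "linear":
        remaining = 1.0 - progress
    elif decay_type == "cosine":
        remaining = 0.5 * (1.0 + math.cos(math.pi * progress))
    else:
        raise ValueError(f"unknown decay_type: {decay_type!r}")
    # Interpolate between the peak (1.0) and the floor (min_lr_ratio).
    return min_lr_ratio + (1.0 - min_lr_ratio) * remaining
```

To drive an optimizer, either function could be bound with `functools.partial` and passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, which scales each parameter group's initial learning rate by the returned factor.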

It may also make sense to rename the HFScheduler* configs and classes: in truth, the config and factory are specific to linear schedules and lack the parameters needed for other schedules such as WSD.
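To make the proposed split concrete, here is a hedged sketch of how the configs might look once `warmup_ratio` lives only on the linear config. `SchedulerConfig`, `WSDSchedulerConfig`, `resolve_phases`, and every field name other than `warmup_ratio` are hypothetical and not taken from the PR's diff; the inheritance from a shared base is also an assumption. The point is only that a linear config resolves its phases from a ratio, while a WSD config supplies absolute step counts and the stable phase is whatever remains.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SchedulerConfig:
    """Hypothetical base config: only fields shared by every schedule (no warmup_ratio)."""
    name: str


@dataclass
class HFSchedulerConfig(SchedulerConfig):
    """Linear-decay config: warmup is a fraction of the total training steps."""
    warmup_ratio: float = 0.0


@dataclass
class WSDSchedulerConfig(SchedulerConfig):
    """Hypothetical WSD config: warmup and decay are absolute step counts."""
    num_warmup_steps: int = 0
    num_decay_steps: int = 0
    min_lr_ratio: float = 0.0


def resolve_phases(config: SchedulerConfig, num_training_steps: int) -> Tuple[int, int, int]:
    """Return (warmup, stable, decay) step counts for a config (hypothetical helper)."""
    if isinstance(config, HFSchedulerConfig):
        # Ratio-based: warmup is derived from total steps; no stable phase.
        warmup = math.ceil(config.warmup_ratio * num_training_steps)
        return warmup, 0, num_training_steps - warmup
    if isinstance(config, WSDSchedulerConfig):
        # Step-based: the stable phase fills whatever warmup and decay leave over.
        stable = num_training_steps - config.num_warmup_steps - config.num_decay_steps
        if stable < 0:
            raise ValueError("num_warmup_steps + num_decay_steps exceed num_training_steps")
        return config.num_warmup_steps, stable, config.num_decay_steps
    raise TypeError(f"unsupported scheduler config: {type(config).__name__}")
```

Dispatching on the config type keeps each schedule's parameters on its own class, which is roughly what renaming away from the HF* prefix would make explicit.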
