
Commit 4e4b794

support warmup_stable_decay (#1312)
1 parent 4a96f35 commit 4e4b794

File tree

3 files changed: +4 −0 lines


docs/source/LLM/命令行参数.md

Lines changed: 1 addition & 0 deletions

@@ -87,6 +87,7 @@

- `--predict_with_generate`: Whether to use generation for evaluation, default `False`. If set to False, evaluate using `loss`; if set to True, evaluate using metrics such as `ROUGE-L`. Generative evaluation takes a long time, choose carefully.
- `--lr_scheduler_type`: Default is `'cosine'`; options include 'linear', 'cosine', 'constant', etc.
- `--warmup_ratio`: Proportion of total training steps used for warmup, default `0.05`.
- `--warmup_steps`: Number of warmup steps, default `0`. If `warmup_steps > 0` is set, it overrides `warmup_ratio`. (added)
- `--eval_steps`: Evaluate every this many training steps, default `50`.
- `--save_steps`: Save a checkpoint every this many training steps, default `None`, i.e. set to `eval_steps`.
- `--save_only_model`: Whether to save only model parameters, without the intermediate states needed for checkpoint resuming, default `None`: if `sft_type` is 'lora' and deepspeed is not used (`deepspeed` is `None`), set to False, otherwise set to True (e.g. full-parameter fine-tuning or deepspeed is used).

docs/source_en/LLM/Command-line-parameters.md

Lines changed: 1 addition & 0 deletions

@@ -88,6 +88,7 @@

- `--predict_with_generate`: Whether to use generation for evaluation, default is `False`. If set to False, evaluate using `loss`; if set to True, evaluate using metrics such as `ROUGE-L`. Generative evaluation takes a long time, choose carefully.
- `--lr_scheduler_type`: Default is `'cosine'`; options include 'linear', 'cosine', 'constant', etc.
- `--warmup_ratio`: Proportion of total training steps used for warmup, default is `0.05`.
- `--warmup_steps`: Number of warmup steps, default is `0`. If `warmup_steps > 0` is set, it overrides `warmup_ratio`. (added)
- `--eval_steps`: Evaluate every this many training steps, default is `50`.
- `--save_steps`: Save a checkpoint every this many training steps, default is `None`, i.e. set to `eval_steps`.
- `--save_only_model`: Whether to save only model parameters, without the intermediate states needed for checkpoint resuming, default is `None`: if `sft_type` is 'lora' and deepspeed is not used (`deepspeed` is `None`), set to False, otherwise set to True (e.g. using full-parameter fine-tuning or deepspeed).
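The precedence between `--warmup_steps` and `--warmup_ratio` described above can be sketched as a small helper. This is a hypothetical illustration (`resolve_warmup_steps` is not a function in the repo); it mirrors the documented rule that a positive `warmup_steps` overrides `warmup_ratio`, and assumes the ratio-based count is rounded up.

```python
import math

def resolve_warmup_steps(total_steps: int,
                         warmup_ratio: float = 0.05,
                         warmup_steps: int = 0) -> int:
    """Hypothetical helper: a positive warmup_steps wins over warmup_ratio."""
    if warmup_steps > 0:
        # Explicit step count overrides any effect of warmup_ratio.
        return warmup_steps
    # Otherwise derive the warmup length from the ratio (rounded up).
    return math.ceil(total_steps * warmup_ratio)
```

For example, with 1000 total steps the default ratio of 0.05 yields 50 warmup steps, while passing `warmup_steps=100` ignores the ratio entirely.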

swift/llm/utils/argument.py

Lines changed: 2 additions & 0 deletions

@@ -581,6 +581,7 @@ class SftArguments(ArgumentsBase):
     lr_scheduler_type: str = 'cosine'
     lr_scheduler_kwargs: Optional[str] = None  # json
     warmup_ratio: float = 0.05
+    warmup_steps: int = 0  # Overrides any effect of `warmup_ratio` if warmup_steps > 0

     eval_steps: int = 50
     save_steps: Optional[int] = None

@@ -984,6 +985,7 @@ def _init_training_args(self) -> None:
     lr_scheduler_type=self.lr_scheduler_type,
     lr_scheduler_kwargs=self.lr_scheduler_kwargs,
     warmup_ratio=self.warmup_ratio,
+    warmup_steps=self.warmup_steps,
     logging_steps=self.logging_steps,
     save_strategy=self.save_strategy,
     save_steps=self.save_steps,
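The commit title refers to the warmup–stable–decay (WSD) schedule this parameter plumbing enables. As a rough sketch only (the exact curve used by the underlying scheduler may differ; `wsd_lambda` is a name invented here), a WSD learning-rate multiplier warms up linearly, holds a constant plateau, then decays linearly to zero:

```python
def wsd_lambda(step: int, warmup: int, stable: int, decay: int) -> float:
    """Sketch of a warmup-stable-decay LR multiplier (assumed linear ramps)."""
    if step < warmup:
        # Linear warmup from 0 toward the peak rate.
        return step / max(1, warmup)
    if step < warmup + stable:
        # Stable plateau at the peak rate.
        return 1.0
    # Linear decay from the peak rate down to zero.
    remaining = warmup + stable + decay - step
    return max(0.0, remaining / max(1, decay))
```

A multiplier of this shape could be plugged into `torch.optim.lr_scheduler.LambdaLR`, which scales the optimizer's base learning rate by the returned factor at each step.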
