Commit ae965d8

rename warmup_style to lr_scheduler_type
1 parent b317708

13 files changed: +18 -15 lines
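For configs written against the old schema, the change is a one-key rename under the optimizer settings; a minimal before/after sketch (keys and values taken from the templates below):

  # before
  optimizer:
    lr: 1e-6
    lr_warmup_steps_ratio: 0.0
    warmup_style: constant

  # after
  optimizer:
    lr: 1e-6
    lr_warmup_steps_ratio: 0.0
    lr_scheduler_type: constant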

benchmark/config/countdown-template.yaml (2 additions, 2 deletions)

@@ -9,7 +9,7 @@ algorithm:
   optimizer:
     lr: 1e-06
     lr_warmup_steps_ratio: 0.0
-    warmup_style: constant
+    lr_scheduler_type: constant
   advantage_fn: ppo
 data_processor: {}
 model:

@@ -78,7 +78,7 @@ trainer:
   optim:
     lr: 1e-5
     lr_warmup_steps_ratio: 0.0
-    warmup_style: constant
+    lr_scheduler_type: constant
   ppo_max_token_len_per_gpu: 12800
   forward_max_token_len_per_gpu: 12800
   cliprange_value: 0.5

benchmark/config/gsm8k-template.yaml (1 addition, 1 deletion)

@@ -9,7 +9,7 @@ algorithm:
   optimizer:
     lr: 1e-5
     lr_warmup_steps_ratio: 0.0
-    warmup_style: constant
+    lr_scheduler_type: constant
   sample_strategy: default
   policy_loss_fn: ppo
   advantage_fn: grpo

benchmark/config/guru_math-template.yaml (1 addition, 1 deletion)

@@ -16,7 +16,7 @@ algorithm:
     lr: 1e-6
     weight_decay: 0.1
     lr_warmup_steps: 80
-    warmup_style: constant
+    lr_scheduler_type: constant
 cluster:
   node_num: 1
   gpu_per_node: 8

docs/sphinx_doc/source/tutorial/trinity_configs.md (3 additions, 2 deletions)

@@ -97,7 +97,7 @@ algorithm:
   repeat_times: 8
   optimizer:
     lr: 1e-6
-    warmup_style: "warmup"
+    lr_scheduler_type: "constant"
   # The following parameters are optional
   # If not specified, they will automatically be set based on the `algorithm_type`
   sample_strategy: "default"

@@ -111,7 +111,8 @@ algorithm:
 - `repeat_times`: Number of times each task is repeated. Default is `1`. In `dpo`, this is automatically set to `2`. Some algorithms such as GRPO and OPMD require `repeat_times` > 1.
 - `optimizer`: Optimizer configuration for the actor.
   - `lr`: Learning rate for the actor.
-  - `warmup_style`: Warmup style for the actor's learning rate.
+  - `warmup_style`: Deprecated, use `lr_scheduler_type` instead.
+  - `lr_scheduler_type`: Learning rate scheduler type for the actor model. Default is `constant`. Supported types: `constant`, `cosine`.
 - `sample_strategy`: The sampling strategy used for loading experiences from the experience buffer. Supported types: `default`, `staleness_control`, `mix`.
 - `advantage_fn`: The advantage function used for computing advantages.
 - `kl_penalty_fn`: The KL penalty function used for computing the KL penalty applied to the reward.
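Following the documented fields above, a minimal `algorithm.optimizer` block written against the new key might look like the sketch below (assembled only from values shown in this hunk, not a complete config):

  algorithm:
    repeat_times: 8
    optimizer:
      lr: 1e-6
      lr_scheduler_type: constant
    sample_strategy: "default"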

docs/sphinx_doc/source_zh/tutorial/trinity_configs.md (3 additions, 2 deletions)

@@ -97,7 +97,7 @@ algorithm:
   repeat_times: 8
   optimizer:
     lr: 1e-6
-    warmup_style: constant
+    lr_scheduler_type: constant
   # 以下参数为可选
   # 若未指定,将根据 `algorithm_type` 自动设置
   sample_strategy: "default"

@@ -111,7 +111,8 @@ algorithm:
 - `repeat_times`: 每个任务重复的次数。默认为 `1`。在 `dpo` 中自动设为 `2`。某些算法如 GRPO 和 OPMD 要求 `repeat_times` > 1。
 - `optimizer`: Actor 优化器的参数。
   - `lr`: 优化器的学习率。
-  - `warmup_style`: 学习率的预热策略。
+  - `warmup_style`: 已弃用,请改用 `lr_scheduler_type`。
+  - `lr_scheduler_type`: Actor 模型的学习率调度器类型。默认值为 `constant`。支持类型:`constant`、`cosine`。
 - `sample_strategy`: 从 experience buffer 加载 experience 时使用的采样策略。支持类型:`default`、`staleness_control`、`mix`。
 - `advantage_fn`: 用于计算优势值的函数。
 - `kl_penalty_fn`: 用于在奖励中计算 KL 惩罚的函数。
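When a decaying schedule is wanted instead of the default, the DPO examples further down pair `lr_scheduler_type: cosine` with the warmup and minimum-LR knobs; a minimal sketch of that combination, with values taken from those examples:

  optimizer:
    lr: 5e-7
    lr_warmup_steps_ratio: 0.03  # total steps are injected at runtime
    min_lr_ratio: 0.1            # only meaningful with the cosine schedule
    lr_scheduler_type: cosine    # select from constant/cosine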

examples/dpo_human_in_the_loop/dpo.yaml (1 addition, 1 deletion)

@@ -42,7 +42,7 @@ algorithm:
     lr: 5e-7
     lr_warmup_steps_ratio: 0.03 # the total steps will be injected during runtime
     min_lr_ratio: 0.1 # only useful for warmup with cosine
-    warmup_style: cosine # select from constant/cosine
+    lr_scheduler_type: cosine # select from constant/cosine
     betas: [0.9, 0.95]
   kl_loss_fn: k1
   kl_loss_fn_args:

examples/dpo_humanlike/dpo.yaml (1 addition, 1 deletion)

@@ -7,7 +7,7 @@ algorithm:
     lr: 5e-7
     lr_warmup_steps_ratio: 0.03 # the total steps will be injected during runtime
     min_lr_ratio: 0.1 # only useful for warmup with cosine
-    warmup_style: cosine # select from constant/cosine
+    lr_scheduler_type: cosine # select from constant/cosine
     betas: [0.9, 0.95]
   kl_loss_fn: k1
   kl_loss_fn_args:

examples/grpo_alfworld/alfworld.yaml (1 addition, 1 deletion)

@@ -65,7 +65,7 @@ trainer:
 # optimizer:
 #   lr: 5e-6
 #   lr_warmup_steps_ratio: 0.0
-#   warmup_style: constant
+#   lr_scheduler_type: constant
 # buffer:
 #   total_epochs: 1
 #   train_batch_size: 32

examples/learn_to_ask/train.yaml (1 addition, 1 deletion)

@@ -13,7 +13,7 @@ algorithm:
   optimizer:
     lr: 5.0e-07
     lr_warmup_steps_ratio: 0.0
-    warmup_style: constant
+    lr_scheduler_type: constant
 data_processor: {}
 model:
   model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen2.5-7B-Instruct}

examples/tinker/tinker.yaml (1 addition, 1 deletion)

@@ -11,7 +11,7 @@ algorithm:
   optimizer:
     lr: 1.0e-05
     lr_warmup_steps_ratio: 0.0
-    warmup_style: constant
+    lr_scheduler_type: constant
 data_processor: {}
 model:
   model_path: Qwen/Qwen3-4B-Instruct-2507
