Commit c9168b6

Rename warmup_style to lr_scheduler_type (#479)
1 parent 530c877 · commit c9168b6

18 files changed: +155 −43 lines
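
Only the in-tree configs are updated below; any out-of-tree configs need the same one-line rename. A minimal migration sketch (a hypothetical helper, not part of this commit; plain-text replacement is assumed so YAML comments and formatting survive):

```python
# Hypothetical migration helper: rename warmup_style -> lr_scheduler_type
# in place across a config tree. Not part of this commit.
from pathlib import Path

def migrate(root: str = ".") -> None:
    for path in Path(root).rglob("*.yaml"):
        text = path.read_text()
        if "warmup_style" in text:
            # plain-text replace keeps comments and layout intact
            path.write_text(text.replace("warmup_style", "lr_scheduler_type"))
            print(f"updated {path}")

if __name__ == "__main__":
    migrate()
```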

benchmark/config/countdown-template.yaml

Lines changed: 2 additions & 2 deletions
```diff
@@ -9,7 +9,7 @@ algorithm:
   optimizer:
     lr: 1e-06
     lr_warmup_steps_ratio: 0.0
-    warmup_style: constant
+    lr_scheduler_type: constant
   advantage_fn: ppo
 data_processor: {}
 model:
@@ -78,7 +78,7 @@ trainer:
   optim:
     lr: 1e-5
     lr_warmup_steps_ratio: 0.0
-    warmup_style: constant
+    lr_scheduler_type: constant
   ppo_max_token_len_per_gpu: 12800
   forward_max_token_len_per_gpu: 12800
   cliprange_value: 0.5
```

benchmark/config/gsm8k-template.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -9,7 +9,7 @@ algorithm:
   optimizer:
     lr: 1e-5
     lr_warmup_steps_ratio: 0.0
-    warmup_style: constant
+    lr_scheduler_type: constant
   sample_strategy: default
   policy_loss_fn: ppo
   advantage_fn: grpo
```

benchmark/config/guru_math-template.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -16,7 +16,7 @@ algorithm:
     lr: 1e-6
     weight_decay: 0.1
     lr_warmup_steps: 80
-    warmup_style: constant
+    lr_scheduler_type: constant
 cluster:
   node_num: 1
   gpu_per_node: 8
```

docs/sphinx_doc/source/tutorial/trinity_configs.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -112,7 +112,7 @@ algorithm:
 - `optimizer`: Optimizer configuration for actor.
   - `lr`: Learning rate for actor.
   - `warmup_style`: Deprecated, use `lr_scheduler_type` instead. We will remove this field in future versions.
-  - `lr_scheduler_type`: Learning rate scheduler type for actor model. Default is `constant`. Supported types: `constant`, `consine`.
+  - `lr_scheduler_type`: Learning rate scheduler type for actor model. Default is `constant`. Supported types: `constant`, `cosine`.
 - `sample_strategy`: The sampling strategy used for loading experiences from experience buffer. Supported types: `default`, `staleness_control`, `mix`.
 - `advantage_fn`: The advantage function used for computing advantages.
 - `kl_penalty_fn`: The KL penalty function used for computing KL penalty applied in reward.
```
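
The docs above mark `warmup_style` as deprecated rather than removed, so existing configs keep working for now. A hedged sketch of how such an alias shim could look — field names mirror the docs, but Trinity-RFT's actual implementation may differ:

```python
# Illustrative back-compat shim: accept the old key, warn, and forward it.
# Not Trinity-RFT's actual code; names mirror the docs above.
import warnings
from dataclasses import dataclass
from typing import Optional

@dataclass
class OptimizerConfig:
    lr: float = 1e-6
    lr_scheduler_type: str = "constant"  # "constant" or "cosine"
    warmup_style: Optional[str] = None   # deprecated alias, slated for removal

    def __post_init__(self) -> None:
        if self.warmup_style is not None:
            warnings.warn(
                "`warmup_style` is deprecated; use `lr_scheduler_type` instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            self.lr_scheduler_type = self.warmup_style
```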

examples/dpo_human_in_the_loop/dpo.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -42,7 +42,7 @@ algorithm:
     lr: 5e-7
     lr_warmup_steps_ratio: 0.03 # the total steps will be injected during runtime
     min_lr_ratio: 0.1 # only useful for warmup with cosine
-    warmup_style: cosine # select from constant/cosine
+    lr_scheduler_type: cosine # select from constant/cosine
     betas: [0.9, 0.95]
   kl_loss_fn: k1
   kl_loss_fn_args:
```
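
The `min_lr_ratio` comment in this diff applies only to the cosine schedule. As a rough illustration (not Trinity's exact schedule), the two `lr_scheduler_type` options shape the learning-rate multiplier like this:

```python
# Illustrative multiplier, applied as lr * lr_multiplier(step, ...).
import math

def lr_multiplier(step: int, total_steps: int, scheduler_type: str,
                  warmup_steps: int, min_lr_ratio: float = 0.0) -> float:
    if step < warmup_steps:                  # linear warmup for both types
        return step / max(1, warmup_steps)
    if scheduler_type == "constant":         # hold the base lr afterwards
        return 1.0
    # cosine: decay from 1.0 down to min_lr_ratio over the remaining steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr_ratio + (1.0 - min_lr_ratio) * 0.5 * (1.0 + math.cos(math.pi * progress))
```

With `min_lr_ratio: 0.1`, the cosine curve bottoms out at 10% of the base `lr` rather than decaying to zero, matching the comment in the config.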

examples/dpo_humanlike/dpo.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -7,7 +7,7 @@ algorithm:
     lr: 5e-7
     lr_warmup_steps_ratio: 0.03 # the total steps will be injected during runtime
     min_lr_ratio: 0.1 # only useful for warmup with cosine
-    warmup_style: cosine # select from constant/cosine
+    lr_scheduler_type: cosine # select from constant/cosine
     betas: [0.9, 0.95]
   kl_loss_fn: k1
   kl_loss_fn_args:
```

examples/grpo_alfworld/alfworld.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -65,7 +65,7 @@ trainer:
 #   optimizer:
 #     lr: 5e-6
 #     lr_warmup_steps_ratio: 0.0
-#     warmup_style: constant
+#     lr_scheduler_type: constant
 # buffer:
 #   total_epochs: 1
 #   train_batch_size: 32
```

examples/learn_to_ask/train.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -13,7 +13,7 @@ algorithm:
   optimizer:
     lr: 5.0e-07
     lr_warmup_steps_ratio: 0.0
-    warmup_style: constant
+    lr_scheduler_type: constant
 data_processor: {}
 model:
   model_path: ${oc.env:TRINITY_MODEL_PATH,Qwen/Qwen2.5-7B-Instruct}
```

examples/tinker/tinker.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -11,7 +11,7 @@ algorithm:
   optimizer:
     lr: 1.0e-05
     lr_warmup_steps_ratio: 0.0
-    warmup_style: constant
+    lr_scheduler_type: constant
 data_processor: {}
 model:
   model_path: Qwen/Qwen3-4B-Instruct-2507
```

scripts/context_length_test/context_length.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -16,7 +16,7 @@ algorithm:
   optimizer:
     lr: 1.0e-05
     lr_warmup_steps_ratio: 0.0
-    warmup_style: constant
+    lr_scheduler_type: constant
 data_processor: {}
 model:
   model_path: ${oc.env:MODEL_PATH,Qwen/Qwen3-0.6B}
```
