Commit 949dbfc

bug fix in config_manager.py
1 parent 222c2bf commit 949dbfc

14 files changed: +40 -36 lines changed

docs/sphinx_doc/source/tutorial/trinity_configs.md

Lines changed: 4 additions & 3 deletions
@@ -375,7 +375,7 @@ trainer:
  save_freq: 100
  # auto: find the last ckpt to resume. If can't find, start from scratch
  resume_mode: auto # or auto or resume_path if
- resume_from_path: False
+ resume_from_path: ""
  test_freq: 100
  critic_warmup: 0
  default_hdfs_dir: null
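
The hunk above changes the documented default of `resume_from_path` from `False` to `""`, presumably because the field holds a path string rather than a boolean. The following is a minimal Python sketch of how such a field might be consumed. `TrainerResumeConfig` and `resolve_resume_path` are hypothetical names used only for illustration and are not taken from this repository; only the two `resume_mode` values visible in the diff comment (`auto` and `resume_path`) are handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrainerResumeConfig:
    """Hypothetical stand-in for the resume-related trainer fields."""

    resume_mode: str = "auto"
    # String-typed field: the empty string means "no path given".
    resume_from_path: str = ""


def resolve_resume_path(
    cfg: TrainerResumeConfig, latest_ckpt: Optional[str] = None
) -> Optional[str]:
    """Return the checkpoint path to load, or None to start from scratch."""
    if cfg.resume_mode == "auto":
        # Use the last checkpoint if one was found, otherwise start fresh.
        return latest_ckpt
    if cfg.resume_mode == "resume_path":
        # A boolean such as False is rejected here; only a non-empty
        # string is treated as a usable path.
        if not isinstance(cfg.resume_from_path, str) or not cfg.resume_from_path:
            raise ValueError("resume_mode=resume_path needs a non-empty resume_from_path")
        return cfg.resume_from_path
    raise ValueError(f"unsupported resume_mode: {cfg.resume_mode!r}")
```

With a string default, the "not set" case can be tested with a plain truthiness check and the field keeps a single type throughout, which is one plausible reason for replacing `False`.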
@@ -395,8 +395,9 @@ trainer:
  - `actor_rollout_ref.actor.grad_clip`: Gradient clip for actor model training.
  - `actor_rollout_ref.actor.clip_ratio`: Used for compute policy loss.
  - `actor_rollout_ref.actor.entropy_coeff`: Used for compute policy loss.
- - `actor_rollout_ref.actor.use_kl_loss`: True for GRPO.
- - `actor_rollout_ref.actor.kl_loss_coef`: Used for GRPO, optional value is `kl`, `abs`, `mse` or `low_var_kl`.
+ - `actor_rollout_ref.actor.use_kl_loss`: Whether to enable kl loss.
+ - `actor_rollout_ref.actor.kl_loss_coef`: The coefficient of kl loss.
+ - `actor_rollout_ref.actor.kl_loss_type`: How to compute kl loss, optional value is `kl`, `abs`, `mse` or `low_var_kl`.
  - `actor_rollout_ref.actor.ulysses_sequence_parallel_size`: Ulysses sequence parallel size.
  - `actor_rollout_ref.actor.alg_type`: Used for OPMD, optional value is `ppo`, `opmd` or `pairwise_opmd`.
  - `actor_rollout_ref.actor.tau`: strength of regularization w.r.t. old / ref policy.
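
The corrected entries above separate three settings: a switch (`use_kl_loss`), a weight (`kl_loss_coef`) and an estimator choice (`kl_loss_type`). The sketch below shows one way these could interact for per-token log-probabilities. The formulas are the commonly used ones for these four names and are an assumption here, not code taken from this commit; `kl_penalty` and `total_actor_loss` are hypothetical helper names.

```python
import math
from typing import Sequence


def kl_penalty(logprob: float, ref_logprob: float, kl_loss_type: str) -> float:
    """Per-token penalty between the current and the reference policy."""
    diff = logprob - ref_logprob
    if kl_loss_type == "kl":
        return diff  # plain log-ratio
    if kl_loss_type == "abs":
        return abs(diff)
    if kl_loss_type == "mse":
        return 0.5 * diff**2
    if kl_loss_type == "low_var_kl":
        # Assumed low-variance estimator: exp(-diff) - 1 + diff, always >= 0.
        return math.exp(-diff) - 1.0 + diff
    raise ValueError(f"unknown kl_loss_type: {kl_loss_type!r}")


def total_actor_loss(
    policy_loss: float,
    logprobs: Sequence[float],
    ref_logprobs: Sequence[float],
    use_kl_loss: bool,
    kl_loss_coef: float,
    kl_loss_type: str,
) -> float:
    """Add the weighted mean KL penalty to the policy loss when enabled."""
    if not use_kl_loss:
        return policy_loss
    penalties = [
        kl_penalty(lp, ref_lp, kl_loss_type)
        for lp, ref_lp in zip(logprobs, ref_logprobs)
    ]
    kl_loss = sum(penalties) / len(penalties)
    return policy_loss + kl_loss_coef * kl_loss
```

Under this reading, `kl_loss_coef` is a number and `kl_loss_type` is a string, which matches the documentation fix: the list of `kl`, `abs`, `mse` and `low_var_kl` belongs to the type, not to the coefficient.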

examples/dpo_humanlike/train_dpo.yaml

Lines changed: 0 additions & 1 deletion
@@ -173,7 +173,6 @@ trainer:
  save_freq: 30
  # auto: find the last ckpt to resume. If can't find, start from scratch
  resume_mode: auto # or auto or resume_path if
- resume_from_path: False
  test_freq: 5
  critic_warmup: 0
  default_hdfs_dir: null

examples/grpo_alfworld/train_alfworld.yaml

Lines changed: 0 additions & 1 deletion
@@ -172,7 +172,6 @@ trainer:
  save_freq: 1
  # auto: find the last ckpt to resume. If can't find, start from scratch
  resume_mode: auto # or auto or resume_path if
- resume_from_path: False
  test_freq: 100
  critic_warmup: 0
  default_hdfs_dir: null

examples/grpo_gsm8k/gsm8k.yaml

Lines changed: 0 additions & 1 deletion
@@ -35,7 +35,6 @@ buffer:
  train_dataset:
  name: gsm8k_buffer
  storage_type: queue
- algorithm_type: ppo
  path: 'sqlite:///gsm8k.db'
  # sft_warmup_dataset: # Uncomment these to enable sft warmup
  # name: warmup_data

examples/grpo_gsm8k/train_gsm8k.yaml

Lines changed: 0 additions & 1 deletion
@@ -177,7 +177,6 @@ trainer:
  save_freq: 100
  # auto: find the last ckpt to resume. If can't find, start from scratch
  resume_mode: auto # or auto or resume_path if
- resume_from_path: False
  test_freq: 5
  critic_warmup: 0
  default_hdfs_dir: null

examples/grpo_math/math.yaml

Lines changed: 0 additions & 1 deletion
@@ -27,7 +27,6 @@ buffer:
  train_dataset:
  name: math_buffer
  storage_type: queue
- algorithm_type: ppo
  path: 'sqlite:////math.db'
  explorer:
  engine_type: vllm_async

examples/grpo_math/train_math.yaml

Lines changed: 0 additions & 1 deletion
@@ -169,7 +169,6 @@ trainer:
  save_freq: 100
  # auto: find the last ckpt to resume. If can't find, start from scratch
  resume_mode: auto # or auto or resume_path if
- resume_from_path: False
  test_freq: 5
  critic_warmup: 0
  default_hdfs_dir: null

examples/grpo_sciworld/sciworld.yaml

Lines changed: 0 additions & 1 deletion
@@ -21,7 +21,6 @@ buffer:
  train_dataset:
  name: sciworld_buffer
  storage_type: queue
- algorithm_type: ppo
  path: 'sqlite:///sciworld.db'
  explorer:
  engine_type: vllm_async

examples/grpo_sciworld/train_sciworld.yaml

Lines changed: 0 additions & 1 deletion
@@ -167,7 +167,6 @@ trainer:
  save_freq: 1
  # auto: find the last ckpt to resume. If can't find, start from scratch
  resume_mode: auto # or auto or resume_path if
- resume_from_path: False
  test_freq: 100
  critic_warmup: 0
  default_hdfs_dir: null

examples/grpo_webshop/train_webshop.yaml

Lines changed: 0 additions & 1 deletion
@@ -172,7 +172,6 @@ trainer:
  save_freq: 1
  # auto: find the last ckpt to resume. If can't find, start from scratch
  resume_mode: auto # or auto or resume_path if
- resume_from_path: False
  test_freq: 100
  critic_warmup: 0
  default_hdfs_dir: null
