- `trainer.trainer_type`: The backend of the trainer. Only `verl` is supported.
- `trainer.trainer_config_path`: The path to the trainer configuration file. It must be set manually.
- `trainer.save_interval`: The number of steps between two checkpoints. Default is `100`.
- `trainer.actor_grad_clip`: Gradient clipping threshold for actor model training.
- `trainer.actor_clip_ratio`: Clip ratio used when computing the policy loss.
- `trainer.actor_entropy_coeff`: Entropy coefficient used when computing the policy loss.
- `trainer.actor_use_kl_loss`: Whether to enable the KL loss.
- `trainer.actor_kl_loss_coef`: The coefficient of the KL loss.
- `trainer.trainer_config`: The configuration of the trainer. Only one of `trainer.trainer_config` and `trainer.trainer_config_path` needs to be set.
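The "only one of `trainer.trainer_config` and `trainer.trainer_config_path`" rule above can be sketched as a small validation helper. The function name, the dict layout, and the JSON loader are hypothetical (a real loader would parse YAML); only the mutual-exclusivity rule comes from the text.

```python
import json


def resolve_trainer_config(trainer: dict) -> dict:
    """Return the effective trainer config, enforcing that exactly one of
    `trainer_config` (inline dict) and `trainer_config_path` (file) is set.

    Hypothetical helper illustrating the rule described above; not part of
    the framework's actual API.
    """
    inline = trainer.get("trainer_config")
    path = trainer.get("trainer_config_path")
    if (inline is None) == (path is None):
        raise ValueError(
            "set exactly one of trainer.trainer_config and "
            "trainer.trainer_config_path"
        )
    if inline is not None:
        return inline
    # A real implementation would parse YAML here; json keeps this stdlib-only.
    with open(path) as f:
        return json.load(f)
```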
### veRL Trainer Configuration
Here we mainly introduce the parameters that can be set for veRL. For their specific meanings, please refer to the official documentation of [veRL](https://github.com/volcengine/verl/blob/0bdf7f469854815177e73dcfe9e420836c952e6e/docs/examples/config.rst).
```yaml
data:
  tokenizer: null
  train_files: train_example.parquet
  val_files: test_example.parquet
  prompt_key: prompt
  max_prompt_length: 256
  max_response_length: 1024
  train_batch_size: 256
  val_batch_size: null
  return_raw_input_ids: False # Set to True when the policy and the reward model use different tokenizers
  return_raw_chat: False
  shuffle: True
  filter_overlong_prompts: False # For large-scale datasets, filtering overlong prompts can be time-consuming; disable this and set `truncation: left` instead
  truncation: error
  image_key: images
actor_rollout_ref:
  hybrid_engine: True
  model:
    path: /PATH/TO/MODEL/
    external_lib: null
    override_config: { }
    enable_gradient_checkpointing: True
    use_remove_padding: True # False
  actor:
    strategy: fsdp # This is for backward-compatibility
    ppo_mini_batch_size: 128
    # ppo_micro_batch_size: 8 # will be deprecated, use ppo_micro_batch_size_per_gpu
    ppo_micro_batch_size_per_gpu: 4
    use_dynamic_bsz: True # False
    ppo_max_token_len_per_gpu: 16384 # n * ${data.max_prompt_length} + ${data.max_response_length}
    # ...
    fsdp_config:
      wrap_policy:
        # transformer_layer_cls_to_wrap: None
        min_num_params: 0
    # log_prob_micro_batch_size: 4 # will be deprecated, use log_prob_micro_batch_size_per_gpu
```

- `actor_rollout_ref.model.use_remove_padding`: Whether to remove pad tokens, which reduces training time.
- `actor_rollout_ref.actor.use_dynamic_bsz`: Whether to reorganize the batch data, splicing shorter sequences together to reduce the effective batch size during training.
- `actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu`: Batch size for one GPU in one forward pass.
- `actor_rollout_ref.actor.grad_clip`: Gradient clipping threshold for actor model training.
- `actor_rollout_ref.actor.clip_ratio`: Clip ratio used when computing the policy loss.
- `actor_rollout_ref.actor.entropy_coeff`: Entropy coefficient used when computing the policy loss.
- `actor_rollout_ref.actor.use_kl_loss`: Whether to enable the KL loss.
- `actor_rollout_ref.actor.kl_loss_coef`: The coefficient of the KL loss.
- `actor_rollout_ref.actor.kl_loss_type`: How to compute the KL loss; valid values are `kl`, `abs`, `mse`, or `low_var_kl`.
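The four `kl_loss_type` options correspond to standard per-token KL estimators. The sketch below shows the common forms of these estimators given the policy and reference log-probabilities of a sampled token; it is an illustration of the options, not verl's exact implementation.

```python
import math


def kl_penalty(logprob: float, ref_logprob: float, kl_loss_type: str) -> float:
    """Per-token KL penalty between policy and reference log-probs.

    Sketch of the estimators commonly associated with the four
    `kl_loss_type` values; assumed forms, not verl's exact code.
    """
    diff = logprob - ref_logprob
    if kl_loss_type == "kl":
        # Plain log-ratio (k1 estimator): unbiased but high variance.
        return diff
    if kl_loss_type == "abs":
        # Absolute log-ratio.
        return abs(diff)
    if kl_loss_type == "mse":
        # Half squared log-ratio (k2 estimator).
        return 0.5 * diff * diff
    if kl_loss_type == "low_var_kl":
        # k3 estimator: exp(ref - logp) - (ref - logp) - 1.
        # Low variance and always non-negative.
        return math.exp(-diff) + diff - 1.0
    raise ValueError(f"unknown kl_loss_type: {kl_loss_type}")
```

Note that `low_var_kl` is zero exactly when the two log-probs agree and positive otherwise, which is why it is often preferred as a loss term.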
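The `use_dynamic_bsz` option described above splices shorter sequences together so that each micro-batch stays under `ppo_max_token_len_per_gpu`. A minimal way to picture this is greedy first-fit bin packing over sequence lengths; this is only an illustration of the idea, not verl's actual partitioning algorithm.

```python
def pack_sequences(lengths: list[int], max_tokens_per_gpu: int) -> list[list[int]]:
    """Group sequences (by index) so each group's total token count stays
    under `max_tokens_per_gpu`.

    Greedy first-fit-decreasing sketch of dynamic batching; verl's real
    implementation uses its own partitioning scheme.
    """
    bins: list[list] = []  # each bin: [total_tokens, [sequence indices]]
    # Place longest sequences first so short ones can fill the gaps.
    for i, n in sorted(enumerate(lengths), key=lambda x: -x[1]):
        for b in bins:
            if b[0] + n <= max_tokens_per_gpu:
                b[0] += n
                b[1].append(i)
                break
        else:
            bins.append([n, [i]])
    return [indices for _, indices in bins]
```

With `lengths = [1000, 200, 800, 300]` and a 1200-token budget, the four sequences pack into two groups instead of four separate micro-batches.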