|
## Table of Contents
|
- [sft Parameters](#sft-parameters)
- [pt Parameters](#pt-parameters)
- [rlhf Parameters](#rlhf-parameters)
- [infer merge-lora Parameters](#infer-merge-lora-parameters)
- [export Parameters](#export-parameters)
- [eval Parameters](#eval-parameters)
|
- `--lora_target_regex`: The LoRA target regex, of type `Optional[str]`. Default is `None`. If this argument is specified, `lora_target_modules` has no effect.
- `--lora_rank`: Default is `8`. Only takes effect when `sft_type` is 'lora'.
- `--lora_alpha`: Default is `32`. Only takes effect when `sft_type` is 'lora'.
- `--lora_dropout`: Default is `0.05`. Only takes effect when `sft_type` is 'lora'.
- `--init_lora_weights`: Method for initializing LoRA weights; can be specified as `true`, `false`, `gaussian`, `pissa`, or `pissa_niter_[number of iters]`. Default is `true`.
- `--lora_bias_trainable`: Default is `'none'`, options: 'none', 'all'. Set to `'all'` to make all biases trainable.
- `--lora_modules_to_save`: Default is `[]`. If you want to train embedding, lm_head, or layer_norm, you can set this parameter, e.g. `--lora_modules_to_save EMBEDDING LN lm_head`. If `'EMBEDDING'` is passed, the embedding layer is added to `lora_modules_to_save`. If `'LN'` is passed, `RMSNorm` and `LayerNorm` are added to `lora_modules_to_save`.
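
To make the LoRA options concrete, here is a minimal sketch of an invocation. The model and dataset ids are illustrative placeholders, and the LoRA flags simply restate the documented defaults:

```bash
# Hypothetical model/dataset ids; the LoRA flags below restate the
# documented defaults rather than tuned values.
swift sft \
    --model_type qwen-7b-chat \
    --dataset alpaca-en \
    --sft_type lora \
    --lora_rank 8 \
    --lora_alpha 32 \
    --lora_dropout 0.05 \
    --lora_modules_to_save EMBEDDING LN
```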
|
- `--max_steps`: Maximum number of training steps, default is `-1`. If `max_steps >= 0`, it overrides `num_train_epochs`.
- `--optim`: Default is `'adamw_torch'`.
- `--adam_beta1`: Default is `0.9`.
- `--adam_beta2`: Default is `0.95`.
- `--adam_epsilon`: Default is `1e-8`.
- `--learning_rate`: Default is `None`, i.e. set to 1e-4 if `sft_type` is 'lora', or 1e-5 if `sft_type` is 'full'.
- `--weight_decay`: Default is `0.01`.
- `--gradient_accumulation_steps`: Gradient accumulation, default is `None`, i.e. set to `math.ceil(16 / self.batch_size / world_size)`. `total_batch_size = batch_size * gradient_accumulation_steps * world_size`. A worked example follows this list.
- `--max_grad_norm`: Gradient clipping, default is `1`.
- `--predict_with_generate`: Whether to use generation for evaluation, default is `False`. If set to `False`, evaluation uses `loss`; if set to `True`, evaluation uses `ROUGE-L` and other metrics. Generative evaluation takes a long time, so choose carefully.
- `--lr_scheduler_type`: Default is `'cosine'`, options: 'linear', 'cosine', 'constant', etc.
- `--warmup_ratio`: Proportion of warmup steps in total training steps, default is `0.05`.
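
For example, with `--batch_size 2` on 4 GPUs, `gradient_accumulation_steps = math.ceil(16 / 2 / 4) = 2`, giving `total_batch_size = 2 * 2 * 4 = 16`.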

The following parameters take effect when `sft_type` is set to `ia3`.
- `--ia3_feedforward_modules`: Specify the Linear names of IA3's MLP; these names must be in `ia3_target_modules`.
- `--ia3_modules_to_save`: Additional modules participating in IA3 training. See the meaning of `lora_modules_to_save`.
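
As a sketch, an IA3 run might look like the following. The module names are hypothetical (LLaMA-style), not prescribed by this document; note that the feed-forward module also appears in `--ia3_target_modules`, as required above:

```bash
# Hypothetical module names for a LLaMA-style model; down_proj is listed
# in both flags because ia3_feedforward_modules must be a subset of
# ia3_target_modules.
swift sft \
    --model_type llama2-7b-chat \
    --sft_type ia3 \
    --ia3_target_modules k_proj v_proj down_proj \
    --ia3_feedforward_modules down_proj
```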
|
## PT Parameters

PT parameters inherit from the SFT parameters, with some modifications to the default values:

- `--sft_type`: Default is `'full'`.
- `--lora_target_modules`: Default is `'ALL'`.
- `--lazy_tokenize`: Default is `True`.
- `--eval_steps`: Default is `500`.
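
A minimal sketch of a pretraining run (the model and dataset ids are placeholders, not taken from this document); since `--sft_type full` and `--lazy_tokenize true` are already the pt defaults, only `--eval_steps` is shown explicitly:

```bash
# Hypothetical model/dataset ids; pt already defaults to full-parameter
# training with lazy tokenization.
swift pt \
    --model_type qwen-7b \
    --dataset my-pretrain-corpus \
    --eval_steps 500
```
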
## RLHF Parameters
RLHF parameters are an extension of the sft parameters, with the addition of the following options:
- `--rlhf_type`: Choose the alignment algorithm, with options such as 'dpo', 'orpo', 'simpo', 'kto', and 'cpo'. For training scripts for the different algorithms, please refer to the [documentation](./Human-Preference-Alignment-Training-Documentation.md).
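
As a sketch, a DPO run might look like the following (the model and dataset ids are illustrative placeholders):

```bash
# Hypothetical model/dataset ids; --rlhf_type selects the alignment
# algorithm documented above.
swift rlhf \
    --rlhf_type dpo \
    --model_type qwen-7b-chat \
    --dataset hh-rlhf \
    --sft_type lora
```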
|