docs/source_en/LLM/Command-line-parameters.md: 1 addition, 0 deletions
DPO parameters inherit all SFT parameters and add the following:
- `--ref_model_type`: Type of the reference model. Available `model_type` options can be found in `MODEL_MAPPING.keys()`.
- `--ref_model_id_or_path`: Local cache directory of the reference model. Default is `None`.
- `--max_prompt_length`: Maximum prompt length. This value is passed to `DPOTrainer` so that prompts do not exceed this length. Default is `1024`.
- `--beta`: Regularization coefficient applied to the DPO logits; larger values keep the trained model closer to the reference model. Default is `0.1`.
- `--label_smoothing`: Amount of label smoothing applied to the DPO loss. Default is `0` (no smoothing); values are generally set between 0 and 0.5.
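To illustrate how `--beta` and `--label_smoothing` enter the objective, here is a minimal sketch. It assumes the standard sigmoid DPO loss with label smoothing, which is the form TRL's `DPOTrainer` uses for its default loss type. ms-swift's exact computation may differ. `dpo_loss` and `log_sigmoid` are hypothetical helpers. The `ref_*` log-probabilities stand in for the output of the reference model selected via `--ref_model_type` / `--ref_model_id_or_path`.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x)) (hypothetical helper)."""
    # For x >= 0: -log(1 + e^-x); for x < 0: x - log(1 + e^x). Avoids overflow.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,
    label_smoothing: float = 0.0,
) -> float:
    """Sigmoid DPO loss for a single preference pair (assumed formulation).

    Each *_logp argument is the summed log-probability of the chosen or
    rejected response under the trained policy or the reference model.
    """
    # Policy's chosen-vs-rejected log-ratio minus the reference model's
    logits = (policy_chosen_logp - policy_rejected_logp) - (
        ref_chosen_logp - ref_rejected_logp
    )
    # beta scales the margin; label_smoothing mixes in the "label flipped" term
    return (
        -log_sigmoid(beta * logits) * (1.0 - label_smoothing)
        - log_sigmoid(-beta * logits) * label_smoothing
    )


if __name__ == "__main__":
    # Policy identical to the reference: margin is 0, so loss = log(2)
    print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
    # Policy prefers the chosen response more strongly than the reference does
    print(dpo_loss(-8.0, -14.0, -10.0, -12.0, beta=0.1))
    print(dpo_loss(-8.0, -14.0, -10.0, -12.0, beta=0.1, label_smoothing=0.2))
```

With `label_smoothing=0`, only the first term remains and this is the plain DPO loss. A nonzero value treats each preference label as possibly flipped. That raises the loss for large margins and keeps the optimizer from pushing the margin without bound.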