You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: docs/source_en/Instruction/Command-line-parameters.md
+2Lines changed: 2 additions & 0 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -222,6 +222,8 @@ This list inherits from the Transformers `Seq2SeqTrainingArguments`, with ms-swi
222
222
- router_aux_loss_coef: Used in MoE model training to set the weight of auxiliary loss. Default is `0.`.
223
223
- enable_dft_loss: Whether to use [DFT](https://arxiv.org/abs/2508.05629) (Dynamic Fine-Tuning) loss during SFT training. Default is `False`.
224
224
- enable_channel_loss: Enable channel-based loss. Default is `False`. Requires a `"channel"` field in the dataset. ms-swift groups and computes loss by this field (samples without `"channel"` are grouped into the default `None` channel). Dataset format reference: [channel loss](../Customization/Custom-dataset.md#channel-loss). Channel loss is compatible with packing, padding_free, and loss_scale techniques.
225
+
- safe_serialization: Whether to save the model in safetensors format. Default is True.
226
+
- max_shard_size: Maximum size of a single storage file, default is '5GB'.
225
227
- logging_dir: Directory for TensorBoard logs. Default is `None`, automatically set to `f'{self.output_dir}/runs'`.
226
228
- predict_with_generate: Use generation during evaluation. Default is `False`.
227
229
- metric_for_best_model: Default is `None`. If `predict_with_generate=False`, it's set to `'loss'`; otherwise `'rouge-l'` (in PPO training, no default; in GRPO, set to `'reward'`).
0 commit comments