docs/source_en/Instruction/Command-line-parameters.md
4 additions & 1 deletion
@@ -112,7 +112,8 @@ Refer to the [generation_config](https://huggingface.co/docs/transformers/main_c
 - top_p: The top_p parameter, defaults to None. It is read from generation_config.json.
 - repetition_penalty: The repetition penalty. Defaults to None and is read from generation_config.json.
 - num_beams: The number of beams reserved for parallel beam search, default is 1.
-- 🔥stream: Stream output, default is `False`.
+- 🔥stream: Streaming output. Default is `None`, which means it is set to True when using the interactive interface and False during batch inference on datasets.
+- For "ms-swift<3.6", the default value of stream is False.
 - stop_words: Additional stop words beyond eos_token, default is`[]`.
 - Note: eos_token will be removed in the output response, whereas additional stop words will be retained in the output.
 - logprobs: Whether to output logprobs, default is False.
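
The new `stream` default in the hunk above is effectively a three-state setting: an explicit `True` or `False` is kept, while `None` is resolved from the inference mode. The following is a minimal sketch of that rule, not the ms-swift implementation. The function name `resolve_stream` and the `interactive` flag are hypothetical and exist only for illustration.

```python
from typing import Optional


def resolve_stream(stream: Optional[bool], interactive: bool) -> bool:
    """Resolve the effective streaming flag.

    Hypothetical helper (not part of ms-swift). It mirrors the documented
    rule: an explicit value is respected, and None falls back to True for
    the interactive interface and False for batch inference on datasets.
    """
    if stream is not None:
        # The user passed --stream explicitly, so keep that choice.
        return stream
    # No explicit value: stream only when running interactively.
    return interactive


if __name__ == "__main__":
    for explicit in (None, True, False):
        for interactive in (True, False):
            effective = resolve_stream(explicit, interactive)
            print(f"stream={explicit!s:<5} interactive={interactive!s:<5} -> {effective}")
```

Under this reading, only runs that leave `stream` unset change behavior between "ms-swift<3.6" and later versions, and only in interactive use.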
docs/source_en/Instruction/Megatron-SWIFT-Training.md
4 additions & 1 deletion
@@ -40,7 +40,8 @@ The training module in the dependent library Megatron-LM will be cloned and inst
 This section introduces a quick start example for fine-tuning the self-awareness of the Qwen2.5-7B-Instruct model using two 80GiB A100 GPUs. The following best practices can be completed within 10 minutes.
 
 First, we need to convert the weights from HF (Hugging Face) format to Megatron format:
-- If OOM occurs, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- If you encounter OOM, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- For "ms-swift>=3.6", it is recommended to add the `--test_convert_precision true` parameter to test conversion precision.
 ```shell
 CUDA_VISIBLE_DEVICES=0 \
 swift export \
@@ -88,6 +89,8 @@ megatron sft \
 
 Finally, convert the Megatron format weights back to HF format:
 - Note: Please point `--mcore_model` to the parent directory of `iter_xxx`. By default, the corresponding checkpoint from `latest_checkpointed_iteration.txt` will be used.
+- If you encounter OOM, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- For "ms-swift>=3.6", it is recommended to add the `--test_convert_precision true` parameter to test conversion precision.