docs/source_en/Instruction/Megatron-SWIFT-Training.md (4 additions, 1 deletion)
@@ -40,7 +40,8 @@ The training module in the dependent library Megatron-LM will be cloned and installed
 This section introduces a quick start example for fine-tuning the self-awareness of the Qwen2.5-7B-Instruct model using two 80GiB A100 GPUs. The following best practices can be completed within 10 minutes.

 First, we need to convert the weights from HF (Hugging Face) format to Megatron format:
-- If OOM occurs, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- If you encounter OOM, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- For "ms-swift>=3.6", it is recommended to add the `--test_convert_precision true` parameter to test conversion precision.
 ```shell
 CUDA_VISIBLE_DEVICES=0 \
 swift export \
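The hunk cuts off inside the conversion snippet. For reference, here is a minimal sketch of what the complete HF-to-Megatron export could look like; only `CUDA_VISIBLE_DEVICES=0`, `swift export`, and `--test_convert_precision true` are confirmed by the diff, so the remaining flags and paths (`--model`, `--to_mcore`, `--torch_dtype`, `--output_dir`) are illustrative assumptions:

```shell
# Sketch only: flags other than --test_convert_precision true are assumptions,
# not taken from this diff.
CUDA_VISIBLE_DEVICES=0 \
swift export \
    --model Qwen/Qwen2.5-7B-Instruct \
    --to_mcore true \
    --torch_dtype bfloat16 \
    --test_convert_precision true \
    --output_dir Qwen2.5-7B-Instruct-mcore
```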
@@ -88,6 +89,8 @@ megatron sft \

 Finally, convert the Megatron format weights back to HF format:
 - Note: Please point `--mcore_model` to the parent directory of `iter_xxx`. By default, the corresponding checkpoint from `latest_checkpointed_iteration.txt` will be used.
+- If you encounter OOM, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- For "ms-swift>=3.6", it is recommended to add the `--test_convert_precision true` parameter to test conversion precision.