Commit e9f9e08

[megatron] Fix the display issue for train_type=lora (#4845)

1 parent f891985 · commit e9f9e08

File tree

4 files changed: +12 / -4 lines


docs/source/Instruction/Megatron-SWIFT训练.md

Lines changed: 3 additions & 1 deletion

````diff
@@ -40,6 +40,7 @@ modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu2
 
 First, we need to convert the HF-format weights to Megatron format:
 - If OOM occurs, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- For "ms-swift>=3.6", it is recommended to add the `--test_convert_precision true` parameter to test conversion precision.
 ```shell
 CUDA_VISIBLE_DEVICES=0 \
 swift export \
@@ -87,7 +88,8 @@ megatron sft \
 
 Finally, convert the Megatron-format weights back to HF format:
 - Note: point `--mcore_model` to the parent directory of `iter_xxx`. By default, the checkpoint recorded in `latest_checkpointed_iteration.txt` is used.
-
+- If OOM occurs, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- For "ms-swift>=3.6", it is recommended to add the `--test_convert_precision true` parameter to test conversion precision.
 ```shell
 CUDA_VISIBLE_DEVICES=0 \
 swift export \
````
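The HF-to-Megatron conversion step these bullets annotate might look like the following sketch. Only `CUDA_VISIBLE_DEVICES=0`, `swift export`, and `--test_convert_precision true` come from the diff; the model name, `--to_mcore` flag, dtype, and output path are illustrative assumptions, not the exact command from the docs.

```shell
# Sketch of the HF -> Megatron conversion (model and paths are placeholders).
# Per the note above, drop CUDA_VISIBLE_DEVICES=0 if you hit OOM.
CUDA_VISIBLE_DEVICES=0 \
swift export \
    --model Qwen/Qwen2.5-7B-Instruct \
    --to_mcore true \
    --torch_dtype bfloat16 \
    --output_dir Qwen2.5-7B-Instruct-mcore \
    --test_convert_precision true
```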

docs/source_en/Instruction/Megatron-SWIFT-Training.md

Lines changed: 4 additions & 1 deletion

````diff
@@ -40,7 +40,8 @@ The training module in the dependent library Megatron-LM will be cloned and inst
 This section introduces a quick start example for fine-tuning the self-awareness of the Qwen2.5-7B-Instruct model using two 80GiB A100 GPUs. The following best practices can be completed within 10 minutes.
 
 First, we need to convert the weights from HF (Hugging Face) format to Megatron format:
-- If OOM occurs, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- If you encounter OOM, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- For "ms-swift>=3.6", it is recommended to add the `--test_convert_precision true` parameter to test conversion precision.
 ```shell
 CUDA_VISIBLE_DEVICES=0 \
 swift export \
@@ -88,6 +89,8 @@ megatron sft \
 
 Finally, convert the Megatron format weights back to HF format:
 - Note: Please point `--mcore_model` to the parent directory of `iter_xxx`. By default, the corresponding checkpoint from `latest_checkpointed_iteration.txt` will be used.
+- If you encounter OOM, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- For "ms-swift>=3.6", it is recommended to add the `--test_convert_precision true` parameter to test conversion precision.
 
 ```shell
 CUDA_VISIBLE_DEVICES=0 \
````
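The reverse (Megatron-to-HF) conversion that this second hunk annotates might look like the following sketch. The `--mcore_model` semantics (parent directory of `iter_xxx`) and `--test_convert_precision true` come from the doc text; the `--to_hf` flag, dtype, and concrete paths are assumptions for illustration.

```shell
# Sketch of the Megatron -> HF back-conversion (paths are placeholders).
# --mcore_model points at the parent directory of iter_xxx, as the doc notes;
# drop CUDA_VISIBLE_DEVICES=0 on OOM.
CUDA_VISIBLE_DEVICES=0 \
swift export \
    --mcore_model megatron_output/Qwen2.5-7B-Instruct/vx-xxx \
    --to_hf true \
    --torch_dtype bfloat16 \
    --output_dir megatron_output/Qwen2.5-7B-Instruct/vx-xxx-hf \
    --test_convert_precision true
```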

swift/llm/argument/base_args/base_args.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -224,6 +224,8 @@ def load_args_from_ckpt(self) -> None:
             'bnb_4bit_quant_type',
             'bnb_4bit_use_double_quant',
         ]
+        if 'megatron' in self.__class__.__name__.lower():
+            force_load_keys = []
         # If the current value is None or an empty list and it is among the following keys
         load_keys = [
             'custom_register_path',
```
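The effect of the new guard can be illustrated with a minimal, self-contained sketch. The class and method names below are hypothetical stand-ins, not the real ms-swift classes: the point is only that testing the lowercased class name makes every Megatron argument subclass end up with an empty `force_load_keys`, so no checkpoint values are force-loaded over CLI values.

```python
# Hypothetical stand-ins for the argument classes; only the guard logic
# mirrors the base_args.py change.
class BaseArguments:
    def force_load_keys_for_ckpt(self):
        # Keys whose checkpoint values normally override CLI values.
        force_load_keys = ['bnb_4bit_quant_type', 'bnb_4bit_use_double_quant']
        # The new guard: Megatron argument classes force-load nothing.
        if 'megatron' in self.__class__.__name__.lower():
            force_load_keys = []
        return force_load_keys


class MegatronArguments(BaseArguments):
    pass


print(BaseArguments().force_load_keys_for_ckpt())     # ['bnb_4bit_quant_type', 'bnb_4bit_use_double_quant']
print(MegatronArguments().force_load_keys_for_ckpt())  # []
```

Matching on the class name rather than an explicit flag keeps the base class unaware of its Megatron subclasses, at the cost of a string-based convention.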

tests/megatron/test_train.py

Lines changed: 3 additions & 2 deletions

```diff
@@ -18,6 +18,7 @@ def test_sft():
         model_author='swift',
         model_name='swift-robot',
         eval_iters=5,
+        sequence_parallel=True,
         finetune=True))
 
 
@@ -35,5 +36,5 @@ def test_pt():
 
 
 if __name__ == '__main__':
-    # test_sft()
-    test_pt()
+    test_sft()
+    # test_pt()
```

0 commit comments