Commit e9f9e08

[megatron] Fix the display issue for train_type=lora (#4845)

1 parent f891985 · commit e9f9e08

File tree

4 files changed: +12 / -4 lines


docs/source/Instruction/Megatron-SWIFT训练.md

Lines changed: 3 additions & 1 deletion

````diff
@@ -40,6 +40,7 @@ modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu2
 
 First, we need to convert the HF-format weights to Megatron format:
 - If OOM occurs, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- For "ms-swift>=3.6", it is recommended to add the `--test_convert_precision true` parameter to test conversion precision.
 ```shell
 CUDA_VISIBLE_DEVICES=0 \
 swift export \
@@ -87,7 +88,8 @@ megatron sft \
 
 Finally, convert the Megatron-format weights back to HF format:
 - Note: point `--mcore_model` to the parent directory of `iter_xxx`. By default, the checkpoint recorded in `latest_checkpointed_iteration.txt` is used.
-
+- If OOM occurs, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- For "ms-swift>=3.6", it is recommended to add the `--test_convert_precision true` parameter to test conversion precision.
 ```shell
 CUDA_VISIBLE_DEVICES=0 \
 swift export \
````
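The HF-to-Megatron conversion step these bullets annotate might look like the following sketch. Only `CUDA_VISIBLE_DEVICES=0`, `swift export`, and `--test_convert_precision true` come from the diff; the model name, `--to_mcore` flag, dtype, and output path are illustrative assumptions, not the exact command from the docs.

```shell
# Sketch of the HF -> Megatron conversion (model and paths are placeholders).
# Per the note above, drop CUDA_VISIBLE_DEVICES=0 if you hit OOM.
CUDA_VISIBLE_DEVICES=0 \
swift export \
    --model Qwen/Qwen2.5-7B-Instruct \
    --to_mcore true \
    --torch_dtype bfloat16 \
    --output_dir Qwen2.5-7B-Instruct-mcore \
    --test_convert_precision true
```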

docs/source_en/Instruction/Megatron-SWIFT-Training.md

Lines changed: 4 additions & 1 deletion

````diff
@@ -40,7 +40,8 @@ The training module in the dependent library Megatron-LM will be cloned and inst
 This section introduces a quick start example for fine-tuning the self-awareness of the Qwen2.5-7B-Instruct model using two 80GiB A100 GPUs. The following best practices can be completed within 10 minutes.
 
 First, we need to convert the weights from HF (Hugging Face) format to Megatron format:
-- If OOM occurs, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- If you encounter OOM, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- For "ms-swift>=3.6", it is recommended to add the `--test_convert_precision true` parameter to test conversion precision.
 ```shell
 CUDA_VISIBLE_DEVICES=0 \
 swift export \
@@ -88,6 +89,8 @@ megatron sft \
 
 Finally, convert the Megatron format weights back to HF format:
 - Note: Please point `--mcore_model` to the parent directory of `iter_xxx`. By default, the corresponding checkpoint from `latest_checkpointed_iteration.txt` will be used.
+- If you encounter OOM, simply remove `CUDA_VISIBLE_DEVICES=0`.
+- For "ms-swift>=3.6", it is recommended to add the `--test_convert_precision true` parameter to test conversion precision.
 
 ```shell
 CUDA_VISIBLE_DEVICES=0 \
````
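The reverse (Megatron-to-HF) conversion that this second hunk annotates might look like the following sketch. The `--mcore_model` semantics (parent directory of `iter_xxx`) and `--test_convert_precision true` come from the doc text; the `--to_hf` flag, dtype, and concrete paths are assumptions for illustration.

```shell
# Sketch of the Megatron -> HF back-conversion (paths are placeholders).
# --mcore_model points at the parent directory of iter_xxx, as the doc notes;
# drop CUDA_VISIBLE_DEVICES=0 on OOM.
CUDA_VISIBLE_DEVICES=0 \
swift export \
    --mcore_model megatron_output/Qwen2.5-7B-Instruct/vx-xxx \
    --to_hf true \
    --torch_dtype bfloat16 \
    --output_dir megatron_output/Qwen2.5-7B-Instruct/vx-xxx-hf \
    --test_convert_precision true
```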

swift/llm/argument/base_args/base_args.py

Lines changed: 2 additions & 0 deletions

```diff
@@ -224,6 +224,8 @@ def load_args_from_ckpt(self) -> None:
             'bnb_4bit_quant_type',
             'bnb_4bit_use_double_quant',
         ]
+        if 'megatron' in self.__class__.__name__.lower():
+            force_load_keys = []
         # If the current value is None or an empty list and it is among the following keys
         load_keys = [
             'custom_register_path',
```
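The effect of the new guard can be illustrated with a minimal, self-contained sketch. The class and method names below are hypothetical stand-ins, not the real ms-swift classes: the point is only that testing the lowercased class name makes every Megatron argument subclass end up with an empty `force_load_keys`, so no checkpoint values are force-loaded over CLI values.

```python
# Hypothetical stand-ins for the argument classes; only the guard logic
# mirrors the base_args.py change.
class BaseArguments:
    def force_load_keys_for_ckpt(self):
        # Keys whose checkpoint values normally override CLI values.
        force_load_keys = ['bnb_4bit_quant_type', 'bnb_4bit_use_double_quant']
        # The new guard: Megatron argument classes force-load nothing.
        if 'megatron' in self.__class__.__name__.lower():
            force_load_keys = []
        return force_load_keys


class MegatronArguments(BaseArguments):
    pass


print(BaseArguments().force_load_keys_for_ckpt())     # ['bnb_4bit_quant_type', 'bnb_4bit_use_double_quant']
print(MegatronArguments().force_load_keys_for_ckpt())  # []
```

Matching on the class name rather than an explicit flag keeps the base class unaware of its Megatron subclasses, at the cost of a string-based convention.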

tests/megatron/test_train.py

Lines changed: 3 additions & 2 deletions

```diff
@@ -18,6 +18,7 @@ def test_sft():
         model_author='swift',
         model_name='swift-robot',
         eval_iters=5,
+        sequence_parallel=True,
         finetune=True))
 
 
@@ -35,5 +36,5 @@ def test_pt():
 
 
 if __name__ == '__main__':
-    # test_sft()
-    test_pt()
+    test_sft()
+    # test_pt()
```

0 commit comments