
Commit 1e91895

Merge branch 'main' into release/3.3

2 parents: 78cc3ed + 634c15d

File tree

7 files changed (+7, -15 lines)


docs/source/Instruction/Megatron-SWIFT训练.md

Lines changed: 0 additions & 3 deletions

@@ -38,10 +38,8 @@ swift export \
     --model Qwen/Qwen2.5-7B-Instruct \
     --to_mcore true \
     --torch_dtype bfloat16 \
-    --test_convert_precision true \
     --output_dir Qwen2.5-7B-Instruct-mcore
 ```
-- Note: if OOM occurs, remove the `--test_convert_precision true` parameter

 Then, use the following script to train; the GPU memory required for training is 2*80GiB:
 ```shell
@@ -82,7 +80,6 @@ swift export \
     --mcore_model megatron_output/Qwen2.5-7B-Instruct/vx-xxx \
     --to_hf true \
     --torch_dtype bfloat16 \
-    --test_convert_precision true \
     --output_dir megatron_output/Qwen2.5-7B-Instruct/vx-xxx-hf
 ```

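Per the parameter docs touched in this commit, `--test_convert_precision` compares the converted weights against the originals, so both copies must be resident at once; that extra footprint is what the OOM note removed above warned about. A rough pure-Python sketch of such a check (the function and the flat lists standing in for tensors are illustrative, not swift's actual implementation):

```python
def check_convert_precision(src_weights, converted_weights, atol=1e-5):
    # Illustrative only: walk both weight dicts (flat lists stand in
    # for real tensors) and track the largest element-wise deviation.
    # Holding src and converted weights simultaneously is the extra
    # memory cost that can trigger OOM on large models.
    max_err = 0.0
    for name, src in src_weights.items():
        for a, b in zip(src, converted_weights[name]):
            max_err = max(max_err, abs(a - b))
    return max_err <= atol, max_err

src = {'wte': [0.1, 0.2, 0.3]}
converted = {'wte': [0.1, 0.2, 0.3]}
ok, err = check_convert_precision(src, converted)
print(ok, err)  # True 0.0
```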

docs/source/Instruction/命令行参数.md

Lines changed: 1 addition & 1 deletion

@@ -518,7 +518,7 @@ App parameters inherit from [部署参数](#部署参数), [Web-UI参数](#Web-UI参数)
 - to_hf: Convert weights from Megatron format to HF format. Default is False
 - mcore_model: Path to the mcore-format model. Default is None
 - thread_count: Number of model shards when `--to_mcore true` is set. Defaults to None, in which case it is set automatically based on model size so that the largest shard is smaller than 10GB
-- test_convert_precision: Test the precision error when converting weights between HF and Megatron formats. Default is False
+- 🔥test_convert_precision: Test the precision error when converting weights between HF and Megatron formats. Default is False
 - 🔥push_to_hub: Whether to push to the hub. Default is False. See the example [here](https://github.com/modelscope/ms-swift/blob/main/examples/export/push_to_hub.sh)
 - hub_model_id: The model_id to push. Default is None
 - hub_private_repo: Whether the repo is private. Default is False

docs/source_en/Instruction/Command-line-parameters.md

Lines changed: 1 addition & 1 deletion

@@ -535,7 +535,7 @@ Export Arguments include the [basic arguments](#base-arguments) and [merge arguments](#merge-arguments)
 - to_hf: Convert weights from Megatron format to HF format. Default is False.
 - mcore_model: Path to the mcore format model. Default is None.
 - thread_count: The number of model slices when `--to_mcore true` is set. Defaults to None, and is automatically configured based on the model size, ensuring that the largest slice is less than 10GB.
-- test_convert_precision: Test the precision error when converting weights between HF and Megatron formats. Default is False.
+- 🔥test_convert_precision: Test the precision error when converting weights between HF and Megatron formats. Default is False.
 - 🔥push_to_hub: Whether to push to the hub, with the default being False. Examples can be found [here](https://github.com/modelscope/ms-swift/blob/main/examples/export/push_to_hub.sh).
 - hub_model_id: Model ID for pushing, default is None.
 - hub_private_repo: Whether it is a private repo, default is False.

docs/source_en/Instruction/Megatron-SWIFT-Training.md

Lines changed: 0 additions & 3 deletions

@@ -40,10 +40,8 @@ swift export \
     --model Qwen/Qwen2.5-7B-Instruct \
     --to_mcore true \
     --torch_dtype bfloat16 \
-    --test_convert_precision true \
     --output_dir Qwen2.5-7B-Instruct-mcore
 ```
-- Note: If an OOM (Out Of Memory) error occurs, please remove the `--test_convert_precision true` parameter.

 Next, use the following script to start training. The required GPU memory resources are 2*80GiB:

@@ -86,7 +84,6 @@ swift export \
     --mcore_model megatron_output/Qwen2.5-7B-Instruct/vx-xxx \
     --to_hf true \
     --torch_dtype bfloat16 \
-    --test_convert_precision true \
     --output_dir megatron_output/Qwen2.5-7B-Instruct/vx-xxx-hf
 ```


swift/llm/model/model/mllm.py

Lines changed: 3 additions & 3 deletions

@@ -84,7 +84,7 @@ def to_dict(self, *args, **kwargs):
     if model is not None:
         model.config._to_dict = model.config.to_dict
         model.config.to_dict = MethodType(to_dict, model.config)
-
+        patch_output_clone(model.model.transformer.wte)
     return model, processor


@@ -114,8 +114,8 @@ def get_model_tokenizer_molmo(model_dir: str,
     model_cls = get_class_from_dynamic_module('modeling_molmo.MolmoForCausalLM', model_dir)
     model_cls._no_split_modules = ['MolmoSequentialBlock']
     model, processor = get_model_tokenizer_multimodal(model_dir, model_info, model_kwargs, load_model, **kwargs)
-
-    patch_output_clone(model.model.transformer.wte)
+    if model is not None:
+        patch_output_clone(model.model.transformer.wte)
     return model, processor

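The mllm.py hunks above move `patch_output_clone` behind an `if model is not None:` guard, since `get_model_tokenizer_multimodal` can return `None` for the model (presumably when called with `load_model=False`). A minimal self-contained sketch of the pattern; the loader and patch bodies here are hypothetical stand-ins, not swift's real implementations:

```python
from types import SimpleNamespace


def patch_output_clone(module):
    # Stand-in for swift's patch_output_clone: just record that the
    # module was patched.
    module.patched = True


def get_model_tokenizer_multimodal(load_model=True):
    # Stand-in loader: swift's version returns (None, processor) when
    # the model itself is not requested.
    if not load_model:
        return None, 'processor'
    wte = SimpleNamespace(patched=False)
    model = SimpleNamespace(model=SimpleNamespace(transformer=SimpleNamespace(wte=wte)))
    return model, 'processor'


def get_model_tokenizer_molmo(load_model=True):
    model, processor = get_model_tokenizer_multimodal(load_model)
    # The commit's fix: patch only when a model was actually returned,
    # so tokenizer-only loads no longer crash on model.model.
    if model is not None:
        patch_output_clone(model.model.transformer.wte)
    return model, processor
```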

swift/llm/model/patcher.py

Lines changed: 0 additions & 4 deletions

@@ -249,10 +249,6 @@ def patch_automodel_for_sequence_classification(model_meta):

     @classmethod
     def _new_from_pretrained(cls, *args, **kwargs):
-        cls_name = cls.__name__
-        cls_name = cls_name.split('For', 1)[0]
-        cls_name += 'ForSequenceClassification'
-        cls = type(cls_name, (cls, ), {})  # new_cls
         __init__ = cls.__init__

         def __new_init__(self, *args, **kwargs):
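The four deleted lines in `_new_from_pretrained` built a `...ForSequenceClassification` subclass on the fly with the three-argument form of `type()`. As a standalone illustration of that now-removed dynamic-subclassing pattern, using a hypothetical model class:

```python
class QwenForCausalLM:
    # Hypothetical stand-in for a transformers model class.
    pass


# The removed logic: strip everything from 'For' onward, append
# 'ForSequenceClassification', and synthesize the subclass with the
# three-argument type(name, bases, namespace).
cls = QwenForCausalLM
cls_name = cls.__name__.split('For', 1)[0] + 'ForSequenceClassification'
new_cls = type(cls_name, (cls, ), {})

print(new_cls.__name__)  # QwenForSequenceClassification
print(issubclass(new_cls, QwenForCausalLM))  # True
```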

swift/llm/template/base.py

Lines changed: 2 additions & 0 deletions

@@ -123,6 +123,8 @@ def __init__(
         logger.info(f'agent_template: {agent_template}')
         self.agent_template = agent_templates[agent_template]()
         self.norm_bbox = norm_bbox or self.norm_bbox
+        logger.info(f'max_length: {self.max_length}')
+        logger.info(f'norm_bbox: {self.norm_bbox}')
         if self.is_encoder_decoder:
             self.skip_prompt = False
         self.mode: Literal['pt', 'vllm', 'lmdeploy',  # infer
