
Commit cb64cb7
Merge branch 'main' into release/3.5
2 parents: f958295 + 4dfbf44

13 files changed: +73 -39 lines

docs/source/Instruction/命令行参数.md
Lines changed: 1 addition & 0 deletions

@@ -359,6 +359,7 @@ Vera uses the three parameters `target_modules`, `target_regex`, and `modules_to_save`.
 - packing_cache: Specifies the packing cache directory. Defaults to `None`, meaning the cache is stored under the path given by the `$MODELSCOPE_CACHE` environment variable. When using packing across nodes, make sure all nodes share the same packing cache path, either by setting the `MODELSCOPE_CACHE` environment variable or by passing `--packing_cache <shared_path>` on the command line.
 - 🔥lazy_tokenize: Whether to use lazy tokenization. If set to False, all dataset samples are tokenized before training (for multimodal models, this includes reading images from disk). Defaults to False for LLM training and True for MLLM training, to save memory.
 - use_logits_to_keep: Pass `logits_to_keep` in `forward` based on the labels to avoid computing and storing unneeded logits, reducing memory usage and speeding up training. Defaults to None, which selects automatically.
+  - Note: for stability, this value defaults to False for multimodal models and must be enabled manually.
 - acc_strategy: Strategy for computing accuracy during training and validation. Options are `seq`-level and `token`-level accuracy; defaults to `token`.
 - max_new_tokens: Generation parameter override. Maximum number of generated tokens when predict_with_generate=True; defaults to 64.
 - temperature: Generation parameter override. Temperature when predict_with_generate=True; defaults to 0.

docs/source/Instruction/支持的模型和数据集.md
Lines changed: 2 additions & 0 deletions

@@ -438,6 +438,8 @@
 |[OpenBMB/MiniCPM-2B-dpo-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp32)|minicpm|minicpm|transformers>=4.36.0|✘|-|[openbmb/MiniCPM-2B-dpo-fp32](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp32)|
 |[OpenBMB/MiniCPM-1B-sft-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16)|minicpm|minicpm|transformers>=4.36.0|✘|-|[openbmb/MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)|
 |[OpenBMB/MiniCPM-2B-128k](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-128k)|minicpm_chatml|chatml|transformers>=4.36|✘|-|[openbmb/MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k)|
+|[OpenBMB/MiniCPM4-0.5B](https://modelscope.cn/models/OpenBMB/MiniCPM4-0.5B)|minicpm_chatml|chatml|transformers>=4.36|✘|-|[openbmb/MiniCPM4-0.5B](https://huggingface.co/openbmb/MiniCPM4-0.5B)|
+|[OpenBMB/MiniCPM4-8B](https://modelscope.cn/models/OpenBMB/MiniCPM4-8B)|minicpm_chatml|chatml|transformers>=4.36|✘|-|[openbmb/MiniCPM4-8B](https://huggingface.co/openbmb/MiniCPM4-8B)|
 |[OpenBMB/MiniCPM3-4B](https://modelscope.cn/models/OpenBMB/MiniCPM3-4B)|minicpm3|chatml|transformers>=4.36|✘|-|[openbmb/MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B)|
 |[OpenBMB/MiniCPM-MoE-8x2B](https://modelscope.cn/models/OpenBMB/MiniCPM-MoE-8x2B)|minicpm_moe|minicpm|transformers>=4.36|✘|-|[openbmb/MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B)|
 |[TeleAI/TeleChat-7B](https://modelscope.cn/models/TeleAI/TeleChat-7B)|telechat|telechat|-|✘|-|[Tele-AI/telechat-7B](https://huggingface.co/Tele-AI/telechat-7B)|

docs/source_en/Instruction/Command-line-parameters.md
Lines changed: 1 addition & 0 deletions

@@ -368,6 +368,7 @@ Training arguments include the [base arguments](#base-arguments), [Seq2SeqTraine
 - packing_cache: Specifies the directory for the packing cache. The default value is `None`, which means the cache is stored in the path defined by the `$MODELSCOPE_CACHE` environment variable. When using the packing feature across multiple nodes, ensure that all nodes share the same packing cache directory, either by setting the `MODELSCOPE_CACHE` environment variable or by adding the `--packing_cache <shared_path>` argument on the command line.
 - 🔥lazy_tokenize: Whether to use lazy tokenization. If set to False, all dataset samples are tokenized before training (for multimodal models, this includes reading images from disk). This parameter defaults to False for LLM training and True for MLLM training, to save memory.
 - use_logits_to_keep: Pass `logits_to_keep` in the `forward` method based on the labels to avoid computing and storing unnecessary logits, thereby reducing memory usage and accelerating training. The default is `None`, which enables automatic selection.
+  - Note: For stability, this value defaults to False for multimodal models and needs to be enabled manually.
 - acc_strategy: Strategy for calculating accuracy during training and validation. Options are `seq`-level and `token`-level accuracy, with `token` as the default.
 - max_new_tokens: Generation parameter override. The maximum number of tokens to generate when `predict_with_generate=True`; defaults to 64.
 - temperature: Generation parameter override. The temperature setting when `predict_with_generate=True`; defaults to 0.
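The mechanism behind use_logits_to_keep is worth spelling out. The sketch below is not ms-swift's actual implementation; it is a minimal illustration of the idea, assuming a recent transformers version whose causal-LM forward accepts a `logits_to_keep` argument, and computing the loss by hand so the trimmed logits stay aligned with the labels.

    import torch
    import torch.nn.functional as F

    def loss_with_logits_to_keep(model, input_ids, labels):
        # Positions labeled -100 contribute nothing to the loss, so logits
        # for the leading ignored prefix (e.g. the prompt) can be skipped.
        first = (labels != -100).float().argmax(dim=-1).min().item()
        keep = labels.shape[-1] - max(first - 1, 0)  # logits at i predict token i + 1
        logits = model(input_ids=input_ids, logits_to_keep=keep).logits
        shift_logits = logits[:, :-1, :]      # drop the final position
        shift_labels = labels[:, -keep + 1:]  # align labels with the kept logits
        return F.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_labels.reshape(-1),
            ignore_index=-100)

For a long prompt with a short completion, this avoids materializing the full [batch, seq_len, vocab] logits tensor, which is usually the single largest activation during fine-tuning.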

docs/source_en/Instruction/Supported-models-and-datasets.md
Lines changed: 2 additions & 0 deletions

@@ -438,6 +438,8 @@ The table below introduces the models integrated with ms-swift:
 |[OpenBMB/MiniCPM-2B-dpo-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp32)|minicpm|minicpm|transformers>=4.36.0|✘|-|[openbmb/MiniCPM-2B-dpo-fp32](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp32)|
 |[OpenBMB/MiniCPM-1B-sft-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16)|minicpm|minicpm|transformers>=4.36.0|✘|-|[openbmb/MiniCPM-1B-sft-bf16](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)|
 |[OpenBMB/MiniCPM-2B-128k](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-128k)|minicpm_chatml|chatml|transformers>=4.36|✘|-|[openbmb/MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k)|
+|[OpenBMB/MiniCPM4-0.5B](https://modelscope.cn/models/OpenBMB/MiniCPM4-0.5B)|minicpm_chatml|chatml|transformers>=4.36|✘|-|[openbmb/MiniCPM4-0.5B](https://huggingface.co/openbmb/MiniCPM4-0.5B)|
+|[OpenBMB/MiniCPM4-8B](https://modelscope.cn/models/OpenBMB/MiniCPM4-8B)|minicpm_chatml|chatml|transformers>=4.36|✘|-|[openbmb/MiniCPM4-8B](https://huggingface.co/openbmb/MiniCPM4-8B)|
 |[OpenBMB/MiniCPM3-4B](https://modelscope.cn/models/OpenBMB/MiniCPM3-4B)|minicpm3|chatml|transformers>=4.36|✘|-|[openbmb/MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B)|
 |[OpenBMB/MiniCPM-MoE-8x2B](https://modelscope.cn/models/OpenBMB/MiniCPM-MoE-8x2B)|minicpm_moe|minicpm|transformers>=4.36|✘|-|[openbmb/MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B)|
 |[TeleAI/TeleChat-7B](https://modelscope.cn/models/TeleAI/TeleChat-7B)|telechat|telechat|-|✘|-|[Tele-AI/telechat-7B](https://huggingface.co/Tele-AI/telechat-7B)|

swift/llm/dataset/dataset/llm.py
Lines changed: 21 additions & 2 deletions

@@ -325,13 +325,20 @@ def preprocess(self, row: Dict[str, Any]) -> Dict[str, Any]:


 class StsbPreprocessor(ResponsePreprocessor):

+    def __init__(self, sim_threshold: Optional[float] = None):
+        self.sim_threshold = sim_threshold
+        super().__init__()
+
     def preprocess(self, row: Dict[str, Any]) -> Dict[str, Any]:
         row = {
             'query': row['sentence1'],
             'response': row['sentence2'],
             'label': row['score'],
         }
-        return super().preprocess(row)
+        # Keep only pairs whose similarity score clears the threshold (if set).
+        if self.sim_threshold is None or float(row['label']) >= self.sim_threshold:
+            return super().preprocess(row)
+        else:
+            return None


 class StsbGeneratePreprocessor(ResponsePreprocessor):

@@ -364,6 +371,7 @@ def preprocess(self, row: Dict[str, Any]) -> Optional[Dict[str, Any]]:
         hf_dataset_id='sentence-transformers/stsb',
         subsets=[
             SubsetDataset('default', preprocess_func=StsbPreprocessor()),  # embedding
+            SubsetDataset('positive', preprocess_func=StsbPreprocessor(sim_threshold=0.75)),  # infonce
             SubsetDataset('generate', preprocess_func=StsbGeneratePreprocessor()),
             SubsetDataset('reg', preprocess_func=StsbRegressionPreprocessor()),
         ],

@@ -676,11 +684,22 @@ def repair_conversations(s: Union[str, Any]) -> Any:
         preprocess_func=MessagesPreprocessor(repair_messages=repair_conversations),
         tags=['chat', 'em']))


+class EmojiPreprocessor(ResponsePreprocessor):
+
+    def preprocess(self, row: Dict[str, Any]) -> Dict[str, Any]:
+        # Remove dirty characters (stray U+FE0F emoji variation selectors)
+        row['query'] = row['query'].replace('️', '')
+        row['response'] = row['response'].replace('️', '')
+        row['rejected_response'] = row['rejected_response'].replace('️', '')
+        return super().preprocess(row)
+
+
 register_dataset(
     DatasetMeta(
         ms_dataset_id='hjh0119/shareAI-Llama3-DPO-zh-en-emoji',
         hf_dataset_id='shareAI/DPO-zh-en-emoji',
-        preprocess_func=ResponsePreprocessor(columns={
+        preprocess_func=EmojiPreprocessor(columns={
             'answer_zh': 'response',
             'answer_en': 'rejected_response'
         }),
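The new 'positive' subset exists because InfoNCE-style contrastive training treats every (query, response) pair as a true positive, so weakly related pairs must be dropped outright rather than kept with a low score. A quick illustration of the filter with made-up rows (assuming ResponsePreprocessor's default column handling):

    pre = StsbPreprocessor(sim_threshold=0.75)

    pre.preprocess({'sentence1': 'A man plays guitar.',
                    'sentence2': 'Someone is playing a guitar.',
                    'score': 0.95})  # kept: returns a processed row

    pre.preprocess({'sentence1': 'A man plays guitar.',
                    'sentence2': 'A dog runs in a field.',
                    'score': 0.10})  # dropped: returns None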

swift/llm/model/model/minicpm.py
Lines changed: 4 additions & 0 deletions

@@ -183,6 +183,10 @@ def get_model_tokenizer_minicpmv_2_x(model_dir: str,
         ModelGroup([
             Model('OpenBMB/MiniCPM-2B-128k', 'openbmb/MiniCPM-2B-128k'),
         ]),
+        ModelGroup([
+            Model('OpenBMB/MiniCPM4-0.5B', 'openbmb/MiniCPM4-0.5B'),
+            Model('OpenBMB/MiniCPM4-8B', 'openbmb/MiniCPM4-8B'),
+        ]),
     ],
     TemplateType.chatml,
     get_model_tokenizer_with_flash_attn,

swift/llm/model/register.py
Lines changed: 15 additions & 13 deletions

@@ -255,6 +255,21 @@ def get_model_tokenizer_from_local(model_dir: str,
         InitModelStrategy.init_parameters(model, init_strategy)

     model_info.config = model_config if model is None else model.config
+
+    pad_token = tokenizer.pad_token_id
+    if pad_token is None:
+        pad_token = tokenizer.eos_token_id
+    if tokenizer.eos_token_id is None:
+        tokenizer.eos_token_id = pad_token
+    if tokenizer.pad_token_id is None:
+        tokenizer.pad_token_id = pad_token
+    assert tokenizer.eos_token_id is not None
+    assert tokenizer.pad_token_id is not None
+
+    if model is not None:
+        # fix seq classification task
+        HfConfigFactory.set_model_config_attr(model, 'pad_token_id', pad_token)
+
     return model, tokenizer


@@ -583,20 +598,7 @@ def get_model_tokenizer(
     tokenizer.model_info = model_info
     tokenizer.model_meta = model_meta

-    pad_token = tokenizer.pad_token_id
-    if pad_token is None:
-        pad_token = tokenizer.eos_token_id
-    if tokenizer.eos_token_id is None:
-        tokenizer.eos_token_id = pad_token
-    if tokenizer.pad_token_id is None:
-        tokenizer.pad_token_id = pad_token
-    assert tokenizer.eos_token_id is not None
-    assert tokenizer.pad_token_id is not None
-
     if model is not None:
-        # fix seq classification task
-        HfConfigFactory.set_model_config_attr(model, 'pad_token_id', pad_token)
-
         model.model_info = model_info
         model.model_meta = model_meta
         model.model_dir = model_dir
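This change moves the pad/eos fallback from get_model_tokenizer into get_model_tokenizer_from_local, so any tokenizer loaded through the local path leaves with both ids set. A minimal standalone demonstration of the fallback itself, using GPT-2 (which ships with an eos token but no pad token); this mirrors the logic above rather than calling swift's code path:

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained('gpt2')
    assert tok.pad_token_id is None          # GPT-2 has no pad token...
    assert tok.eos_token_id == 50256         # ...but does have an eos token

    pad_token = tok.pad_token_id
    if pad_token is None:
        pad_token = tok.eos_token_id         # fall back to eos
    if tok.pad_token_id is None:
        tok.pad_token_id = pad_token         # reuse eos as pad for batching

    assert tok.pad_token_id == 50256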

swift/llm/template/template/emu3.py
Lines changed: 4 additions & 2 deletions

@@ -27,8 +27,10 @@ class Emu3GenTemplate(Template):
         'lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, '
         'worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry.')

-    def __init__(self, *args, **kwargs):
-        super().__init__(*args, **kwargs)
+    def init_processor(self, processor) -> None:
+        if processor is None:
+            return
+        super().init_processor(processor)
         self.bov = self.processor.tokenizer.encode(self.processor.visual_template[0].format(token_id=0))[0]
         self.eov = self.processor.tokenizer.encode(self.processor.visual_template[0].format(token_id=self.COOKBOOK_SIZE
                                                                                             - 1))[0]

swift/llm/template/template/qwen.py
Lines changed: 4 additions & 2 deletions

@@ -408,8 +408,10 @@ class Qwen2_5OmniTemplate(Qwen2_5VLTemplate):
     version = 'omni'
     placeholder_tokens = ['<|IMAGE|>', '<|AUDIO|>', '<|VIDEO|>']

-    def __init__(self, *args, **kwargs):
-        super().__init__(*args, **kwargs)
+    def init_processor(self, processor) -> None:
+        if processor is None:
+            return
+        super().init_processor(processor)
         from transformers.models.qwen2_5_omni.processing_qwen2_5_omni import Qwen2_5OmniProcessorKwargs
         default = Qwen2_5OmniProcessorKwargs._defaults
         self.seconds_per_chunk = default['videos_kwargs']['seconds_per_chunk']
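Both templates follow the same refactor: processor-dependent attributes move out of __init__ and into init_processor, so a template can be constructed before any processor exists and bound to one later. A minimal sketch of the pattern (hypothetical base class, not swift's actual Template API):

    class LazyBoundTemplate:
        def __init__(self):
            self.processor = None            # nothing processor-derived here

        def init_processor(self, processor) -> None:
            if processor is None:            # tolerate deferred binding
                return
            self.processor = processor
            # Attributes that need the processor are resolved only now.
            self.bos = processor.tokenizer.bos_token_id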

swift/megatron/train/utils.py
Lines changed: 1 addition & 1 deletion

@@ -205,7 +205,7 @@ def get_batch(data_iterator):

     # TODO: this is pretty hacky, find a better way
     if (not mpu.is_pipeline_first_stage()) and (not mpu.is_pipeline_last_stage()):
-        return None, None, None, None, None
+        return {key: None for key in ['input_ids', 'attention_mask', 'position_ids']}

     # get batches based on the TP rank you are on
     batch = get_batch_on_this_tp_rank(data_iterator)
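Returning a dict of None values instead of a bare 5-tuple lets intermediate pipeline stages be handled by key rather than by position, so callers no longer have to match an arity that only the first and last stages actually fill. A toy caller (hypothetical, not Megatron's real consumer):

    def step(batch):
        input_ids = batch['input_ids']    # key access works on every stage
        if input_ids is None:
            # Intermediate stage: activations arrive via pipeline
            # communication, so there is nothing to embed or score here.
            return
        ...                               # first/last stage: use the batch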
