
Commit 3b056b0

fix omni zero3 (#3826)

1 parent be809df · commit 3b056b0

10 files changed, +15 / -2 lines

docs/source/Instruction/命令行参数.md
Lines changed: 1 addition & 0 deletions

@@ -578,6 +578,7 @@ App arguments inherit from [deployment arguments](#部署参数), [Web-UI arguments](#Web-UI参数)
 ### qwen2_5_omni
 In addition to the model-specific parameters of qwen2_5_vl and qwen2_audio, qwen2_5_omni also includes the following parameters:
 - USE_AUDIO_IN_VIDEO: defaults to False
+- 🔥ENABLE_AUDIO_OUTPUT: defaults to True. If training with zero3, set it to False

 ### internvl, internvl_phi3
 For the meaning of the parameters, see [here](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-2B-V1-5)

docs/source/Instruction/支持的模型和数据集.md
Lines changed: 1 addition & 0 deletions

@@ -356,6 +356,7 @@
 |[deepseek-ai/DeepSeek-V3](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|✘|-|[deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3)|
 |[deepseek-ai/DeepSeek-V3-0324](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3-0324)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|✘|-|[deepseek-ai/DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324)|
 |[cognitivecomputations/DeepSeek-V3-awq](https://modelscope.cn/models/cognitivecomputations/DeepSeek-V3-awq)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|✘|-|[cognitivecomputations/DeepSeek-V3-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-V3-AWQ)|
+|[cognitivecomputations/DeepSeek-V3-0324-AWQ](https://modelscope.cn/models/cognitivecomputations/DeepSeek-V3-0324-AWQ)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|✘|-|[cognitivecomputations/DeepSeek-V3-0324-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-V3-0324-AWQ)|
 |[deepseek-ai/DeepSeek-R1](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1)|deepseek_r1|deepseek_r1|transformers>=4.39.3|✘|-|[deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)|
 |[deepseek-ai/DeepSeek-R1-Zero](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Zero)|deepseek_r1|deepseek_r1|transformers>=4.39.3|✘|-|[deepseek-ai/DeepSeek-R1-Zero](https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero)|
 |[cognitivecomputations/DeepSeek-R1-awq](https://modelscope.cn/models/cognitivecomputations/DeepSeek-R1-awq)|deepseek_r1|deepseek_r1|transformers>=4.39.3|✘|-|[cognitivecomputations/DeepSeek-R1-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-R1-AWQ)|

docs/source_en/Instruction/Command-line-parameters.md
Lines changed: 1 addition & 0 deletions

@@ -590,6 +590,7 @@ The parameter meanings are the same as in the `qwen_vl_utils` or `qwen_omni_util
 ### qwen2_5_omni
 qwen2_5_omni not only includes the model-specific parameters of qwen2_5_vl and qwen2_audio, but also contains the following parameter:
 - USE_AUDIO_IN_VIDEO: Default is False.
+- 🔥ENABLE_AUDIO_OUTPUT: Default is True. If training with zero3, set it to False.

 ### internvl, internvl_phi3
 For the meaning of the arguments, please refer to [here](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-2B-V1-5)
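Putting the new flag together with the existing omni example, a minimal launch sketch (not part of this commit) shows how ENABLE_AUDIO_OUTPUT=0 might be combined with DeepSpeed ZeRO-3, as the doc line above recommends. The model, dataset, and other environment variables mirror examples/train/multimodal/omni/sft.sh; the `--deepspeed zero3` flag is assumed here, as used in other ms-swift examples.

```shell
# Sketch (assumption, not from this commit): ZeRO-3 fine-tuning of Qwen2.5-Omni
# with the talker's audio output disabled via ENABLE_AUDIO_OUTPUT=0.
nproc_per_node=4

CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=$nproc_per_node \
MAX_PIXELS=1003520 \
ENABLE_AUDIO_OUTPUT=0 \
swift sft \
    --model Qwen/Qwen2.5-Omni-7B \
    --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#2000' \
    --deepspeed zero3
```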

docs/source_en/Instruction/Supported-models-and-datasets.md
Lines changed: 1 addition & 0 deletions

@@ -356,6 +356,7 @@ The table below introduces the models integrated with ms-swift:
 |[deepseek-ai/DeepSeek-V3](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|✘|-|[deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3)|
 |[deepseek-ai/DeepSeek-V3-0324](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3-0324)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|✘|-|[deepseek-ai/DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324)|
 |[cognitivecomputations/DeepSeek-V3-awq](https://modelscope.cn/models/cognitivecomputations/DeepSeek-V3-awq)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|✘|-|[cognitivecomputations/DeepSeek-V3-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-V3-AWQ)|
+|[cognitivecomputations/DeepSeek-V3-0324-AWQ](https://modelscope.cn/models/cognitivecomputations/DeepSeek-V3-0324-AWQ)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|✘|-|[cognitivecomputations/DeepSeek-V3-0324-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-V3-0324-AWQ)|
 |[deepseek-ai/DeepSeek-R1](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1)|deepseek_r1|deepseek_r1|transformers>=4.39.3|✘|-|[deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)|
 |[deepseek-ai/DeepSeek-R1-Zero](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Zero)|deepseek_r1|deepseek_r1|transformers>=4.39.3|✘|-|[deepseek-ai/DeepSeek-R1-Zero](https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero)|
 |[cognitivecomputations/DeepSeek-R1-awq](https://modelscope.cn/models/cognitivecomputations/DeepSeek-R1-awq)|deepseek_r1|deepseek_r1|transformers>=4.39.3|✘|-|[cognitivecomputations/DeepSeek-R1-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-R1-AWQ)|

examples/train/grpo/qwen2_5_omni/grpo.sh
Lines changed: 2 additions & 0 deletions

@@ -1,4 +1,6 @@
 # 4 * 50GiB
+pip uninstall transformers
+pip install git+https://github.com/huggingface/transformers@f742a644ca32e65758c3adb36225aef1731bd2a8
 pip install math_verify trl -U

 MAX_PIXELS=1003520 \
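Because the script now pins transformers to a specific upstream commit rather than a released version, it can be worth confirming which build actually ended up installed. This is an optional check, not part of grpo.sh:

```shell
# Optional sanity check (not part of grpo.sh): report the installed transformers build.
python -c "import transformers; print(transformers.__version__)"
pip show transformers
```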

examples/train/multimodal/omni/infer.sh
Lines changed: 1 addition & 0 deletions

@@ -2,6 +2,7 @@ CUDA_VISIBLE_DEVICES=0 \
 VIDEO_MAX_PIXELS=50176 \
 FPS_MAX_FRAMES=12 \
 MAX_PIXELS=1003520 \
+ENABLE_AUDIO_OUTPUT=0 \
 swift infer \
     --adapters output/vx-xxx/checkpoint-xxx \
     --stream true \

examples/train/multimodal/omni/sft.sh
Lines changed: 5 additions & 1 deletion

@@ -1,12 +1,16 @@
-# 4*25GB
+# 4*35GB
 # A demo for four modalities that can be run directly
+pip uninstall transformers
+pip install git+https://github.com/huggingface/transformers@f742a644ca32e65758c3adb36225aef1731bd2a8
+
 nproc_per_node=4

 CUDA_VISIBLE_DEVICES=0,1,2,3 \
 NPROC_PER_NODE=$nproc_per_node \
 VIDEO_MAX_PIXELS=50176 \
 FPS_MAX_FRAMES=12 \
 MAX_PIXELS=1003520 \
+ENABLE_AUDIO_OUTPUT=0 \
 swift sft \
     --model Qwen/Qwen2.5-Omni-7B \
     --dataset 'AI-ModelScope/alpaca-gpt4-data-zh#2000' \

swift/llm/argument/base_args/model_args.py
Lines changed: 1 addition & 1 deletion

@@ -40,7 +40,7 @@ class ModelArguments:
     torch_dtype: Literal['bfloat16', 'float16', 'float32', None] = None
     # flash_attn: It will automatically convert names based on the model.
     # None: It will be automatically selected between sdpa and eager.
-    attn_impl: Optional[str] = None  # 'flash_attn', 'sdpa', 'eager'
+    attn_impl: Literal['flash_attn', 'sdpa', 'eager', None] = None

     num_labels: Optional[int] = None
     problem_type: Literal['regression', 'single_label_classification', 'multi_label_classification'] = None

swift/llm/model/model/deepseek.py
Lines changed: 1 addition & 0 deletions

@@ -110,6 +110,7 @@ def get_model_tokenizer_deepseek_moe(model_dir: str,
         ]),
         ModelGroup([
             Model('cognitivecomputations/DeepSeek-V3-awq', 'cognitivecomputations/DeepSeek-V3-AWQ'),
+            Model('cognitivecomputations/DeepSeek-V3-0324-AWQ', 'cognitivecomputations/DeepSeek-V3-0324-AWQ')
         ])
     ],
     TemplateType.deepseek_v2_5,

swift/llm/model/model/qwen.py
Lines changed: 1 addition & 0 deletions

@@ -621,6 +621,7 @@ def get_model_tokenizer_qwen2_5_omni(model_dir, *args, **kwargs):
     kwargs['tokenizer'] = processor.tokenizer
     kwargs['model_config'] = Qwen2_5OmniConfig.from_pretrained(model_dir, trust_remote_code=True)
     patch_qwen_vl_utils(vision_process)
+    kwargs['model_config'].enable_audio_output = get_env_args('ENABLE_AUDIO_OUTPUT', bool, True)
     model, _ = get_model_tokenizer_with_flash_attn(model_dir, *args, **kwargs)
     if model:
         use_submodel_func(model, 'thinker')
