
Commit f57790c

Merge branch 'main' into release/3.3

2 parents ea81928 + d36edc9

24 files changed: +72 additions, -19 deletions


README.md

Lines changed: 1 addition & 1 deletion

@@ -125,7 +125,7 @@ Running Environment:
| peft | >=0.11,<0.16 | ||
| trl | >=0.13,<0.17 | 0.16 |RLHF|
| deepspeed | >=0.14 | 0.14.5 | Training |
-| vllm | >=0.5.1 | 0.8.3 | Inference/Deployment/Evaluation |
+| vllm | >=0.5.1 | 0.7.3/0.8.3 | Inference/Deployment/Evaluation |
| lmdeploy | >=0.5 | 0.7.2.post1 | Inference/Deployment/Evaluation |
| evalscope | >=0.11 | | Evaluation |

README_CN.md

Lines changed: 1 addition & 1 deletion

@@ -120,7 +120,7 @@ pip install -e .
| peft | >=0.11,<0.16 | ||
| trl | >=0.13,<0.17 | 0.16 |RLHF|
| deepspeed | >=0.14 | 0.14.5 |Training|
-| vllm | >=0.5.1 | 0.8.3 |Inference/Deployment/Evaluation|
+| vllm | >=0.5.1 | 0.7.3/0.8.3 |Inference/Deployment/Evaluation|
| lmdeploy | >=0.5 | 0.7.2.post1 |Inference/Deployment/Evaluation|
| evalscope | >=0.11 | |Evaluation|

docs/source/GetStarted/SWIFT安装.md

Lines changed: 1 addition & 1 deletion

@@ -69,7 +69,7 @@ modelscope-registry.us-west-1.cr.aliyuncs.com/modelscope-repo/modelscope:ubuntu2
| peft | >=0.11,<0.16 | ||
| trl | >=0.13,<0.17 | 0.16 |RLHF|
| deepspeed | >=0.14 | 0.14.5 |Training|
-| vllm | >=0.5.1 | 0.8.3 |Inference/Deployment/Evaluation|
+| vllm | >=0.5.1 | 0.7.3/0.8.3 |Inference/Deployment/Evaluation|
| lmdeploy | >=0.5 | 0.7.2.post1 |Inference/Deployment/Evaluation|
| evalscope | >=0.11 | |Evaluation|

docs/source/Instruction/GRPO.md

Lines changed: 1 addition & 0 deletions

@@ -133,6 +133,7 @@ A conversation between User and Assistant. The user asks a question, and the Ass
- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM/LMDeploy, how many batches to split the layers into. Default is None, meaning the whole model is not split; otherwise it is split into move_model_batches + 1 (non-layer parameters) + 1 (multimodal component parameters) batches.
- offload_optimizer: Whether to offload optimizer parameters during vLLM/LMDeploy inference. Default is False.
- offload_model: Whether to offload the model itself during vLLM/LMDeploy inference. Default is False.
+- Note: If this parameter is set to True and grad_norm stays at 0 during training, please install `vllm==0.7.3` (see the sketch after this diff).
- gc_collect_after_offload: Whether to run garbage collection (both Python GC and GPU GC) after offloading finishes. Default is False.
- multi_turn_func: Multi-turn GRPO parameter. Pass the corresponding plugin name and add the matching implementation in plugin/multi_turn.py.
- mini_batch_size: Used to further split the batch size on each device (per_device_batch) into smaller sub-batches. For the split to be valid, per_device_batch must be divisible by mini_batch_size.
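
A minimal sketch of the workaround named in the note above, assuming a standard pip environment:

    # Pin vLLM to 0.7.3 if grad_norm stays at 0 with offload_model=True
    pip install "vllm==0.7.3"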

docs/source/Instruction/命令行参数.md

Lines changed: 2 additions & 0 deletions

@@ -413,6 +413,7 @@ The reward model parameters are used in PPO and GRPO.
- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM/LMDeploy, how many batches to split the layers into. Default is None, meaning the whole model is not split; otherwise it is split into move_model_batches + 1 (non-layer parameters) + 1 (multimodal component parameters) batches.
- offload_optimizer: Whether to offload optimizer parameters during vLLM/LMDeploy inference. Default is False.
- offload_model: Whether to offload the model itself during vLLM/LMDeploy inference. Default is False.
+- Note: If this parameter is set to True and grad_norm stays at 0 during training, please install `vllm==0.7.3`.
- gc_collect_after_offload: Whether to run garbage collection (both Python GC and GPU GC) after offloading finishes. Default is False.
- multi_turn_func: Multi-turn GRPO parameter. Pass the corresponding plugin name and add the matching implementation in plugin/multi_turn.py.
- mini_batch_size: Used to further split the batch size on each device (per_device_batch) into smaller sub-batches. For the split to be valid, per_device_train_batch_size must be divisible by mini_batch_size.
@@ -578,6 +579,7 @@ App parameters inherit from the [deployment parameters](#部署参数) and [Web-UI parameters](#Web-UI参数)
### qwen2_5_omni
In addition to the model-specific parameters of qwen2_5_vl and qwen2_audio, qwen2_5_omni also includes the following parameters:
- USE_AUDIO_IN_VIDEO: Default is False.
+- 🔥ENABLE_AUDIO_OUTPUT: Default is True. If training with zero3, set it to False.

### internvl, internvl_phi3
For the meaning of the arguments, see [here](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-2B-V1-5)

docs/source/Instruction/支持的模型和数据集.md

Lines changed: 1 addition & 0 deletions

@@ -356,6 +356,7 @@
|[deepseek-ai/DeepSeek-V3](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|&#x2718;|-|[deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3)|
|[deepseek-ai/DeepSeek-V3-0324](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3-0324)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|&#x2718;|-|[deepseek-ai/DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324)|
|[cognitivecomputations/DeepSeek-V3-awq](https://modelscope.cn/models/cognitivecomputations/DeepSeek-V3-awq)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|&#x2718;|-|[cognitivecomputations/DeepSeek-V3-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-V3-AWQ)|
+|[cognitivecomputations/DeepSeek-V3-0324-AWQ](https://modelscope.cn/models/cognitivecomputations/DeepSeek-V3-0324-AWQ)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|&#x2718;|-|[cognitivecomputations/DeepSeek-V3-0324-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-V3-0324-AWQ)|
|[deepseek-ai/DeepSeek-R1](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1)|deepseek_r1|deepseek_r1|transformers>=4.39.3|&#x2718;|-|[deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)|
|[deepseek-ai/DeepSeek-R1-Zero](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Zero)|deepseek_r1|deepseek_r1|transformers>=4.39.3|&#x2718;|-|[deepseek-ai/DeepSeek-R1-Zero](https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero)|
|[cognitivecomputations/DeepSeek-R1-awq](https://modelscope.cn/models/cognitivecomputations/DeepSeek-R1-awq)|deepseek_r1|deepseek_r1|transformers>=4.39.3|&#x2718;|-|[cognitivecomputations/DeepSeek-R1-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-R1-AWQ)|

docs/source_en/GetStarted/SWIFT-installation.md

Lines changed: 1 addition & 1 deletion

@@ -70,7 +70,7 @@ More images can be found [here](https://modelscope.cn/docs/intro/environment-set
| peft | >=0.11,<0.16 | | |
| trl | >=0.13,<0.17 | 0.16 | RLHF |
| deepspeed | >=0.14 | 0.14.5 | Training |
-| vllm | >=0.5.1 | 0.8.3 | Inference/Deployment/Evaluation |
+| vllm | >=0.5.1 | 0.7.3/0.8.3 | Inference/Deployment/Evaluation |
| lmdeploy | >=0.5 | 0.7.2.post1 | Inference/Deployment/Evaluation |
| evalscope | >=0.11 | | Evaluation |
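
A hedged install sketch based on this table; the pins are taken from the recommended-version column above, and whether vllm 0.7.3 or 0.8.3 fits depends on the GRPO offload note elsewhere in this commit:

    # Install the recommended versions from the dependency table
    pip install "trl==0.16" "deepspeed==0.14.5" "vllm==0.8.3" "lmdeploy==0.7.2.post1"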

docs/source_en/Instruction/Command-line-parameters.md

Lines changed: 2 additions & 0 deletions

@@ -424,6 +424,7 @@ The meanings of the following parameters can be referenced [here](https://huggin
- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM/LMDeploy, determines how many batches to divide the layers into. The default is `None`, which means the entire model is not split. Otherwise, the model is split into `move_model_batches + 1` (non-layer parameters) + `1` (multi-modal component parameters) batches.
- offload_optimizer: Whether to offload optimizer parameters during inference with vLLM/LMDeploy. The default is `False`.
- offload_model: Whether to offload the model itself during inference with vLLM/LMDeploy. The default is `False`.
+- Note: If this parameter is set to True and grad_norm remains zero during training, please install vllm==0.7.3.
- gc_collect_after_offload: Whether to perform garbage collection (both Python GC and GPU GC) after offloading. The default is `False`.
- multi_turn_func: The multi-turn GRPO plugin name. Add your multi-turn implementation in plugin/multi_turn.py.
- mini_batch_size: Used to further split the batch size on each device (per_device_batch) into smaller sub-batches. To ensure the split is valid, per_device_train_batch_size needs to be divisible by mini_batch_size.
@@ -590,6 +591,7 @@ The parameter meanings are the same as in the `qwen_vl_utils` or `qwen_omni_util
### qwen2_5_omni
qwen2_5_omni not only includes the model-specific parameters of qwen2_5_vl and qwen2_audio, but also contains the following parameters:
- USE_AUDIO_IN_VIDEO: Default is False.
+- 🔥ENABLE_AUDIO_OUTPUT: Default is True. If training with zero3, set it to False (a launch sketch follows this diff).

### internvl, internvl_phi3
For the meaning of the arguments, please refer to [here](https://modelscope.cn/models/OpenGVLab/Mini-InternVL-Chat-2B-V1-5)
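
A minimal launch sketch for the ENABLE_AUDIO_OUTPUT note above. Only the two environment variables come from this diff; the model id, dataset, and remaining flags are illustrative assumptions about a typical `swift sft` run under ZeRO-3:

    # Disable audio output (and audio-in-video) when training qwen2_5_omni with zero3
    ENABLE_AUDIO_OUTPUT=False \
    USE_AUDIO_IN_VIDEO=False \
    swift sft \
        --model Qwen/Qwen2.5-Omni-7B \
        --dataset my_dataset.jsonl \
        --train_type lora \
        --deepspeed zero3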

docs/source_en/Instruction/GRPO.md

Lines changed: 1 addition & 0 deletions

@@ -136,6 +136,7 @@ Arguments
- move_model_batches: When moving model parameters to fast inference frameworks such as vLLM/LMDeploy, determines how many batches to divide the layers into. The default is `None`, which means the entire model is not split. Otherwise, the model is split into `move_model_batches + 1` (non-layer parameters) + `1` (multi-modal component parameters) batches.
- offload_optimizer: Whether to offload optimizer parameters during inference with vLLM/LMDeploy. The default is `False`.
- offload_model: Whether to offload the model itself during inference with vLLM/LMDeploy. The default is `False`.
+- Note: If this parameter is set to True and grad_norm remains zero during training, please install vllm==0.7.3 (see the sketch after this diff).
- gc_collect_after_offload: Whether to perform garbage collection (both Python GC and GPU GC) after offloading. The default is `False`.
- multi_turn_func: The multi-turn GRPO plugin name. Add your multi-turn implementation in plugin/multi_turn.py.
- mini_batch_size: Used to further split the batch size on each device (per_device_batch) into smaller sub-batches. To ensure the split is valid, per_device_train_batch_size needs to be divisible by mini_batch_size.
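
A hedged sketch of how the offload options above might be combined in a GRPO run; the model, dataset, and reward function are placeholder assumptions, and only the offload/batching flags are taken from the list in this diff:

    # GRPO with model + optimizer offloaded during vLLM rollouts,
    # layers moved in 4 batches, GC after each offload.
    # per_device_train_batch_size=8 is divisible by mini_batch_size=4,
    # so each device batch splits into two sub-batches of 4.
    swift rlhf \
        --rlhf_type grpo \
        --model Qwen/Qwen2.5-7B-Instruct \
        --dataset my_dataset.jsonl \
        --reward_funcs accuracy \
        --use_vllm true \
        --offload_model true \
        --offload_optimizer true \
        --gc_collect_after_offload true \
        --move_model_batches 4 \
        --per_device_train_batch_size 8 \
        --mini_batch_size 4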

docs/source_en/Instruction/Supported-models-and-datasets.md

Lines changed: 1 addition & 0 deletions

@@ -356,6 +356,7 @@ The table below introduces the models integrated with ms-swift:
|[deepseek-ai/DeepSeek-V3](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|&#x2718;|-|[deepseek-ai/DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3)|
|[deepseek-ai/DeepSeek-V3-0324](https://modelscope.cn/models/deepseek-ai/DeepSeek-V3-0324)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|&#x2718;|-|[deepseek-ai/DeepSeek-V3-0324](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324)|
|[cognitivecomputations/DeepSeek-V3-awq](https://modelscope.cn/models/cognitivecomputations/DeepSeek-V3-awq)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|&#x2718;|-|[cognitivecomputations/DeepSeek-V3-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-V3-AWQ)|
+|[cognitivecomputations/DeepSeek-V3-0324-AWQ](https://modelscope.cn/models/cognitivecomputations/DeepSeek-V3-0324-AWQ)|deepseek_v2_5|deepseek_v2_5|transformers>=4.39.3|&#x2718;|-|[cognitivecomputations/DeepSeek-V3-0324-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-V3-0324-AWQ)|
|[deepseek-ai/DeepSeek-R1](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1)|deepseek_r1|deepseek_r1|transformers>=4.39.3|&#x2718;|-|[deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1)|
|[deepseek-ai/DeepSeek-R1-Zero](https://modelscope.cn/models/deepseek-ai/DeepSeek-R1-Zero)|deepseek_r1|deepseek_r1|transformers>=4.39.3|&#x2718;|-|[deepseek-ai/DeepSeek-R1-Zero](https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero)|
|[cognitivecomputations/DeepSeek-R1-awq](https://modelscope.cn/models/cognitivecomputations/DeepSeek-R1-awq)|deepseek_r1|deepseek_r1|transformers>=4.39.3|&#x2718;|-|[cognitivecomputations/DeepSeek-R1-AWQ](https://huggingface.co/cognitivecomputations/DeepSeek-R1-AWQ)|
