
Commit 5f6a522

Merge branch 'main' into release/2.0

2 parents a5346cd + 6878b1c

26 files changed: +625, -97 lines

README.md

Lines changed: 10 additions & 8 deletions
@@ -47,6 +47,7 @@ SWIFT has rich documentations for users, please check [here](https://github.com/
 SWIFT web-ui is available both on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary), please feel free to try!

 ## 🎉 News
+- 2024.05.24: Supports the Phi3-Vision model. Use model_type `phi3-vision-128k-instruct` to train.
 - 2024.05.22: Supports DeepSeek-V2-Lite series models, model_type are `deepseek-v2-lite` and `deepseek-v2-lite-chat`
 - 2024.05.22: Supports TeleChat-12B-v2 model with quantized version, model_type are `telechat-12b-v2` and `telechat-12b-v2-gptq-int4`
 - 🔥2024.05.21: Inference and fine-tuning support for MiniCPM-Llama3-V-2_5 are now available. For more details, please refer to [minicpm-v-2.5 Best Practice](docs/source/Multi-Modal/minicpm-v-2.5最佳实践.md).
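For orientation, the newly supported model plugs into swift's standard fine-tuning entry points. A minimal Python sketch follows; the dataset name is illustrative and not part of this commit:

```python
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

from swift.llm import SftArguments, sft_main

# Minimal LoRA fine-tune of the newly supported Phi3-Vision model.
# 'coco-en-mini' is an illustrative dataset name; substitute a dataset from
# the supported-datasets list, or a path to your own data.
sft_args = SftArguments(
    model_type='phi3-vision-128k-instruct',
    dataset=['coco-en-mini'],
    sft_type='lora',
    output_dir='output',
)
sft_main(sft_args)
```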
@@ -521,18 +522,19 @@ The complete list of supported models and datasets can be found at [Supported Mo
 
 | Model Type | Model Introduction | Language | Model Size | Model Type |
 |------------------|------------------------------------------------------------------------|--------------------|-------------------|------------------- |
-| Qwen-VL | [Tongyi Qwen vision model](https://github.com/QwenLM) | Chinese<br>English | 7B<br>including quantized versions | base model<br>chat model |
-| Qwen-Audio | [Tongyi Qwen speech model](https://github.com/QwenLM) | Chinese<br>English | 7B | base model<br>chat model |
-| YI-VL | [01AI's YI series vision models](https://github.com/01-ai) | Chinese<br>English | 6B-34B | chat model |
+| Qwen-VL | [Tongyi Qwen vision model](https://github.com/QwenLM) | Chinese<br>English | 7B<br>including quantized versions | base model<br>chat model |
+| Qwen-Audio | [Tongyi Qwen speech model](https://github.com/QwenLM) | Chinese<br>English | 7B | base model<br>chat model |
+| YI-VL | [01AI's YI series vision models](https://github.com/01-ai) | Chinese<br>English | 6B-34B | chat model |
 | XComposer2 | [Pujiang AI Lab InternLM vision model](https://github.com/InternLM/InternLM) | Chinese<br>English | 7B | chat model |
-| DeepSeek-VL | [DeepSeek series vision models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-7B | chat model |
+| DeepSeek-VL | [DeepSeek series vision models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-7B | chat model |
 | MiniCPM-V<br>MiniCPM-V-2<br>MiniCPM-V-2_5 | [OpenBmB MiniCPM vision model](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 3B-9B | chat model |
-| CogVLM<br>CogVLM2<br>CogAgent | [Zhipu ChatGLM visual QA and Agent model](https://github.com/THUDM/) | Chinese<br>English | 17B-19B | chat model |
-| Llava | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
+| CogVLM<br>CogVLM2<br>CogAgent | [Zhipu ChatGLM visual QA and Agent model](https://github.com/THUDM/) | Chinese<br>English | 17B-19B | chat model |
+| Llava | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
 | Llava-Next | [Llava-Next series models](https://github.com/LLaVA-VL/LLaVA-NeXT) | Chinese<br>English | 8B-110B | chat model |
-| mPLUG-Owl | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl) | English | 11B | chat model |
+| mPLUG-Owl | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl) | English | 11B | chat model |
 | InternVL | [InternVL](https://github.com/OpenGVLab/InternVL) | Chinese<br>English | 25.5B<br>including quantized version | chat model |
-| Llava-llama3 | [xtuner](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers) | English | 8B | chat model |
+| Llava-llama3 | [xtuner](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers) | English | 8B | chat model |
+| Phi3 | Microsoft | English | 4B | chat model |
 
 #### Diffusion Models
 
README_CN.md

Lines changed: 16 additions & 14 deletions
@@ -48,6 +48,7 @@ SWIFT has a rich set of documentation; if you run into problems, please check [here](https:
 You can try SWIFT's web-ui features on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary).

 ## 🎉 News
+- 2024.05.24: Supports the Phi3 multi-modal model. Use model_type `phi3-vision-128k-instruct` to train.
 - 2024.05.22: Supports DeepSeek-V2-Lite series models; model_type are `deepseek-v2-lite` and `deepseek-v2-lite-chat`
 - 2024.05.22: Supports the TeleChat-12B-v2 model and its quantized version; model_type are `telechat-12b-v2` and `telechat-12b-v2-gptq-int4`
 - 🔥2024.05.21: Supports inference and fine-tuning for MiniCPM-Llama3-V-2_5; see [minicpm-v-2.5 Best Practice](docs/source/Multi-Modal/minicpm-v-2.5最佳实践.md).
@@ -518,20 +519,21 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 
 #### Multi-Modal Large Models
 
-| Model Type | Model Introduction | Language | Model Size | Model Type |
-| --------------- | ------------------------------------------------------------ | --------- | ---------------- | ----------------- |
-| Qwen-VL | [Tongyi Qwen vision model](https://github.com/QwenLM) | Chinese<br>English | 7B<br>including quantized versions | base model<br>chat model |
-| Qwen-Audio | [Tongyi Qwen speech model](https://github.com/QwenLM) | Chinese<br>English | 7B | base model<br>chat model |
-| YI-VL | [01AI's YI series vision models](https://github.com/01-ai) | Chinese<br>English | 6B-34B | chat model |
-| XComposer2 | [Pujiang AI Lab InternLM vision model](https://github.com/InternLM/InternLM) | Chinese<br>English | 7B | chat model |
-| DeepSeek-VL | [DeepSeek series vision models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-7B | chat model |
-| MiniCPM-V<br>MiniCPM-V-2<br>MiniCPM-V-2_5 | [OpenBmB MiniCPM vision model](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 3B-9B | chat model |
-| CogVLM<br>CogVLM2<br>CogAgent | [Zhipu ChatGLM visual QA and Agent model](https://github.com/THUDM/) | Chinese<br>English | 17B-19B | chat model |
-| Llava | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
-| Llava-Next | [Llava-Next series models](https://github.com/LLaVA-VL/LLaVA-NeXT) | Chinese<br>English | 8B-110B | chat model |
-| mPLUG-Owl | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl) | English | 11B | chat model |
-| InternVL | [InternVL](https://github.com/OpenGVLab/InternVL) | Chinese<br>English | 25.5B<br>including quantized versions | chat model |
-| Llava-llama3 | [xtuner](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers) | English | 8B | chat model |
+| Model Type | Model Introduction | Language | Model Size | Model Type |
+|-------------------------------------------|----------------------------------------------------------------------------| --------- |-----------------| ----------------- |
+| Qwen-VL | [Tongyi Qwen vision model](https://github.com/QwenLM) | Chinese<br>English | 7B<br>including quantized versions | base model<br>chat model |
+| Qwen-Audio | [Tongyi Qwen speech model](https://github.com/QwenLM) | Chinese<br>English | 7B | base model<br>chat model |
+| YI-VL | [01AI's YI series vision models](https://github.com/01-ai) | Chinese<br>English | 6B-34B | chat model |
+| XComposer2 | [Pujiang AI Lab InternLM vision model](https://github.com/InternLM/InternLM) | Chinese<br>English | 7B | chat model |
+| DeepSeek-VL | [DeepSeek series vision models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-7B | chat model |
+| MiniCPM-V<br>MiniCPM-V-2<br>MiniCPM-V-2_5 | [OpenBmB MiniCPM vision model](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 3B-9B | chat model |
+| CogVLM<br>CogVLM2<br>CogAgent | [Zhipu ChatGLM visual QA and Agent model](https://github.com/THUDM/) | Chinese<br>English | 17B-19B | chat model |
+| Llava | [Llava series models](https://github.com/haotian-liu/LLaVA) | English | 7B-34B | chat model |
+| Llava-Next | [Llava-Next series models](https://github.com/LLaVA-VL/LLaVA-NeXT) | Chinese<br>English | 8B-110B | chat model |
+| mPLUG-Owl | [mPLUG-Owl series models](https://github.com/X-PLUG/mPLUG-Owl) | English | 11B | chat model |
+| InternVL | [InternVL](https://github.com/OpenGVLab/InternVL) | Chinese<br>English | 25.5B<br>including quantized versions | chat model |
+| Llava-llama3 | [xtuner](https://huggingface.co/xtuner/llava-llama-3-8b-v1_1-transformers) | English | 8B | chat model |
+| Phi3 | Microsoft | English | 4B | chat model |
 
 #### Diffusion Models
 
docs/source/GetStarted/使用tuners.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ A tuner is an extra structure attached to the model, used to reduce the number of trainable parameters
 10. Adapter: [Parameter-Efficient Transfer Learning for NLP](http://arxiv.org/abs/1902.00751)
 11. Vision Prompt Tuning: [Visual Prompt Tuning](https://arxiv.org/abs/2203.12119)
 12. Side: [Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks](https://arxiv.org/abs/1912.13503)
-13. Res-Tuning: [Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone](https://arxiv.org/abs/2310.19859) < [arXiv](https://arxiv.org/abs/2310.19859) | [Project Page](https://res-tuning.github.io/) | [Usage](docs/source/GetStarted/ResTuning.md) >
+13. Res-Tuning: [Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone](https://arxiv.org/abs/2310.19859) < [arXiv](https://arxiv.org/abs/2310.19859) | [Project Page](https://res-tuning.github.io/) | [Usage](ResTuning.md) >
 14. Tuners provided by [PEFT](https://github.com/huggingface/peft), such as IA3 and AdaLoRA

 ## Using Tuners in Training
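For orientation, attaching one of the tuners listed above looks roughly like this with swift's `Swift.prepare_model` API. This is a sketch; the base model and `target_modules` below are illustrative and vary by architecture:

```python
from transformers import AutoModelForCausalLM
from swift import LoRAConfig, Swift

# Load a base model, then wrap it with a LoRA tuner. Afterwards only the
# low-rank adapter weights are trainable, which is the point of a tuner.
model = AutoModelForCausalLM.from_pretrained('qwen/Qwen-7B-Chat',
                                             trust_remote_code=True)
lora_config = LoRAConfig(r=8, lora_alpha=32, target_modules=['c_attn'])
model = Swift.prepare_model(model, lora_config)
# model can now be passed to any standard training loop or Trainer.
```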

docs/source/LLM/支持的模型和数据集.md

Lines changed: 4 additions & 1 deletion
@@ -272,7 +272,10 @@
 |codefuse-qwen-14b-chat|[codefuse-ai/CodeFuse-QWen-14B](https://modelscope.cn/models/codefuse-ai/CodeFuse-QWen-14B/summary)|c_attn|codefuse|&#x2714;|&#x2714;||coding|[codefuse-ai/CodeFuse-QWen-14B](https://huggingface.co/codefuse-ai/CodeFuse-QWen-14B)|
 |phi2-3b|[AI-ModelScope/phi-2](https://modelscope.cn/models/AI-ModelScope/phi-2/summary)|Wqkv|default-generation|&#x2714;|&#x2714;||coding|[microsoft/phi-2](https://huggingface.co/microsoft/phi-2)|
 |phi3-4b-4k-instruct|[LLM-Research/Phi-3-mini-4k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-mini-4k-instruct/summary)|qkv_proj|phi3|&#x2714;|&#x2718;|transformers>=4.36|general|[microsoft/Phi-3-mini-4k-instruct](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)|
-|phi3-4b-128k-instruct|[LLM-Research/Phi-3-mini-128k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-mini-128k-instruct/summary)|qkv_proj|phi3|&#x2714;|&#x2718;|transformers>=4.36|general|[microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)|
+|phi3-4b-128k-instruct|[LLM-Research/Phi-3-mini-128k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-mini-128k-instruct/summary)|qkv_proj|phi3|&#x2714;|&#x2714;|transformers>=4.36|general|[microsoft/Phi-3-mini-128k-instruct](https://huggingface.co/microsoft/Phi-3-mini-128k-instruct)|
+|phi3-small-128k-instruct|[LLM-Research/Phi-3-small-128k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-small-128k-instruct/summary)|qkv_proj|phi3|&#x2714;|&#x2714;|transformers>=4.36|general|[microsoft/Phi-3-small-128k-instruct](https://huggingface.co/microsoft/Phi-3-small-128k-instruct)|
+|phi3-medium-128k-instruct|[LLM-Research/Phi-3-medium-128k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-medium-128k-instruct/summary)|qkv_proj|phi3|&#x2714;|&#x2714;|transformers>=4.36|general|[microsoft/Phi-3-medium-128k-instruct](https://huggingface.co/microsoft/Phi-3-medium-128k-instruct)|
+|phi3-vision-128k-instruct|[LLM-Research/Phi-3-vision-128k-instruct](https://modelscope.cn/models/LLM-Research/Phi-3-vision-128k-instruct/summary)|qkv_proj|phi3-vl|&#x2714;|&#x2718;|transformers>=4.36|multi-modal, vision|[microsoft/Phi-3-vision-128k-instruct](https://huggingface.co/microsoft/Phi-3-vision-128k-instruct)|
 |cogvlm-17b-chat|[ZhipuAI/cogvlm-chat](https://modelscope.cn/models/ZhipuAI/cogvlm-chat/summary)|vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense|cogvlm|&#x2718;|&#x2718;||multi-modal, vision|[THUDM/cogvlm-chat-hf](https://huggingface.co/THUDM/cogvlm-chat-hf)|
 |cogvlm2-19b-chat|[ZhipuAI/cogvlm2-llama3-chinese-chat-19B](https://modelscope.cn/models/ZhipuAI/cogvlm2-llama3-chinese-chat-19B/summary)|vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense|cogvlm|&#x2718;|&#x2718;||-|[THUDM/cogvlm2-llama3-chinese-chat-19B](https://huggingface.co/THUDM/cogvlm2-llama3-chinese-chat-19B)|
 |cogvlm2-en-19b-chat|[ZhipuAI/cogvlm2-llama3-chat-19B](https://modelscope.cn/models/ZhipuAI/cogvlm2-llama3-chat-19B/summary)|vision_expert_query_key_value, vision_expert_dense, language_expert_query_key_value, language_expert_dense|cogvlm|&#x2718;|&#x2718;||-|[THUDM/cogvlm2-llama3-chat-19B](https://huggingface.co/THUDM/cogvlm2-llama3-chat-19B)|
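To try one of the new phi3 entries interactively, swift's Python inference entry point can be used. A minimal sketch; only the model_type is taken from the table above, everything else is left at defaults:

```python
from swift.llm import InferArguments, infer_main

# Interactive inference with the new multi-modal phi3 entry.
# Per the table, this requires transformers>=4.36.
infer_args = InferArguments(model_type='phi3-vision-128k-instruct')
infer_main(infer_args)
```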

docs/source/Multi-Modal/cogvlm2最佳实践.md

Lines changed: 3 additions & 3 deletions
@@ -114,14 +114,14 @@ seed_everything(42)
 
 images = ['http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png']
 query = '距离各城市多远?'
-response, _ = inference(model, template, query, images=images)
+response, history = inference(model, template, query, images=images)
 print(f'query: {query}')
 print(f'response: {response}')
 
 # streaming
 query = '距离最远的城市是哪?'
 images = images
-gen = inference_stream(model, template, query, images=images)
+gen = inference_stream(model, template, query, history, images=images)
 print_idx = 0
 print(f'query: {query}\nresponse: ', end='')
 for response, _ in gen:
@@ -134,7 +134,7 @@ print()
 query: 距离各城市多远?
 response: 距离马踏Mata有14km,距离阳江Yangjiang有62km,距离广州Guangzhou有293km。
 query: 距离最远的城市是哪?
-response: 距离最远的城市是广州Guangzhou。
+response: 距离最远的城市是广州Guangzhou,有293km
 """
```
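The fix above threads the first turn's `history` into `inference_stream`, so the streamed follow-up question is answered with the context of the first answer; that is why the corrected sample output can now cite the 293 km figure. The general multi-turn pattern, using the signatures shown in the block above:

```python
# Turn 1 returns the running history; turn 2 passes it back in.
response, history = inference(model, template, query1, images=images)
gen = inference_stream(model, template, query2, history, images=images)
```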

docs/source/Multi-Modal/index.md

Lines changed: 16 additions & 8 deletions
@@ -2,13 +2,21 @@
 
 ### Multi-Modal Best Practice Series
 
+A single round of dialogue can contain multiple images (or no images):
 1. [Qwen-VL Best Practice](qwen-vl最佳实践.md)
 2. [Qwen-Audio Best Practice](qwen-audio最佳实践.md)
-3. [Llava Best Practice](llava最佳实践.md)
-4. [Deepseek-VL Best Practice](deepseek-vl最佳实践.md)
-5. [Yi-VL Best Practice](yi-vl最佳实践.md)
-6. [Internlm2-Xcomposers Best Practice](internlm-xcomposer2最佳实践.md)
-7. [MiniCPM-V Best Practice](minicpm-v最佳实践.md), [MiniCPM-V-2 Best Practice](minicpm-v-2最佳实践.md), [MiniCPM-V-2.5 Best Practice](minicpm-v-2.5最佳实践.md)
-8. [CogVLM Best Practice](cogvlm最佳实践.md), [CogVLM2 Best Practice](cogvlm2最佳实践.md)
-9. [mPLUG-Owl2 Best Practice](mplug-owl2最佳实践.md)
-10. [InternVL-Chat-V1.5 Best Practice](internvl最佳实践.md)
+3. [Deepseek-VL Best Practice](deepseek-vl最佳实践.md)
+4. [Internlm2-Xcomposers Best Practice](internlm-xcomposer2最佳实践.md)
+5. [Phi3-Vision Best Practice](phi3-vision最佳实践.md)
+
+A single round of dialogue can contain only one image:
+1. [Llava Best Practice](llava最佳实践.md)
+2. [Yi-VL Best Practice](yi-vl最佳实践.md)
+3. [mPLUG-Owl2 Best Practice](mplug-owl2最佳实践.md)
+4. [InternVL-Chat-V1.5 Best Practice](internvl最佳实践.md)
+
+The entire dialogue revolves around a single image:
+1. [CogVLM Best Practice](cogvlm最佳实践.md), [CogVLM2 Best Practice](cogvlm2最佳实践.md)
+2. [MiniCPM-V Best Practice](minicpm-v最佳实践.md), [MiniCPM-V-2 Best Practice](minicpm-v-2最佳实践.md), [MiniCPM-V-2.5 Best Practice](minicpm-v-2.5最佳实践.md)
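The grouping determines how many images a single turn can reference. In these docs, images are attached either through the `images` argument or through inline `<img>` tags in the query, as in the Internlm-Xcomposer2 transcript below. A sketch for a first-group model (URLs are illustrative placeholders):

```python
# For models that allow multiple images per turn, one query can carry
# several <img> tags; model/template setup is as in the earlier examples.
query = (
    '<img>https://example.com/a.jpg</img>'
    '<img>https://example.com/b.jpg</img>'
    'What is different between these two images?'
)
response, history = inference(model, template, query)
```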

docs/source/Multi-Modal/internlm-xcomposer2最佳实践.md

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ print()
 print(f'history: {history}')
 """
 query: <img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?
-response: 马鞍山距离阳江62公里,广州距离广州293公里。
+response: 马鞍山距离阳江62公里,广州距离广州293公里。
 query: 距离最远的城市是哪?
 response: 距离最最远的城市是广州,距离广州293公里。
 history: [['<img>http://modelscope-open.oss-cn-hangzhou.aliyuncs.com/images/road.png</img>距离各城市多远?', ' 马鞍山距离阳江62公里,广州距离广州293公里。'], ['距离最远的城市是哪?', ' 距离最远的城市是广州,距离广州293公里。']]
