
Commit 594573a

support qwen2.5-coder (#2061)

1 parent 91f5df6 · commit 594573a

File tree

5 files changed: +34 additions, -24 deletions


README.md

Lines changed: 2 additions & 1 deletion

@@ -36,7 +36,7 @@
 - [Citation](#-citation)

 ## 📝 Introduction
-SWIFT supports training(PreTraining/Fine-tuning/RLHF), inference, evaluation and deployment of **300+ LLMs and 80+ MLLMs** (multimodal large models). Developers can directly apply our framework to their own research and production environments to realize the complete workflow from model training and evaluation to application. In addition to supporting the lightweight training solutions provided by [PEFT](https://github.com/huggingface/peft), we also provide a complete **Adapters library** to support the latest training techniques such as NEFTune, LoRA+, LLaMA-PRO, etc. This adapter library can be used directly in your own custom workflow without our training scripts.
+SWIFT supports training(PreTraining/Fine-tuning/RLHF), inference, evaluation and deployment of **350+ LLMs and 90+ MLLMs** (multimodal large models). Developers can directly apply our framework to their own research and production environments to realize the complete workflow from model training and evaluation to application. In addition to supporting the lightweight training solutions provided by [PEFT](https://github.com/huggingface/peft), we also provide a complete **Adapters library** to support the latest training techniques such as NEFTune, LoRA+, LLaMA-PRO, etc. This adapter library can be used directly in your own custom workflow without our training scripts.

 To facilitate use by users unfamiliar with deep learning, we provide a Gradio web-ui for controlling training and inference, as well as accompanying deep learning courses and best practices for beginners. SWIFT web-ui is available both on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary), please feel free to try!

@@ -55,6 +55,7 @@ You can contact us and communicate with us by adding our group:
 <img src="asset/discord_qr.jpg" width="200" height="200"> | <img src="asset/wechat.png" width="200" height="200">

 ## 🎉 News
+- 2024.09.18: Supports the qwen2.5, qwen2.5-math, and qwen2.5-coder series models. Supports the qwen2-vl-72b series models.
 - 2024.09.07: Support the `Reflection-llama3-70b` model, use by `swift sft/infer --model_type reflection-llama_3_1-70b`.
 - 2024.09.06: Support fine-tuning and inference for mplug-owl3. Best practices can be found [here](https://github.com/modelscope/ms-swift/issues/1969).
 - 2024.09.05: Support for the minicpm3-4b model. Experience it using `swift infer --model_type minicpm3-4b`.

README_CN.md

Lines changed: 2 additions & 1 deletion

@@ -37,7 +37,7 @@
 - [Citation](#-引用)

 ## 📝 Introduction
-SWIFT supports training (pre-training, fine-tuning, alignment), inference, evaluation and deployment of **300+ LLMs and 80+ MLLMs** (multimodal large models). Developers can apply our framework directly to their own research and production environments, covering the complete pipeline from model training and evaluation to application. In addition to the lightweight training solutions provided by [PEFT](https://github.com/huggingface/peft), we also provide a complete **Adapters library** supporting the latest training techniques such as NEFTune, LoRA+ and LLaMA-PRO; this adapter library can be used directly in your own custom workflow, independent of our training scripts.
+SWIFT supports training (pre-training, fine-tuning, alignment), inference, evaluation and deployment of **350+ LLMs and 90+ MLLMs** (multimodal large models). Developers can apply our framework directly to their own research and production environments, covering the complete pipeline from model training and evaluation to application. In addition to the lightweight training solutions provided by [PEFT](https://github.com/huggingface/peft), we also provide a complete **Adapters library** supporting the latest training techniques such as NEFTune, LoRA+ and LLaMA-PRO; this adapter library can be used directly in your own custom workflow, independent of our training scripts.

 To make things easier for users unfamiliar with deep learning, we provide a Gradio web-ui for controlling training and inference, along with accompanying deep learning courses and best practices for beginners. You can try the SWIFT web-ui on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary).

@@ -56,6 +56,7 @@ SWIFT has rich and comprehensive documentation; please check our documentation site:


 ## 🎉 News
+- 2024.09.18: Support for the qwen2.5, qwen2.5-math, and qwen2.5-coder series models. Support for the qwen2-vl-72b series models.
 - 2024.09.07: Support for the `Reflection-llama3-70b` model; train and infer with `swift sft/infer --model_type reflection-llama_3_1-70b`.
 - 2024.09.06: Support fine-tuning and inference for mplug-owl3; best practices can be found [here](https://github.com/modelscope/ms-swift/issues/1969).
 - 2024.09.05: Support for the minicpm3-4b model. Try it with `swift infer --model_type minicpm3-4b`.

docs/source/Instruction/支持的模型和数据集.md

Lines changed: 2 additions & 0 deletions

@@ -150,6 +150,8 @@
 |qwen2_5-math-1_5b-instruct|[qwen/Qwen2.5-Math-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Math-1.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.37|-|[Qwen/Qwen2.5-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct)|
 |qwen2_5-math-7b-instruct|[qwen/Qwen2.5-Math-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Math-7B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.37|-|[Qwen/Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct)|
 |qwen2_5-math-72b-instruct|[qwen/Qwen2.5-Math-72B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Math-72B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.37|-|[Qwen/Qwen2.5-Math-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-72B-Instruct)|
+|qwen2_5-coder-1_5b|[qwen/Qwen2.5-Coder-1.5B](https://modelscope.cn/models/qwen/Qwen2.5-Coder-1.5B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-1.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B)|
+|qwen2_5-coder-1_5b-instruct|[qwen/Qwen2.5-Coder-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Coder-1.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct)|
 |qwen2_5-coder-7b|[qwen/Qwen2.5-Coder-7B](https://modelscope.cn/models/qwen/Qwen2.5-Coder-7B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-7B](https://huggingface.co/Qwen/Qwen2.5-Coder-7B)|
 |qwen2_5-coder-7b-instruct|[qwen/Qwen2.5-Coder-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Coder-7B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct)|
 |chatglm2-6b|[ZhipuAI/chatglm2-6b](https://modelscope.cn/models/ZhipuAI/chatglm2-6b/summary)|query_key_value|chatglm2|&#x2718;|&#x2714;|&#x2718;|&#x2718;|transformers<4.42|-|[THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)|

docs/source_en/Instruction/Supported-models-datasets.md

Lines changed: 2 additions & 0 deletions

@@ -150,6 +150,8 @@ The table below introduces all models supported by SWIFT:
 |qwen2_5-math-1_5b-instruct|[qwen/Qwen2.5-Math-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Math-1.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.37|-|[Qwen/Qwen2.5-Math-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-1.5B-Instruct)|
 |qwen2_5-math-7b-instruct|[qwen/Qwen2.5-Math-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Math-7B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.37|-|[Qwen/Qwen2.5-Math-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-7B-Instruct)|
 |qwen2_5-math-72b-instruct|[qwen/Qwen2.5-Math-72B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Math-72B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.37|-|[Qwen/Qwen2.5-Math-72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Math-72B-Instruct)|
+|qwen2_5-coder-1_5b|[qwen/Qwen2.5-Coder-1.5B](https://modelscope.cn/models/qwen/Qwen2.5-Coder-1.5B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-1.5B](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B)|
+|qwen2_5-coder-1_5b-instruct|[qwen/Qwen2.5-Coder-1.5B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Coder-1.5B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-1.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-1.5B-Instruct)|
 |qwen2_5-coder-7b|[qwen/Qwen2.5-Coder-7B](https://modelscope.cn/models/qwen/Qwen2.5-Coder-7B/summary)|q_proj, k_proj, v_proj|default-generation|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-7B](https://huggingface.co/Qwen/Qwen2.5-Coder-7B)|
 |qwen2_5-coder-7b-instruct|[qwen/Qwen2.5-Coder-7B-Instruct](https://modelscope.cn/models/qwen/Qwen2.5-Coder-7B-Instruct/summary)|q_proj, k_proj, v_proj|qwen|&#x2714;|&#x2714;|&#x2714;|&#x2718;|transformers>=4.37|-|[Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct)|
 |chatglm2-6b|[ZhipuAI/chatglm2-6b](https://modelscope.cn/models/ZhipuAI/chatglm2-6b/summary)|query_key_value|chatglm2|&#x2718;|&#x2714;|&#x2718;|&#x2718;|transformers<4.42|-|[THUDM/chatglm2-6b](https://huggingface.co/THUDM/chatglm2-6b)|

swift/llm/utils/model.py

Lines changed: 26 additions & 22 deletions

@@ -183,6 +183,8 @@ class ModelType:
     qwen2_5_math_7b_instruct = 'qwen2_5-math-7b-instruct'
     qwen2_5_math_72b_instruct = 'qwen2_5-math-72b-instruct'
     # qwen2.5 coder
+    qwen2_5_coder_1_5b = 'qwen2_5-coder-1_5b'
+    qwen2_5_coder_1_5b_instruct = 'qwen2_5-coder-1_5b-instruct'
     qwen2_5_coder_7b = 'qwen2_5-coder-7b'
     qwen2_5_coder_7b_instruct = 'qwen2_5-coder-7b-instruct'
     # qwen-vl

@@ -3526,28 +3528,30 @@ def get_model_tokenizer_qwen2_chat(model_dir: str,
         requires=['transformers>=4.37'],
         hf_model_id=f'Qwen/Qwen2.5-Math-{model_size}-Instruct')

-register_model(
-    'qwen2_5-coder-7b',
-    'qwen/Qwen2.5-Coder-7B',
-    LoRATM.llama,
-    TemplateType.default_generation,
-    get_model_tokenizer_with_flash_attn,
-    support_flash_attn=True,
-    support_vllm=True,
-    support_lmdeploy=True,
-    requires=['transformers>=4.37'],
-    hf_model_id='Qwen/Qwen2.5-Coder-7B')
-register_model(
-    'qwen2_5-coder-7b-instruct',
-    'qwen/Qwen2.5-Coder-7B-Instruct',
-    LoRATM.llama,
-    TemplateType.qwen,
-    get_model_tokenizer_qwen2_chat,
-    support_flash_attn=True,
-    support_vllm=True,
-    support_lmdeploy=True,
-    requires=['transformers>=4.37'],
-    hf_model_id='Qwen/Qwen2.5-Coder-7B-Instruct')
+for model_size in ['1.5B', '7B']:
+    model_size_lower = model_size.lower().replace('.', '_')
+    register_model(
+        f'qwen2_5-coder-{model_size_lower}',
+        f'qwen/Qwen2.5-Coder-{model_size}',
+        LoRATM.llama,
+        TemplateType.default_generation,
+        get_model_tokenizer_with_flash_attn,
+        support_flash_attn=True,
+        support_vllm=True,
+        support_lmdeploy=True,
+        requires=['transformers>=4.37'],
+        hf_model_id=f'Qwen/Qwen2.5-Coder-{model_size}')
+    register_model(
+        f'qwen2_5-coder-{model_size_lower}-instruct',
+        f'qwen/Qwen2.5-Coder-{model_size}-Instruct',
+        LoRATM.llama,
+        TemplateType.qwen,
+        get_model_tokenizer_qwen2_chat,
+        support_flash_attn=True,
+        support_vllm=True,
+        support_lmdeploy=True,
+        requires=['transformers>=4.37'],
+        hf_model_id=f'Qwen/Qwen2.5-Coder-{model_size}-Instruct')


 @register_model(
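The loop above derives each registry key from the human-readable size string (e.g. `'1.5B'` → `'1_5b'`). A minimal standalone sketch of that naming transformation, runnable outside SWIFT (the helper name `coder_model_types` is ours, not part of the library):

```python
def coder_model_types(model_size: str) -> tuple[str, str]:
    """Derive the base and instruct model_type keys for a Qwen2.5-Coder size.

    Lowercases the size and replaces '.' with '_' ('1.5B' -> '1_5b'),
    mirroring the ModelType entries added in this commit.
    """
    model_size_lower = model_size.lower().replace('.', '_')
    base = f'qwen2_5-coder-{model_size_lower}'
    return base, f'{base}-instruct'


for size in ['1.5B', '7B']:
    print(coder_model_types(size))
# -> ('qwen2_5-coder-1_5b', 'qwen2_5-coder-1_5b-instruct')
# -> ('qwen2_5-coder-7b', 'qwen2_5-coder-7b-instruct')
```

Refactoring the two hand-written `register_model` calls into this loop is what lets the commit add four registrations (1.5B and 7B, base and instruct) while deleting net code.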
