Commit a5e7f8d — Support qwen2 (#1017)
1 parent 434990f
16 files changed: +498 additions, −132 deletions

README.md

Lines changed: 2 additions & 1 deletion
@@ -47,6 +47,7 @@ SWIFT has rich documentation for users, please check [here](https://github.com/
 SWIFT web-ui is available both on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary), please feel free to try!

 ## 🎉 News
+- 🔥2024.06.07: Support for the Qwen2 series LLMs, including Base and Instruct models at 0.5B, 1.5B, 7B, and 72B, as well as the corresponding gptq-int4, gptq-int8, and awq-int4 quantized versions.
 - 🔥2024.06.05: Support for the **glm4** series LLMs and the glm4v-9b-chat MLLM. You can refer to the [glm4v best practice](docs/source_en/Multi-Modal/glm4v-best-practice.md).
 - 🔥2024.06.01: Supports **SimPO** training! See the [document](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/SimPO.md) to start training!
 - 🔥2024.06.01: Support for deploying large multimodal models, please refer to the [Multimodal Deployment Documentation](docs/source_en/Multi-Modal/mutlimodal-deployment.md) for more information.

@@ -486,7 +487,7 @@ The complete list of supported models and datasets can be found at [Supported Mo

 | Model Type | Model Introduction | Language | Model Size | Model Type |
 |------------------------------------------------|------------------------------------------------------------------------|--------------------|----------------------------------------|------------------------------------------- |
-| Qwen<br>Qwen1.5 | [Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM) | Chinese<br>English | 0.5B-110B<br>including quantized versions | base model<br>chat model<br>MoE model<br>code model |
+| Qwen<br>Qwen1.5<br>Qwen2 | [Tongyi Qwen series models](https://github.com/QwenLM) | Chinese<br>English | 0.5B-110B<br>including quantized versions | base model<br>chat model<br>MoE model<br>code model |
 | ChatGLM2<br>ChatGLM3<br>Codegeex2<br>GLM4 | [Zhipu ChatGLM series models](https://github.com/THUDM) | Chinese<br>English | 6B-9B | base model<br>chat model<br>code model<br>long text model |
 | Baichuan/Baichuan2 | [Baichuan 1 and Baichuan 2](https://github.com/baichuan-inc) | Chinese<br>English | 7B-13B<br>including quantized versions | base model<br>chat model |
 | Yuan2 | [Langchao Yuan series models](https://github.com/IEIT-Yuan) | Chinese<br>English | 2B-102B | instruct model |
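Since this commit adds a Qwen2 row to the model table, a quick smoke test of the new entry might look like the sketch below. Note that the `qwen2-7b-instruct` model_type name is an assumption extrapolated from SWIFT's existing `qwen1half-*` naming convention, not something confirmed by this diff; check it against the supported-models list before use.

```shell
# Hypothetical smoke test of the newly added Qwen2 support.
# The model_type value is an assumption, not taken from this diff.
CUDA_VISIBLE_DEVICES=0 swift infer \
    --model_type qwen2-7b-instruct
```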

README_CN.md

Lines changed: 2 additions & 1 deletion
@@ -48,6 +48,7 @@ SWIFT has a rich documentation system; if you have any usage questions please check [here](https:
 You can try the SWIFT web-ui on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary).

 ## 🎉 News
+- 🔥2024.06.07: Support for the Qwen2 series LLMs, including Base and Instruct models at 0.5B, 1.5B, 7B, and 72B, as well as the corresponding gptq-int4, gptq-int8, and awq-int4 quantized versions.
 - 🔥2024.06.05: Support for the glm4 series LLMs and the glm4v-9b-chat multimodal LLM; see the [glm4v best practice](docs/source/Multi-Modal/glm4v最佳实践.md).
 - 🔥2024.06.01: Supports **SimPO** training; use `swift simpo` to start training. The best practice can be found [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/SimPO算法最佳实践.md).
 - 🔥2024.06.01: Support for deploying large multimodal models; see the [multimodal deployment documentation](docs/source/Multi-Modal/MLLM部署文档.md).

@@ -482,7 +483,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \

 | Model Type | Model Introduction | Language | Model Size | Model Type |
 | --------------------------------------------------- | ------------------------------------------------------------ |----------| ------------------------- |-------------------------------------------|
-| Qwen<br>Qwen1.5 | [Tongyi Qwen 1.0 and 1.5 series models](https://github.com/QwenLM) | Chinese<br>English | 0.5B-110B<br>including quantized versions | base model<br>chat model<br>MoE model<br>code model |
+| Qwen<br>Qwen1.5<br>Qwen2 | [Tongyi Qwen series models](https://github.com/QwenLM) | Chinese<br>English | 0.5B-110B<br>including quantized versions | base model<br>chat model<br>MoE model<br>code model |
 | ChatGLM2<br>ChatGLM3<br>Codegeex2<br>GLM4 | [Zhipu ChatGLM series models](https://github.com/THUDM/) | Chinese<br>English | 6B-9B | base model<br>chat model<br>code model<br>long text model |
 | Baichuan<br>Baichuan2 | [Baichuan 1 and Baichuan 2](https://github.com/baichuan-inc) | Chinese<br>English | 7B-13B<br>including quantized versions | base model<br>chat model |
 | Yuan2 | [Langchao Yuan series models](https://github.com/IEIT-Yuan) | Chinese<br>English | 2B-102B | instruct model |

docs/source/LLM/LLM量化文档.md

Lines changed: 6 additions & 6 deletions
@@ -68,16 +68,16 @@ pip install -r requirements/llm.txt -U
 # If OOM occurs during quantization, moderately lower `--quant_n_samples` (default 256) and `--quant_seqlen` (default 2048).
 # gptq-int4 quantization (takes about 20 minutes on an A100; GPU memory usage: 7GB)

-# awq: use `alpaca-zh alpaca-en sharegpt-gpt4-mini` as the quantization dataset
+# awq: use `alpaca-zh alpaca-en sharegpt-gpt4:default` as the quantization dataset
 CUDA_VISIBLE_DEVICES=0 swift export \
     --model_type qwen1half-7b-chat --quant_bits 4 \
-    --dataset alpaca-zh alpaca-en sharegpt-gpt4-mini --quant_method awq
+    --dataset alpaca-zh alpaca-en sharegpt-gpt4:default --quant_method awq

-# gptq: use `alpaca-zh alpaca-en sharegpt-gpt4-mini` as the quantization dataset
+# gptq: use `alpaca-zh alpaca-en sharegpt-gpt4:default` as the quantization dataset
 # For gptq quantization, please check this issue first: https://github.com/AutoGPTQ/AutoGPTQ/issues/439
 OMP_NUM_THREADS=14 CUDA_VISIBLE_DEVICES=0 swift export \
     --model_type qwen1half-7b-chat --quant_bits 4 \
-    --dataset alpaca-zh alpaca-en sharegpt-gpt4-mini --quant_method gptq
+    --dataset alpaca-zh alpaca-en sharegpt-gpt4:default --quant_method gptq

 # awq: use a custom quantization dataset
 # the same applies to gptq

@@ -216,11 +216,11 @@ CUDA_VISIBLE_DEVICES=0 swift infer \

 **Merge-LoRA & quantization**
 ```shell
-# Use `alpaca-zh alpaca-en sharegpt-gpt4-mini` as the quantization dataset
+# Use `alpaca-zh alpaca-en sharegpt-gpt4:default` as the quantization dataset
 CUDA_VISIBLE_DEVICES=0 swift export \
     --ckpt_dir 'output/qwen1half-4b-chat/vx-xxx/checkpoint-xxx' \
     --merge_lora true --quant_bits 4 \
-    --dataset alpaca-zh alpaca-en sharegpt-gpt4-mini --quant_method awq
+    --dataset alpaca-zh alpaca-en sharegpt-gpt4:default --quant_method awq

 # Use the dataset from fine-tuning as the quantization dataset
 CUDA_VISIBLE_DEVICES=0 swift export \
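Given the commit's theme, a natural follow-up to the quantization commands above is applying them to a Qwen2 model using the new `name:subset` dataset syntax. The sketch below assumes a `qwen2-7b-instruct` model_type (extrapolated from the `qwen1half-*` naming pattern; not confirmed by this diff):

```shell
# Hypothetical: awq-int4 quantization of a Qwen2 model with the
# `sharegpt-gpt4:default` dataset syntax introduced by this commit.
# The model_type value is an assumption, not taken from this diff.
CUDA_VISIBLE_DEVICES=0 swift export \
    --model_type qwen2-7b-instruct --quant_bits 4 \
    --dataset alpaca-zh alpaca-en sharegpt-gpt4:default \
    --quant_method awq
```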

docs/source/LLM/VLLM推理加速与部署.md

Lines changed: 1 addition & 1 deletion
@@ -527,7 +527,7 @@ CUDA_VISIBLE_DEVICES=0,1,2,3 \
 NPROC_PER_NODE=4 \
 swift sft \
     --model_type llama2-7b-chat \
-    --dataset self-cognition#500 sharegpt-gpt4-mini#1000 \
+    --dataset self-cognition#500 sharegpt-gpt4:default#1000 \
     --logging_steps 5 \
     --max_length 4096 \
     --learning_rate 5e-5 \
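The fine-tuning command above comes from a document about vLLM-accelerated inference and deployment, so the resulting checkpoint would typically be served with the vLLM backend. A hedged sketch (the checkpoint path is a placeholder in the same `vx-xxx/checkpoint-xxx` style this repository uses):

```shell
# Sketch: serve the fine-tuned checkpoint with the vLLM backend.
# The ckpt_dir value is a placeholder.
CUDA_VISIBLE_DEVICES=0 swift deploy \
    --ckpt_dir 'output/llama2-7b-chat/vx-xxx/checkpoint-xxx' \
    --infer_backend vllm
```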

docs/source/LLM/命令行参数.md

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@
 - `--save_only_model`: Whether to save only the model parameters, without the intermediate state needed to resume training from a checkpoint. Defaults to `None`: set to False if `sft_type` is 'lora' and deepspeed is not used (`deepspeed` is `None`), otherwise set to True (e.g. when using full-parameter fine-tuning or deepspeed).
 - `--save_total_limit`: The number of checkpoints to keep. Defaults to `2`, i.e. the best and the last checkpoint are saved. If set to -1, all checkpoints are kept.
 - `--logging_steps`: Print training information (e.g. loss, learning_rate) every this many training steps. Defaults to `5`.
-- `--dataloader_num_workers`: Defaults to `1`.
+- `--dataloader_num_workers`: Defaults to `None`: set to `0` on Windows machines, otherwise set to `1`.
 - `--push_to_hub`: Whether to synchronously push the training checkpoints to the ModelScope Hub. Defaults to `False`.
 - `--hub_model_id`: The model_id of the ModelScope Hub repository to push to. Defaults to `None`, i.e. `f'{model_type}-{sft_type}'`. It can be set to a model_id or a repo_name; the user_name is inferred from the hub_token. If the remote repository does not exist, a new one is created; otherwise the existing repository is reused. This parameter only takes effect when `push_to_hub` is set to True.
 - `--hub_token`: The SDK token required for pushing. It can be obtained from [https://modelscope.cn/my/myaccesstoken](https://modelscope.cn/my/myaccesstoken). Defaults to `None`, i.e. read from the environment variable `MODELSCOPE_API_TOKEN`. This parameter only takes effect when `push_to_hub` is set to True.
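The hub-related flags documented above combine as in the following sketch. The dataset and `hub_model_id` values are illustrative placeholders, and `--hub_token` is omitted so the token is read from `MODELSCOPE_API_TOKEN`:

```shell
# Sketch: push training checkpoints to the ModelScope Hub during SFT.
# hub_model_id is a placeholder; the token falls back to $MODELSCOPE_API_TOKEN.
CUDA_VISIBLE_DEVICES=0 swift sft \
    --model_type qwen1half-7b-chat \
    --dataset alpaca-zh \
    --push_to_hub true \
    --hub_model_id qwen1half-7b-chat-lora
```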
