Commit 361d773

Merge commit '94349d3e1caf90257ee97388aa871a89d947acdf' into release/2.0
* commit '94349d3e1caf90257ee97388aa871a89d947acdf':
  Fix custom dataset (#736)
  fix zsh install (#735)
  Submit lossing files (#727)
  update link error (#733)
  [Feat] Compatible with Hugging Face & update more models (#712)
  support roleplay dataset (#728)
  update (#719)
  Support eval on openai interfaces (#723)
  Fix loss scale (#720)
2 parents d1376a6 + 94349d3 commit 361d773

76 files changed: 2246 additions, 1478 deletions

.github/PULL_REQUEST_TEMPLATE.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 - [ ] Bug Fix
 - [ ] New Feature
 - [ ] Document Updates
-- [ ] More Model or Dataset Support
+- [ ] More Models or Datasets Support

 # PR information

README.md

Lines changed: 26 additions & 6 deletions
@@ -39,6 +39,9 @@ To facilitate use by users unfamiliar with deep learning, we provide a Gradio we
 Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.

 ## 🎉 News
+- 2024.04.18: Supported models: wizardlm2-7b-awq, wizardlm2-8x22b, yi-6b-chat-awq, yi-6b-chat-int8, yi-34b-chat-awq, yi-34b-chat-int8. Supported `--deepspeed zero3-offload` and provided default zero3-offload configuration file for zero3+cpu offload usage.
+- 2024.04.18: Supported compatibility with HuggingFace ecosystem using the environment variable `USE_HF`, switching to use models and datasets from HF. Please refer to the [HuggingFace ecosystem compatibility documentation](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM/Compat-HF.md).
+- 2024.04.17: Support the evaluation for OpenAI standard interfaces. Check the [parameter documentation](docs/source_en/LLM/Command-line-parameters.md#eval-parameters) for details.
 - 🔥2024.04.17: Support **CodeQwen1.5-7B** series: CodeQwen1.5-7B, CodeQwen1.5-7B-Chat,CodeQwen1.5-7B-Chat-AWQ, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/codeqwen1half_7b_chat/lora/sft.sh) to train.
 - 2024.04.16: Supports inference and fine-tuning of llava-v1.6-34b model. For best practice, you can refer to [here](https://github.com/modelscope/swift/tree/main/docs/source_en/Multi-Modal/llava-best-practice.md).
 - 2024.04.13: Support the fine-tuning and inference of Mixtral-8x22B-v0.1 model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mixtral_moe_8x22b_v1/lora_ddp_ds/sft.sh) to start training!
@@ -137,11 +140,11 @@ SWIFT runs in the Python environment. Please ensure your Python version is highe

 ```shell
 # Full capabilities
-pip install ms-swift[all] -U
+pip install 'ms-swift[all]' -U
 # LLM only
-pip install ms-swift[llm] -U
+pip install 'ms-swift[llm]' -U
 # AIGC only
-pip install ms-swift[aigc] -U
+pip install 'ms-swift[aigc]' -U
 # Adapters only
 pip install ms-swift -U
 ```
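
The quoting change in this hunk is probably related to the `fix zsh install (#735)` entry in the merge message. An unquoted `ms-swift[all]` is a glob pattern to the shell: zsh by default aborts with `no matches found` when nothing matches, while bash and sh silently substitute any matching filename. The following is a minimal sketch in POSIX shell, not part of the commit; the scratch directory and the file name `ms-swiftl` are made up for the demonstration.

```shell
# Scratch directory so the demo does not touch real files (hypothetical setup).
demo_dir=$(mktemp -d)

# A file whose name happens to match the pattern ms-swift[llm]:
# "ms-swift" followed by one character from the set {l, m}.
touch "$demo_dir/ms-swiftl"

# Unquoted: the shell expands the glob before echo ever sees it.
unquoted=$(cd "$demo_dir" && echo ms-swift[llm])
# Quoted: the requirement string is passed through verbatim.
quoted=$(cd "$demo_dir" && echo 'ms-swift[llm]')

rm -rf "$demo_dir"

echo "unquoted: $unquoted"   # prints: unquoted: ms-swiftl
echo "quoted:   $quoted"     # prints: quoted:   ms-swift[llm]
```

Quoting the requirement string hands it to pip unchanged regardless of which shell is in use or which files are in the current directory.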
@@ -151,7 +154,7 @@ pip install ms-swift -U
 ```shell
 git clone https://github.com/modelscope/swift.git
 cd swift
-pip install -e .[llm]
+pip install -e '.[llm]'
 ```

 SWIFT depends on torch>=1.13, recommend torch>=2.0.0.
@@ -317,6 +320,23 @@ swift sft \
     --deepspeed default-zero3 \
 ```

+ZeRO3-Offload:
+```shell
+# Experimental Environment: 4 * A100
+# GPU Memory Requirement: 4 * 12GB
+# Runtime: 60 hours
+NPROC_PER_NODE=4 \
+CUDA_VISIBLE_DEVICES=0,1,2,3 \
+swift sft \
+    --model_id_or_path AI-ModelScope/WizardLM-2-8x22B \
+    --dataset blossom-math-zh \
+    --num_train_epochs 5 \
+    --sft_type lora \
+    --output_dir output \
+    --deepspeed zero3-offload \
+```
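
The `zero3-offload` value refers to a default configuration file shipped with SWIFT, whose contents are not shown in this diff. As a rough illustration only, the sketch below writes a generic DeepSpeed ZeRO stage 3 configuration that offloads optimizer state and parameters to the CPU. The file name, the chosen values, and the use of `"auto"` (which relies on the Hugging Face Transformers DeepSpeed integration to fill in values) are assumptions, not SWIFT's actual defaults.

```shell
# Hypothetical output path; this is not a file provided by SWIFT.
config_path="${TMPDIR:-/tmp}/zero3_offload_sketch.json"

# Generic ZeRO stage 3 + CPU offload settings (assumed values, see the note above).
cat > "$config_path" <<'EOF'
{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true },
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}
EOF

echo "wrote $config_path"
```

If `--deepspeed` also accepts a path to a JSON file (an assumption this diff does not confirm), a file like this could be passed in place of the `zero3-offload` shortcut.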
+
+
 ### Inference
 Original model:
 ```shell
@@ -389,7 +409,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | XVerse | [XVerse series models](https://github.com/xverse-ai) | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
 | LLaMA2 | [LLaMA2 series models](https://github.com/facebookresearch/llama) | English | 7B-70B<br>including quantized versions | base model<br>chat model |
 | Mistral<br>Mixtral | [Mistral series models](https://github.com/mistralai/mistral-src) | English | 7B-22B | base model<br>instruct model<br>MoE model |
-| YI | [01AI's YI series models](https://github.com/01-ai) | Chinese<br>English | 6B-34B | base model<br>chat model<br>long text model |
+| YI | [01AI's YI series models](https://github.com/01-ai) | Chinese<br>English | 6B-34B<br>including quantized | base model<br>chat model<br>long text model |
 | InternLM<br>InternLM2<br>InternLM2-Math | [Pujiang AI Lab InternLM series models](https://github.com/InternLM/InternLM) | Chinese<br>English | 1.8B-20B | base model<br>chat model<br>math model |
 | DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math | [DeepSeek series models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-67B | base model<br>chat model<br>MoE model<br>code model<br>math model |
 | MAMBA | [MAMBA temporal convolution model](https://github.com/state-spaces/mamba) | English | 130M-2.8B | base model |
@@ -412,7 +432,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | dbrx | [databricks](https://github.com/databricks/dbrx) | English | 132B | base model<br>chat model |
 | mengzi3 | [Langboat](https://github.com/Langboat/Mengzi3) | Chinese<br>English | 13B | base model |
 | c4ai-command-r | [c4ai](https://cohere.com/command) | Multilingual | 35B-104B | chat model |
-
+| WizardLM2 | [WizardLM2 series models](https://github.com/nlpxucan/WizardLM) | English | 7B-8x22B<br>including quantized versions | chat model<br>MoE model |

 #### MLLMs

README_CN.md

Lines changed: 26 additions & 6 deletions
@@ -40,6 +40,9 @@ SWIFT supports nearly **200 LLMs and MLLMs** (multimodal large models) for training, inference,
 In addition, we are expanding capabilities for other modalities; currently we support full-parameter training and LoRA training for AnimateDiff.

 ## 🎉 News
+- 2024.04.18: Supported models: wizardlm2-7b-awq, wizardlm2-8x22b, yi-6b-chat-awq, yi-6b-chat-int8, yi-34b-chat-awq, yi-34b-chat-int8. Supports `--deepspeed zero3-offload` and provides a default zero3-offload configuration file for using zero3 + CPU offload.
+- 2024.04.18: Supports compatibility with the HuggingFace ecosystem via the environment variable `USE_HF`, switching to models and datasets from HF; see the [HuggingFace ecosystem compatibility documentation](https://github.com/modelscope/swift/tree/main/docs/source/LLM/HuggingFace生态兼容.md).
+- 2024.04.17: Supports evaluation of OpenAI-style interfaces; see the [evaluation parameter documentation](docs/source/LLM/命令行参数.md#eval参数) for usage.
 - 🔥2024.04.17: Supports the **CodeQwen1.5-7B** series: CodeQwen1.5-7B, CodeQwen1.5-7B-Chat, CodeQwen1.5-7B-Chat-AWQ; use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/codeqwen1half_7b_chat/lora/sft.sh) to start training!
 - 2024.04.16: Supports inference and fine-tuning of llava-v1.6-34b; for best practices see [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
 - 2024.04.13: Supports inference and fine-tuning of the Mixtral-8x22B-v0.1 model; use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mixtral_moe_8x22b_v1/lora_ddp_ds/sft.sh) to start training!
@@ -54,7 +57,7 @@ SWIFT supports nearly **200 LLMs and MLLMs** (multimodal large models) for training, inference,
 - 🔥2024.04.02: Supports inference and fine-tuning of the Mengzi3-13B-Base model; use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mengzi3_13b_base/lora_ddp_ds/sft.sh) to start training!
 - 🔥2024.04.01: Supports the **dbrx** series, dbrx-base and dbrx-instruct; use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/dbrx-instruct/lora_mp/sft.sh) to start training!
 - 🔥2024.03.29: Supports the **Qwen1.5-MoE** series: Qwen1.5-MoE-A2.7B, Qwen1.5-MoE-A2.7B-Chat, Qwen1.5-MoE-A2.7B-Chat-GPTQ-Int4.
-- 🔥2024.03.29: Supports inference and fine-tuning of the **Grok-1**300B MoE model; for best practices see [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/Grok训练和推理.md).
+- 🔥2024.03.29: Supports inference and fine-tuning of the **Grok-1** 300B MoE model; for best practices see [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/Grok训练和推理.md).
 - 🔥2024.03.25: Supports training and inference of the TeleChat-7b and TeleChat-12b models; use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh) to start training!
 - 🔥2024.03.20: Supports inference and fine-tuning of the **llava** series; for best practices see [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
 - 🔥2024.03.12: Supports inference and fine-tuning of the **deepseek-vl** series; for best practices see [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/deepseek-vl最佳实践.md).
@@ -138,11 +141,11 @@ SWIFT runs in a Python environment. Please make sure your Python version is higher than 3.8.

 ```shell
 # Full capabilities
-pip install ms-swift[all] -U
+pip install 'ms-swift[all]' -U
 # LLM only
-pip install ms-swift[llm] -U
+pip install 'ms-swift[llm]' -U
 # AIGC only
-pip install ms-swift[aigc] -U
+pip install 'ms-swift[aigc]' -U
 # Adapters only
 pip install ms-swift -U
 ```
@@ -152,7 +155,7 @@ pip install ms-swift -U
 ```shell
 git clone https://github.com/modelscope/swift.git
 cd swift
-pip install -e .[llm]
+pip install -e '.[llm]'
 ```

 SWIFT depends on torch>=1.13; torch>=2.0.0 is recommended.
@@ -315,6 +318,22 @@ swift sft \
     --deepspeed default-zero3 \
 ```

+ZeRO3-Offload:
+```shell
+# Experimental environment: 4 * A100
+# GPU memory requirement: 4 * 12GB
+# Runtime: 60 hours
+NPROC_PER_NODE=4 \
+CUDA_VISIBLE_DEVICES=0,1,2,3 \
+swift sft \
+    --model_id_or_path AI-ModelScope/WizardLM-2-8x22B \
+    --dataset blossom-math-zh \
+    --num_train_epochs 5 \
+    --sft_type lora \
+    --output_dir output \
+    --deepspeed zero3-offload \
+```
+
 ### Inference
 Original model:
 ```shell
@@ -387,7 +406,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | XVerse | [XVerse series models](https://github.com/xverse-ai) | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model | |
 | LLaMA2 | [LLaMA2 series models](https://github.com/facebookresearch/llama) | English | 7B-70B<br>including quantized versions | base model<br>chat model |
 | Mistral<br>Mixtral | [Mistral series models](https://github.com/mistralai/mistral-src) | English | 7B-8x22B | base model<br>instruct model<br>MoE model |
-| YI | [01AI's YI series models](https://github.com/01-ai) | Chinese<br>English | 6B-34B | base model<br>chat model<br>long text model |
+| YI | [01AI's YI series models](https://github.com/01-ai) | Chinese<br>English | 6B-34B<br>including quantized versions | base model<br>chat model<br>long text model |
 | InternLM<br>InternLM2<br>InternLM2-Math | [Pujiang AI Lab InternLM series models](https://github.com/InternLM/InternLM) | Chinese<br>English | 1.8B-20B | base model<br>chat model<br>math model |
 | DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math | [DeepSeek series models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-67B | base model<br>chat model<br>MoE model<br>code model<br>math model |
 | MAMBA | [MAMBA temporal convolution model](https://github.com/state-spaces/mamba) | English | 130M-2.8B | base model |
@@ -410,6 +429,7 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | dbrx | [databricks](https://github.com/databricks/dbrx) | English | 132B | base model<br>chat model |
 | mengzi3 | [Langboat](https://github.com/Langboat/Mengzi3) | Chinese<br>English | 13B | base model |
 | c4ai-command-r | [c4ai](https://cohere.com/command) | Multilingual | 35B-104B | chat model |
+| WizardLM2 | [WizardLM2 series models](https://github.com/nlpxucan/WizardLM) | Multilingual | 7B-8x22B<br>including quantized versions | chat model<br>MoE model |


 #### Multimodal large models

docs/source/GetStarted/SWIFT安装.md

Lines changed: 4 additions & 4 deletions
@@ -6,11 +6,11 @@

 ```shell
 # Full capabilities
-pip install ms-swift[all] -U
+pip install 'ms-swift[all]' -U
 # LLM only
-pip install ms-swift[llm] -U
+pip install 'ms-swift[llm]' -U
 # AIGC only
-pip install ms-swift[aigc] -U
+pip install 'ms-swift[aigc]' -U
 # Adapters only
 pip install ms-swift -U
 ```
@@ -20,7 +20,7 @@ pip install ms-swift -U
 ```shell
 git clone https://github.com/modelscope/swift.git
 cd swift
-pip install -e .[all]
+pip install -e '.[all]'
 ```

 ## Notebook environment

docs/source/LLM/Agent微调最佳实践.md

Lines changed: 1 addition & 1 deletion
@@ -21,7 +21,7 @@ pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
 # Install ms-swift
 git clone https://github.com/modelscope/swift.git
 cd swift
-pip install -e .[llm]
+pip install -e '.[llm]'

 # Environment alignment (usually not needed. If you hit errors, run the commands below; the repository is tested with the latest environment)
 pip install -r requirements/framework.txt -U

docs/source/LLM/Grok训练和推理.md

Lines changed: 1 addition & 1 deletion
@@ -13,7 +13,7 @@
 ```shell
 git clone https://github.com/modelscope/swift.git
 cd swift
-pip install -e .[llm]
+pip install -e '.[llm]'
 ```

 ## Fine-tuning
Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+# HuggingFace Ecosystem Compatibility
+By default, we use models and datasets from [ModelScope](https://modelscope.cn/my/overview) for fine-tuning and inference. However, since overseas users are more familiar with the [HuggingFace](https://huggingface.co/) ecosystem, we provide compatibility with it here.
+
+You need to set the environment variable `USE_HF=1`. For the supported HuggingFace models and datasets, refer to [Supported models and datasets](支持的模型和数据集.md); some datasets can only be used in the ModelScope environment.
+
+The following is an inference script for `qwen1.5-7b-chat`:
+```shell
+# Experimental Environment: A10, 3090, V100
+USE_HF=1 CUDA_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat
+```
+
+Fine-tuning script:
+```shell
+# Experimental Environment: 2 * A100
+# GPU Memory Requirement: 2 * 30GB
+USE_HF=1 \
+NPROC_PER_NODE=2 \
+CUDA_VISIBLE_DEVICES=0,1 \
+swift sft \
+    --model_type qwen1half-7b-chat \
+    --dataset blossom-math-zh \
+    --num_train_epochs 5 \
+    --sft_type lora \
+    --output_dir output \
+```
+
+For inference and deployment after fine-tuning, refer to the other documents.
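
The scripts in this new document set `USE_HF=1` as a prefix on the command line, which makes the variable visible only to that single `swift` process. Below is a minimal sketch of the difference between a per-command prefix and `export`, using plain `sh -c` child processes as stand-ins for `swift`. The stand-in commands are an assumption for illustration; how SWIFT reads the variable is not shown in this diff.

```shell
unset USE_HF   # start from a clean state for the demo

# Prefix form: only this one child process sees USE_HF.
prefixed=$(USE_HF=1 sh -c 'printf %s "${USE_HF:-unset}"')
# The next child process does not inherit it.
afterwards=$(sh -c 'printf %s "${USE_HF:-unset}"')

# Export form: every later child process in this session sees USE_HF.
export USE_HF=1
exported=$(sh -c 'printf %s "${USE_HF:-unset}"')

echo "prefixed=$prefixed afterwards=$afterwards exported=$exported"
# prints: prefixed=1 afterwards=unset exported=1
```

The prefix form keeps the switch local to one run, which suits one-off commands; `export` is more convenient when a whole session should use HuggingFace models and datasets.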

docs/source/LLM/LLM人类对齐训练文档.md

Lines changed: 1 addition & 1 deletion
@@ -11,7 +11,7 @@ pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
 # Install ms-swift
 git clone https://github.com/modelscope/swift.git
 cd swift
-pip install -e .[llm]
+pip install -e '.[llm]'

 # Environment alignment (usually not needed. If you hit errors, run the commands below; the repository is tested with the latest environment)
 pip install -r requirements/framework.txt -U

docs/source/LLM/LLM微调文档.md

Lines changed: 5 additions & 1 deletion
@@ -7,6 +7,7 @@
 - [Quantization](#量化)
 - [Inference](#推理)
 - [Web-UI](#web-ui)
+- [Push model](#推送模型)

 ## Environment preparation
 GPU devices: A10, 3090, V100, and A100 are all supported.
@@ -16,7 +17,7 @@ pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
 # Install ms-swift
 git clone https://github.com/modelscope/swift.git
 cd swift
-pip install -e .[llm]
+pip install -e '.[llm]'

 # If you want to use deepspeed.
 pip install deepspeed -U
@@ -287,3 +288,6 @@ CUDA_VISIBLE_DEVICES=0 swift export \

 CUDA_VISIBLE_DEVICES=0 swift app-ui --ckpt_dir 'xxx/vx-xxx/checkpoint-xxx-merged'
 ```
+
+## Push model
+If you want to push a model to ModelScope, refer to the [model push documentation](LLM量化文档.md#推送模型)

docs/source/LLM/LLM推理文档.md

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@ GPU devices: A10, 3090, V100, and A100 are all supported.
 # Set the global pip mirror (speeds up downloads)
 pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
 # Install ms-swift
-pip install ms-swift[llm] -U
+pip install 'ms-swift[llm]' -U

 # If you want to run inference with auto_gptq-based models.
 # Models that use auto_gptq: `https://github.com/modelscope/swift/blob/main/docs/source/LLM/支持的模型和数据集.md#模型`
