
Commit 97d7cd9

Update doc (#125)
1 parent b0e0fb0 commit 97d7cd9

9 files changed: +147 -41 lines changed

README.md

Lines changed: 20 additions & 18 deletions
@@ -19,11 +19,14 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible fra
Currently supported approches (and counting):

1. LoRA: [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/abs/2106.09685)
-2. Adapter: [Parameter-Efficient Transfer Learning for NLP](http://arxiv.org/abs/1902.00751)
-3. Prompt Tuning: [Visual Prompt Tuning](https://arxiv.org/abs/2203.12119)
-4. Side: [Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks](https://arxiv.org/abs/1912.13503)
-5. ResTuning-Bypass
-7. All tuners offered on [PEFT](https://github.com/huggingface/peft)
+2. QA-LoRA: [Quantization-Aware Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2309.14717)
+3. LongLoRA: [Efficient Fine-tuning of Long-Context Large Language Models](https://arxiv.org/abs/2309.12307)
+4. Adapter: [Parameter-Efficient Transfer Learning for NLP](http://arxiv.org/abs/1902.00751)
+5. Prompt Tuning: [Visual Prompt Tuning](https://arxiv.org/abs/2203.12119)
+6. Side: [Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks](https://arxiv.org/abs/1912.13503)
+7. ResTuning-Bypass
+8. ROME: [Rank-One Editing of Encoder-Decoder Models](https://arxiv.org/abs/2211.13317)
+9. All tuners offered on [PEFT](https://github.com/huggingface/peft)

Key features:

@@ -33,6 +36,18 @@ Key features:

Users can check the [documentation of Swift](docs/source/GetStarted/Introduction.md) to get detail tutorials.

+### 🎉News
+
+- 🔥 2023.10.30: Support QA-LoRA and LongLoRA to decrease memory usage in training.
+- 🔥 2023.10.30: Support ROME (Rank-One Model Editing) to add or modify knowledge in a model; no training is needed!
+- 🔥 2023.10.27: Support for chatglm3 series models: chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k. The corresponding shell script can be found at `scripts/chatglm3_6b_32k`.
+- 🔥 2023.10.17: Supported int4, int8 models: qwen-7b-chat-int4, qwen-14b-chat-int4, qwen-vl-chat-int4, baichuan2-7b-chat-int4, baichuan2-13b-chat-int4, qwen-7b-chat-int8, qwen-14b-chat-int8. The corresponding shell scripts can be found at `scripts/qwen_7b_chat_int4`, `scripts/qwen_14b_chat_int4`, `scripts/qwen_vl_chat_int4`, `scripts/qwen_7b_chat_int8`, `scripts/qwen_14b_chat_int8`.
+- 2023.10.15: Supported ziya2-13b model series: ziya2-13b, ziya2-13b-chat. The corresponding shell script can be found at `scripts/ziya2_13b_chat`.
+- 2023.10.12: Supported mistral-7b model series: openbuddy-mistral-7b-chat, mistral-7b, mistral-7b-chat. The corresponding shell scripts can be found at `scripts/openbuddy_mistral_7b_chat`, `scripts/mistral_7b_chat`.
+- 🔥 2023.10.7: Supported DeepSpeed ZeRO-2, enabling LoRA (not just QLoRA) to run DDP on 2*A10. The corresponding shell script can be found at `scripts/qwen_7b_chat/lora_ddp_ds/sft.sh`.
+- 🔥 2023.9.25: Supported qwen-14b model series: qwen-14b, qwen-14b-chat. The corresponding shell scripts can be found at `scripts/qwen_14b`, `scripts/qwen_14b_chat`.
+- 2023.9.12: Supported training with MP+DDP to accelerate full-parameter fine-tuning speed. The corresponding shell script can be found at `scripts/qwen_7b_chat/full_mp_ddp/sft.sh`.
+
## LLM SFT Example
Press [this link](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm) to view the detail documentation of these examples.

@@ -70,19 +85,6 @@ Press [this link](https://github.com/modelscope/swift/tree/main/examples/pytorch
- Chat: chatml(qwen), baichuan, chatglm2, chatglm3, llama, openbuddy-llama, default, internlm, xverse

-### News
-- 🔥 2023.10.27: Support for chatglm3 series models: chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k. The corresponding shell script can be found in `scripts/chatglm3_6b_32k`.
-- 🔥 2023.10.24: Use the registration mechanism to add models, datasets, and chat templates. To customize models, datasets, and chat templates, refer to the "User Guide" section. The corresponding Python file can be found in `custom.py`, and the corresponding shell script can be found in `scripts/custom/tigerbot_13b_chat`.
-- 🔥 2023.10.17: Supported int4, int8 models: qwen-7b-chat-int4, qwen-14b-chat-int4, qwen-vl-chat-int4, baichuan2-7b-chat-int4, baichuan2-13b-chat-int4, qwen-7b-chat-int8, qwen-14b-chat-int8. The corresponding shell script can be found at `scripts/qwen_7b_chat_int4`, `scripts/qwen_14b_chat_int4`, `scripts/qwen_vl_chat_int4`, `scripts/qwen_7b_chat_int8`, `scripts/qwen_14b_chat_int8`.
-- 2023.10.15: Supported ziya2-13b model series: ziya2-13b, ziya2-13b-chat. The corresponding shell script can be found at `scripts/ziya2_13b_chat`.
-- 2023.10.12: Supported mistral-7b model series: openbuddy-mistral-7b-chat, mistral-7b, mistral-7b-chat. The corresponding shell script can be found at `scripts/openbuddy_mistral_7b_chat`, `scripts/mistral_7b_chat`.
-- 🔥 2023.10.7: Supported DeepSpeed ZeRO-2, enabling LoRA (not just QLoRA) to run DDP on 2*A10. The corresponding shell script can be found at `scripts/qwen_7b_chat/lora_ddp_ds/sft.sh`.
-- 2023.10.4: Supported datasets in the fields of mathematics, law, SQL, and coding: blossom-math-zh, school-math-zh, text2sql-en, sql-create-context-en, lawyer-llama-zh, tigerbot-law-zh, leetcode-python-en.
-- 🔥 2023.9.25: Supported qwen-14b model series: qwen-14b, qwen-14b-chat. The corresponding shell script can be found at `scripts/qwen_14b`, `scripts/qwen_14b_chat`.
-- 2023.9.18: Supported internlm-20b model series: internlm-20b, internlm-20b-chat. The corresponding shell script can be found at `scripts/internlm_20b`, `scripts/internlm_20b_chat`.
-- 2023.9.12: Supported training with MP+DDP to accelerate full-parameter fine-tuning speed. The corresponding shell script can be found at `scripts/qwen_7b_chat/full_mp_ddp/sft.sh`.
-
# Installation

SWIFT is running in Python environment. Please make sure your python version is higher than 3.8.

README_CN.md

Lines changed: 20 additions & 18 deletions
@@ -18,11 +18,14 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible
Currently supported methods:

1. LoRA: [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/abs/2106.09685)
-2. Adapter: [Parameter-Efficient Transfer Learning for NLP](http://arxiv.org/abs/1902.00751)
-3. Prompt: [Visual Prompt Tuning](https://arxiv.org/abs/2203.12119)
-4. Side: [Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks](https://arxiv.org/abs/1912.13503)
-5. ResTuning-Bypass
-6. All tuners provided by [PEFT](https://github.com/huggingface/peft)
+2. QA-LoRA: [Quantization-Aware Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2309.14717)
+3. LongLoRA: [Efficient Fine-tuning of Long-Context Large Language Models](https://arxiv.org/abs/2309.12307)
+4. Adapter: [Parameter-Efficient Transfer Learning for NLP](http://arxiv.org/abs/1902.00751)
+5. Prompt: [Visual Prompt Tuning](https://arxiv.org/abs/2203.12119)
+6. Side: [Side-Tuning: A Baseline for Network Adaptation via Additive Side Networks](https://arxiv.org/abs/1912.13503)
+7. ResTuning-Bypass
+8. ROME: [Rank-One Editing of Encoder-Decoder Models](https://arxiv.org/abs/2211.13317)
+9. All tuners provided by [PEFT](https://github.com/huggingface/peft)

Key features:
1. SWIFT and PEFT tuners can load models from the ModelScope Hub by model-id

@@ -31,6 +34,18 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible

Users can check the [official Swift documentation](docs/source/GetStarted/Introduction.md) for details.

+## News
+
+- 🔥 2023.10.30: Support the two new tuners QA-LoRA and LongLoRA.
+- 🔥 2023.10.30: Support model editing with ROME (Rank-One Model Editing), which injects new knowledge into a model without any training!
+- 🔥 2023.10.27: Support chatglm3 series models: chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k. The corresponding shell script can be found at `scripts/chatglm3_6b_32k`.
+- 🔥 2023.10.17: Support SFT of int4, int8 models: qwen-7b-chat-int4, qwen-14b-chat-int4, qwen-vl-chat-int4, baichuan2-7b-chat-int4, baichuan2-13b-chat-int4, qwen-7b-chat-int8, qwen-14b-chat-int8. The corresponding shell scripts can be found at `scripts/qwen_7b_chat_int4`, `scripts/qwen_14b_chat_int4`, `scripts/qwen_vl_chat_int4`, `scripts/qwen_7b_chat_int8`, `scripts/qwen_14b_chat_int8`.
+- 2023.10.15: Support ziya2-13b series models: ziya2-13b, ziya2-13b-chat. The corresponding shell script can be found at `scripts/ziya2_13b_chat`.
+- 2023.10.12: Support mistral-7b series models: openbuddy-mistral-7b-chat, mistral-7b, mistral-7b-chat. The corresponding shell scripts can be found at `scripts/openbuddy_mistral_7b_chat`, `scripts/mistral_7b_chat`.
+- 🔥 2023.10.7: Support DeepSpeed ZeRO-2, enabling LoRA (not just QLoRA) to run DDP on two A10 GPUs. The corresponding shell script can be found at `scripts/qwen_7b_chat/lora_ddp_ds/sft.sh`.
+- 🔥 2023.9.25: Support **qwen-14b** series models: qwen-14b, qwen-14b-chat. The corresponding shell scripts can be found at `scripts/qwen_14b`, `scripts/qwen_14b_chat`.
+- 2023.9.12: Support MP+DDP training to speed up full-parameter fine-tuning. The corresponding shell script can be found at `scripts/qwen_7b_chat/full_mp_ddp/sft.sh`.
+
## LLM fine-tuning examples
The documentation for LLM fine-tuning is available [here](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm).

@@ -68,19 +83,6 @@ SWIFT (Scalable lightWeight Infrastructure for Fine-Tuning) is an extensible
- Chat: chatml(qwen), baichuan, chatglm2, chatglm3, llama, openbuddy-llama, default, internlm, xverse

-## News
-- 🔥 2023.10.27: Support chatglm3 series models: chatglm3-6b-base, chatglm3-6b, chatglm3-6b-32k. The corresponding shell script can be found at `scripts/chatglm3_6b_32k`.
-- 🔥 2023.10.24: Use the registration mechanism to add models, datasets and chat templates. See the documentation section for how to customize them; the corresponding Python file is `custom.py` and the corresponding shell script is `scripts/custom/tigerbot_13b_chat`.
-- 🔥 2023.10.17: Support SFT of int4, int8 models: qwen-7b-chat-int4, qwen-14b-chat-int4, qwen-vl-chat-int4, baichuan2-7b-chat-int4, baichuan2-13b-chat-int4, qwen-7b-chat-int8, qwen-14b-chat-int8. The corresponding shell scripts can be found at `scripts/qwen_7b_chat_int4`, `scripts/qwen_14b_chat_int4`, `scripts/qwen_vl_chat_int4`, `scripts/qwen_7b_chat_int8`, `scripts/qwen_14b_chat_int8`.
-- 2023.10.15: Support ziya2-13b series models: ziya2-13b, ziya2-13b-chat. The corresponding shell script can be found at `scripts/ziya2_13b_chat`.
-- 2023.10.12: Support mistral-7b series models: openbuddy-mistral-7b-chat, mistral-7b, mistral-7b-chat. The corresponding shell scripts can be found at `scripts/openbuddy_mistral_7b_chat`, `scripts/mistral_7b_chat`.
-- 🔥 2023.10.7: Support DeepSpeed ZeRO-2, enabling LoRA (not just QLoRA) to run DDP on two A10 GPUs. The corresponding shell script can be found at `scripts/qwen_7b_chat/lora_ddp_ds/sft.sh`.
-- 2023.10.4: Support more datasets in the fields of mathematics, law, SQL and coding: blossom-math-zh, school-math-zh, text2sql-en, sql-create-context-en, lawyer-llama-zh, tigerbot-law-zh, leetcode-python-en.
-- 🔥 2023.9.25: Support **qwen-14b** series models: qwen-14b, qwen-14b-chat. The corresponding shell scripts can be found at `scripts/qwen_14b`, `scripts/qwen_14b_chat`.
-- 2023.9.18: Support internlm-20b series models: internlm-20b, internlm-20b-chat. The corresponding shell scripts can be found at `scripts/internlm_20b`, `scripts/internlm_20b_chat`.
-- 2023.9.12: Support MP+DDP training to speed up full-parameter fine-tuning. The corresponding shell script can be found at `scripts/qwen_7b_chat/full_mp_ddp/sft.sh`.
-
# Installation

SWIFT runs in a Python environment. Please make sure your Python version is higher than 3.8.

docs/source/GetStarted/Deployment.md

Lines changed: 67 additions & 0 deletions

@@ -0,0 +1,67 @@
# Deployment

Trained models can be deployed with various open-source inference frameworks. This document describes how SWIFT connects to these frameworks for deployment.

## VLLM

[VLLM](https://github.com/vllm-project/vllm) is an inference acceleration framework for transformer architectures. Techniques such as Paged Attention and Continuous Batching effectively improve inference efficiency and reduce GPU memory usage.

The prerequisites for using VLLM are:

1. The model was trained with full-parameter fine-tuning or LoRA fine-tuning
2. The model type is one of the model types supported by VLLM

The model families currently supported by VLLM are:

> - Aquila & Aquila2 (`BAAI/AquilaChat2-7B`, `BAAI/AquilaChat2-34B`, `BAAI/Aquila-7B`, `BAAI/AquilaChat-7B`, etc.)
> - Baichuan (`baichuan-inc/Baichuan-7B`, `baichuan-inc/Baichuan-13B-Chat`, etc.)
> - BLOOM (`bigscience/bloom`, `bigscience/bloomz`, etc.)
> - Falcon (`tiiuae/falcon-7b`, `tiiuae/falcon-40b`, `tiiuae/falcon-rw-7b`, etc.)
> - GPT-2 (`gpt2`, `gpt2-xl`, etc.)
> - GPT BigCode (`bigcode/starcoder`, `bigcode/gpt_bigcode-santacoder`, etc.)
> - GPT-J (`EleutherAI/gpt-j-6b`, `nomic-ai/gpt4all-j`, etc.)
> - GPT-NeoX (`EleutherAI/gpt-neox-20b`, `databricks/dolly-v2-12b`, `stabilityai/stablelm-tuned-alpha-7b`, etc.)
> - InternLM (`internlm/internlm-7b`, `internlm/internlm-chat-7b`, etc.)
> - LLaMA & LLaMA-2 (`meta-llama/Llama-2-70b-hf`, `lmsys/vicuna-13b-v1.3`, `young-geng/koala`, `openlm-research/open_llama_13b`, etc.)
> - Mistral (`mistralai/Mistral-7B-v0.1`, `mistralai/Mistral-7B-Instruct-v0.1`, etc.)
> - MPT (`mosaicml/mpt-7b`, `mosaicml/mpt-30b`, etc.)
> - OPT (`facebook/opt-66b`, `facebook/opt-iml-max-30b`, etc.)
> - Qwen (`Qwen/Qwen-7B`, `Qwen/Qwen-7B-Chat`, etc.)

First, install vllm:

```shell
pip install vllm
```

For a fully fine-tuned model, you can use vllm to start the API service directly:

```shell
python -m vllm.entrypoints.openai.api_server --model /dir/to/your/trained/model --trust-remote-code
```

For a LoRA fine-tuned model, first run the following script to merge the LoRA weights into the original model:

```shell
python merge_lora_weights_to_model.py --model_id_or_path /dir/to/your/base/model --model_revision master --ckpt_dir /dir/to/your/lora/model
```

The merged model is written to the `{ckpt_dir}-merged` folder; pass that folder to the vllm command above to start the service.

Call the service:

```shell
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "/dir/to/your/trained/model",
    "prompt": "San Francisco is a",
    "max_tokens": 7,
    "temperature": 0
  }'

# Response:
{"id":"cmpl-90329ab1eba24d02934b38f2edbb26a8","object":"text_completion","created":11506341,"model":"/dir/to/your/trained/model","choices":[{"index":0,"text":" city in the United States of America","logprobs":null,"finish_reason":"length"}],"usage":{"prompt_tokens":4,"total_tokens":11,"completion_tokens":7}}
```

vllm also supports loading and calling the model from Python code; see the [vllm official documentation](https://vllm.readthedocs.io/en/latest/getting_started/quickstart.html) for details.
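For reference, a minimal offline-inference sketch with vLLM's Python API (`LLM` and `SamplingParams`); the model path is a placeholder for your merged or fully fine-tuned checkpoint:

```python
from vllm import LLM, SamplingParams

# Placeholder path: point this at the merged ({ckpt_dir}-merged) or fully fine-tuned checkpoint.
llm = LLM(model='/dir/to/your/trained/model', trust_remote_code=True)
sampling_params = SamplingParams(temperature=0, max_tokens=7)

# Mirrors the curl example above.
outputs = llm.generate(['San Francisco is a'], sampling_params)
print(outputs[0].outputs[0].text)
```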

docs/source/index.rst

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ Swift DOCUMENTATION
GetStarted/Use in train and infer.md
GetStarted/Examples.md
GetStarted/Work with Peft.md
+GetStarted/Deployment.md

.. toctree::
   :maxdepth: 2

merge_lora_weights_to_model.py

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
from swift.llm import InferArguments
from swift.llm.infer import merge_lora
from swift.utils import parse_args

if __name__ == '__main__':
    args, remaining_argv = parse_args(InferArguments, None)
    args.init_argument()
    merge_lora(args, replace_if_exists=True)

swift/llm/infer.py

Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,7 @@
logger = get_logger()


-def merge_lora(args: InferArguments) -> None:
+def merge_lora(args: InferArguments, replace_if_exists=False) -> None:
    assert args.sft_type == 'lora'
    assert not args.model_type.endswith('int4'), 'int4 model is not supported'
    assert not args.model_type.endswith('int8'), 'int8 model is not supported'

@@ -39,7 +39,7 @@ def merge_lora(args: InferArguments) -> None:
    args.sft_type = 'full'
    args.ckpt_dir = merged_lora_path

-    if not os.path.exists(args.ckpt_dir):
+    if not os.path.exists(args.ckpt_dir) or replace_if_exists:
        logger.info('Saving merged weights...')
        model.save_pretrained(args.ckpt_dir)
        tokenizer.save_pretrained(args.ckpt_dir)

swift/tuners/longlora/longlora.py

Lines changed: 12 additions & 0 deletions
@@ -16,6 +16,18 @@ class LongLoRAModelType:


@dataclass
class LongLoRAConfig(LoRAConfig):
+    """
+    The config for the LongLoRA adapter.
+    LongLoRA: [Efficient Fine-tuning of Long-Context Large Language Models](https://arxiv.org/abs/2309.12307)
+    This adapter uses S2-attention to shorten the attention window in long-context training scenarios.
+    Args:
+        embedder_and_normalizer: LongLoRA allows the embedder and normalizer to be trainable; this parameter
+            specifies the names of the embedder and normalizer modules.
+        model_type: The model type; currently only llama is supported.
+        use_flash_attn: Whether to use the flash-attention version of the forward pass.
+        group_size_ratio: The group window size as a ratio of the sequence length.
+            Note: the sequence is split into smaller groups according to this ratio.
+    """

    embedder_and_normalizer: Union[str, List[str], Tuple[str]] = field(
        default=('embed', 'norm'),
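To make the new docstring concrete, here is a hedged sketch of wiring a `LongLoRAConfig` into a model. The `Swift.prepare_model` call and the checkpoint path are assumptions not shown in this diff; only the fields documented above are taken from the commit.

```python
from transformers import AutoModelForCausalLM
from swift import Swift  # assumed public entry point for applying tuners
from swift.tuners.longlora.longlora import LongLoRAConfig

# Placeholder checkpoint; LongLoRA currently supports llama-family models only.
model = AutoModelForCausalLM.from_pretrained('/dir/to/your/llama/model')

config = LongLoRAConfig(
    embedder_and_normalizer=('embed', 'norm'),  # names of the trainable embedder/normalizer modules
    model_type='llama',                         # only llama is supported per the docstring
    use_flash_attn=False,                       # flash-attention variant of the forward pass
    group_size_ratio=0.25,                      # S2-attention group size as a ratio of the sequence length
)
model = Swift.prepare_model(model, config)      # wrap the model with the LongLoRA adapter
```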

swift/tuners/lora.py

Lines changed: 3 additions & 0 deletions
@@ -166,6 +166,9 @@ class LoRAConfig(SwiftConfig):
        enable_lora(List[bool]): The modules need to be turned on when using the merged linear layer
        fan_in_fan_out(bool): Set this to True if the layer to replace stores weight like (fan_in, fan_out)
        bias(str): Bias type. Values ca be "none", "all" or "lora_only"
+        use_qa_lora(bool): Use
+            QA-LoRA: [Quantization-Aware Low-Rank Adaptation of Large Language Models](https://arxiv.org/abs/2309.14717)
+            instead of LoRA. QA-LoRA only supports AutoGPTQ quantized models.
    """

    r: int = field(default=6, metadata={'help': 'The rank of the LoRA module'})
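A hedged sketch of enabling the new flag. The quantized checkpoint path and the `Swift.prepare_model` call are assumptions for illustration; per the docstring, QA-LoRA only works on AutoGPTQ-quantized base models.

```python
from transformers import AutoModelForCausalLM
from swift import Swift  # assumed public entry point for applying tuners
from swift.tuners.lora import LoRAConfig

# Placeholder: an AutoGPTQ-quantized checkpoint, e.g. a qwen-7b-chat-int4 build.
model = AutoModelForCausalLM.from_pretrained('/dir/to/your/gptq/quantized/model')

config = LoRAConfig(
    r=6,               # rank of the LoRA module (default shown in the diff)
    use_qa_lora=True,  # use QA-LoRA instead of plain LoRA
)
model = Swift.prepare_model(model, config)
```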

swift/tuners/rome/rome.py

Lines changed: 14 additions & 3 deletions
@@ -24,10 +24,21 @@
@dataclass
class RomeConfig(SwiftConfig):
    """
-    The configuration class for the loRA module.
-
+    The configuration class for the ROME module.
+    This adapter can be used to inject or modify knowledge in a model, without any training.
+    ROME: [Rank-One Editing of Encoder-Decoder Models](https://arxiv.org/abs/2211.13317)
    Args:
-
+        model_type(`str`): The model type; currently llama-7b/llama-13b are supported.
+        tokenizer(`AutoTokenizer`): The tokenizer.
+        knowledge(`List[Dict]`): The knowledge to be injected into the model, in the format:
+            >>> [
+            >>>     {
+            >>>         "prompt": "{} was the founder of",
+            >>>         "subject": "Steve Jobs",
+            >>>         "target": "Microsoft"
+            >>>     }
+            >>> ]
    """
    model_type: str = field(default=None, metadata={'help': 'The model type'})
